Since 2023, BristolMyersSquibb, the Y company and cynkra have teamed up to develop a novel no-code solution for R.
library(blockr)
library(pracma)
library(shiny)
Introduction
blockr is an R package designed to democratize data analysis by providing a flexible, intuitive, and code-free approach to building data pipelines. It targets 2 main groups of users:
- On the one hand, it empowers non-technical users to create insightful data workflows using pre-built blocks that can be easily connected, all without writing a single line of code.
- On the other hand, it provides developers with a set of tools to seamlessly create new blocks, thereby enhancing the entire framework and fostering collaboration within an organization's teams.
blockr is data agnostic, meaning it can work with any kind of dataset, be it pharmaceutical data or sports analytics data. It builds on top of shiny to provide real-time feedback on any data change. Finally, it lets you export code to create reproducible data analyses.
Getting started
As a simple user
As a simple user, you're not expected to write a single line of code to use blockr. You can use the kitchen sink below to get started. This example is based on the palmer penguins data and runs a single stack with 3 blocks: the first block selects the data, the second creates the plot and the third adds the points to it.
blockr has its own validation system. For instance, using the example below, you can try to press return on the first block select box (penguins is the selected default). You'll notice an immediate feedback message. A global message is displayed in the upper middle part of the block: "1 error(s) found in this block". You get more detailed messages next to the faulty input(s): "selected value(s) not among provided choices". You can repeat the same experiment with the last plot layer block, by emptying the color and shape select inputs. Error messages can accumulate.
You can dynamically add blocks to the current stack, which gathers a set of related blocks. You can think of a stack as a data analysis recipe, as in cooking, where blocks are the instructions. To add a new block, click on the + icon in the top right corner of the stack. This opens a sidebar on the left side, where you can search for blocks that are compatible with the current state of the pipeline. With an empty stack, only entry point blocks are suggested, so you can import data. Then, after clicking on a block, the suggestion list changes so you can, for instance, filter data or select only a subset of columns, and much more.
library(blockr)
library(palmerpenguins)
library(ggplot2)
new_ggplot_block <- function(col_x = character(), col_y = character(), ...) {
data_cols <- function(data) colnames(data)
new_block(
fields = list(
x = new_select_field(col_x, data_cols, type = "name"),
y = new_select_field(col_y, data_cols, type = "name")
),
expr = quote(
ggplot(mapping = aes(x = .(x), y = .(y)))
),
class = c("ggplot_block", "plot_block"),
...
)
}
new_geompoint_block <- function(color = character(), shape = character(), ...) {
data_cols <- function(data) colnames(data$data)
new_block(
fields = list(
color = new_select_field(color, data_cols, type = "name"),
shape = new_select_field(shape, data_cols, type = "name")
),
expr = quote(
geom_point(aes(color = .(color), shape = .(shape)), size = 2)
),
class = c("geompoint_block", "plot_layer_block", "plot_block"),
...
)
}
stack <- new_stack(
data_block = new_dataset_block("penguins", "palmerpenguins"),
plot_block = new_ggplot_block("flipper_length_mm", "body_mass_g"),
layer_block = new_geompoint_block("species", "species")
)
serve_stack(stack)
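The recipe analogy can also be pictured in plain R: a stack behaves a bit like an ordered list of functions, each one receiving the result of the previous one. The sketch below is only an illustration under that assumption. The names toy_blocks and run_stack are hypothetical, it uses the built-in mtcars data rather than the penguins, and it is not how blockr implements stacks.

```r
# Toy illustration only: toy_blocks and run_stack are hypothetical names,
# not part of the blockr API.
toy_blocks <- list(
  # Entry point block: ignores its input and provides the data.
  load_data = function(data) mtcars,
  # Transform blocks: each one works on the output of the previous block.
  keep_cols = function(data) data[, c("mpg", "cyl")],
  first_rows = function(data) head(data, 5)
)

run_stack <- function(blocks) {
  data <- NULL
  for (block in blocks) {
    data <- block(data)
  }
  data
}

result <- run_stack(toy_blocks)
```

Adding a block to the stack then amounts to appending one more function to the list, which is why only entry point blocks make sense at the first position.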
Toward more complex analysis
Let's consider this dataset, which contains 120 years of Olympic athletes data, up to Rio in 2016. In the kitchen sink below, we first add an upload block:
- Download the dataset file locally.
- Click on Add stack.
- Click on the stack + button and search for browser, then select the new_filesbrowser_block.
- Uncollapse the stack by clicking on the top right arrow icon. This makes the file input of the upload block visible.
- Click on File select and select the file downloaded in the first step (athlete_events.csv).
- As we obtain a csv file, we must parse it with a new_csv_block. Repeat the third step to add the new_csv_block. The table has 271116 rows and 15 columns.
- Add a new_filter_block and select Sex as column and then F in the values input. We leave the comparison set to == and click on the Run button. Notice we now have 74522 rows.
- Add a new_mutate_block with the following expression: birth_year = Year - Age (this gives us an approximate birth year). Click on submit.
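For readers who prefer to see the steps above as code, here is a rough base R equivalent of the parse, filter and mutate blocks. It is a sketch under assumptions: the rows are made up and only mimic a few columns of athlete_events.csv, and the code that blockr itself exports may look different.

```r
# Made-up rows that only mimic some columns of athlete_events.csv.
# With the real file, one would start from read.csv("athlete_events.csv").
athletes <- data.frame(
  ID = c(1, 2, 3, 4),
  Sex = c("F", "M", "F", "F"),
  Age = c(24, 30, NA, 19),
  Year = c(1992, 2000, 2012, 2016)
)

# Filter block: keep rows where Sex == "F".
females <- athletes[athletes$Sex == "F", ]

# Mutate block: approximate birth year (NA when the age is unknown).
females$birth_year <- females$Year - females$Age
```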
From now on, we leave the first stack as is and will reuse it in other stacks. We want to display the height distribution for female athletes. Let's do it below.
- Create a new stack by clicking on Add stack.
- Add a new_result_block to it. This lets you import the data from the first stack (and potentially any stack from the dashboard). If you don't see any data, select another stack name from the dropdown menu.
- Add a new_ggplot_block, leave x as the default function and select Height as variable in the columns input.
- Add a new_geomhistogram_block. Now we have our distribution plot.
Alternatively, you could remove the 2 plot blocks and add a new_summarize_block using mean as function and Height as column (result: 168 cm).
In the following, we create a look-up table to be able to retrieve the athlete names based on their ID.
- Create a new stack.
- Add a result block to import data from the very first stack.
- Add a new_select_block and only select ID, Name, birth_year, Team and Sport as columns.
Our goal is now to find which athletes competed in 2 or more different sports.
- Create a new stack.
- Add a result block to import data from the very first stack.
- Add a new_filter_block, select Medal as column, != as comparison operator and leave the value empty. Click on run, which keeps only athletes with medals.
- Add a new_group_by_block, grouping by ID (as some athletes have the same name).
- Add a new_summarize_block, choosing the function n_distinct applied to the Sport column.
- Add a new_filter_block, select N_DISTINCT as column, >= as comparison operator and set the value to 2. Click on run. This gives us the athletes who competed in 2 sports or more.
- Add a new_join_block. Select left_join as join function, select the third stack (lookup table) as join table and ID as column.
- Add a new_arrange_block for the birth_year column.
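The logic of this last stack can be sketched in base R as follows. The rows and names are invented for illustration and only mimic the shape of the real table, and base R functions stand in for the filter, group by, summarize, join and arrange blocks, so this is an approximation rather than the code blockr would generate.

```r
# Invented rows: athletes 1 and 4 won medals in 2 different sports.
events <- data.frame(
  ID = c(1, 1, 2, 2, 3, 3, 4, 4),
  Name = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Sport = c("Swimming", "Diving", "Judo", "Judo",
            "Rowing", "Sailing", "Athletics", "Bobsleigh"),
  Medal = c("Gold", "Silver", "Bronze", NA, NA, NA, "Gold", "Silver"),
  birth_year = c(1950, 1950, 1980, 1980, 1940, 1940, 1930, 1930)
)

# Lookup table (third stack): one row per athlete.
lookup <- unique(events[, c("ID", "Name", "birth_year")])

# Filter: keep rows with a medal.
medalists <- events[!is.na(events$Medal), ]

# Group by ID and count distinct sports.
counts <- tapply(medalists$Sport, medalists$ID, function(s) length(unique(s)))
n_sports <- data.frame(
  ID = as.numeric(names(counts)),
  n_distinct = as.vector(counts)
)

# Filter: 2 sports or more, then left join with the lookup table.
multi <- n_sports[n_sports$n_distinct >= 2, ]
multi <- merge(multi, lookup, by = "ID", all.x = TRUE)

# Arrange by birth year.
multi <- multi[order(multi$birth_year), ]
```

Athlete 3 also appears in 2 sports but is dropped by the medal filter, which mirrors the effect of the first filter block in the list above.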
In conclusion, Hjördis Viktoria Töpel (1904) was the first recorded athlete to compete in 2 different sports, swimming and diving, for Sweden. Lauryn Chenet Williams (1984) is the most recent one, for the US, with Athletics and Bobsleigh. It's actually quite amazing to see people competing in two quite unrelated sports, like swimming and handball in the case of Roswitha Krause.
library(blockr)
library(blockr.ggplot2)
options(shiny.maxRequestSize = 100 * 1024^2)
do.call(set_workspace, args = list(title = "My workspace"))
serve_workspace(clear = FALSE)
As an end-user, you are not supposed to write code. As such, if you think anything is missing, you can open an issue here, or ask any developer you are working with to create new blocks. This leads us to the second part of this blog post ... How to use blockr as a developer?
As a developer
How to install it:
pak::pak("BristolMyersSquibb/blockr")
blockr can't provide every possible data manipulation or visualization block. That's the reason why we made it easily extensible. You can get an introduction to blockr for developers here.
In the following, we create an ordinary differential equations solver block using the pracma package. We choose the Lorenz attractor. In R, the equations may be written as:
lorenz <- function(t, y, parms) {
c(
X = parms[1] * y[1] + y[2] * y[3],
Y = parms[2] * (y[2] - y[3]),
Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
)
}
where t is the time, y a vector of solutions and parms the various parameters. If you are familiar with deSolve, equations are defined with similar functions. For this blog post, we selected pracma as deSolve does not run in shinylive, so you would not be able to see the embedded demonstration.
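To get a feel for what the function returns before plugging it into a solver, one can evaluate it by hand at the initial state used later in this post. The sketch below only uses base R. The single explicit Euler step and the step size h are assumptions added for illustration, and are far cruder than what ode45 does.

```r
lorenz <- function(t, y, parms) {
  c(
    X = parms[1] * y[1] + y[2] * y[3],
    Y = parms[2] * (y[2] - y[3]),
    Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
  )
}

state <- c(1, 1, 1)          # initial conditions X = Y = Z = 1
parms <- c(-8 / 3, -10, 28)  # same default values as the fields below

# Derivatives at t = 0.
deriv <- lorenz(0, state, parms)

# One explicit Euler step of size h (illustration only, not what ode45 does).
h <- 0.01
next_state <- state + h * deriv
```

At this state the Y derivative is zero because y[2] and y[3] are equal, while the Z derivative is 26.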
Add interactivity with the fields
We want to add interactivity on the 3 different parameters. Hence, we create our new block function with 3 fields inside a list. Since the expected values are numbers, we leverage new_numeric_field. Parameters are only explicitly named for the first field:
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(value = -8 / 3, min = -10, max = 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
# TBD
# ...
}
As you may imagine, these fields are subsequently translated into shiny inputs, that is numericInput in our example. If you face a situation where you need to implement a custom field, not included in blockr, you can read this vignette.
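For readers less familiar with shiny, the sketch below shows what a hand-written numericInput carrying the same value and bounds as field a could look like. The input id and label are assumptions, and blockr generates its own inputs internally, so this is only an approximation of the result.

```r
library(shiny)

# Hypothetical id and label; the value and bounds mirror field `a` above.
input_a <- numericInput(
  inputId = "a",
  label = "a",
  value = -8 / 3,
  min = -10,
  max = 20
)
```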
Create the block expression
As a next step, we instantiate our block with the blockr new_block constructor:
new_block(
fields = fields,
expr = quote(<EXPR>),
...,
class = <CLASSES>,
submit = FALSE
)
A block is composed of fields, a quoted expression which involves the fields (to delay the evaluation), some classes which control the block behavior, and extra parameters passed through the ... argument. Finally, submit lets you delay the block evaluation by requiring the user to click on a submit button (FALSE by default). This prevents unwanted intensive computations from being triggered.
In our example, the expression calls the ode45 function. Notice the usage of substitute to inject the lorenz function within the expression. This is necessary since lorenz is defined outside of the expression, and using quote would fail. Fields are invoked with .(field_name), a rather strange notation, required by bquote to process the expression. It is not mandatory to understand this underlying technical detail, but this convention must be respected. You may also notice that some parameters, like the initial conditions y0 or the time values, are hardcoded. We leave it to the reader to transform them into fields, as an exercise:
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
)
# TBD
)
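The difference between quote, substitute and bquote can be seen on a small base R example, independent of blockr. The function square and the variable a are placeholders chosen for this sketch. The failure shown for quote assumes the expression is later evaluated in an environment where the function is not visible.

```r
square <- function(x) x^2

# quote() keeps `fun` as a bare symbol, to be looked up at evaluation time.
quoted <- quote(sapply(1:3, fun))

# substitute() replaces the symbol by the function object itself.
injected <- substitute(sapply(1:3, fun), list(fun = square))

# Evaluated where `fun` does not exist, the quoted version errors ...
quoted_result <- tryCatch(
  eval(quoted, new.env(parent = baseenv())),
  error = function(e) conditionMessage(e)
)

# ... while the injected version carries the function along with it.
injected_result <- eval(injected, baseenv())

# bquote() replaces .(name) by the current value of name.
a <- 28
with_field <- bquote(c(.(a), 1))
```

Here injected_result holds the squares 1, 4 and 9, quoted_result holds an error message, and with_field is the call c(28, 1).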
Add the right classes
We give our block 2 classes, namely ode_block and data_block:
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(-8 / 3, -10, 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
),
...,
class = c("ode_block", "data_block")
)
}
As explained earlier, they are required to control the block behavior, as blockr is built with S3. For instance, data blocks have a specific evaluation method to calculate the expression:
evaluate_block.data_block <- function (x, ...)
{
stopifnot(...length() == 0L)
eval(generate_code(x), new.env())
}
where generate_code processes the block code. Data blocks are considered as entry point blocks, as opposed to transformation blocks, which operate on data. Therefore, the evaluation method for a transform block needs to pass the data from the previous block with %>%:
evaluate_block.block <- function (x, data, ...)
{
stopifnot(...length() == 0L)
eval(substitute(data %>% expr, list(expr = generate_code(x))), list(data = data))
}
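The way the class vector selects the evaluation method can be illustrated with a toy S3 generic in base R. The names toy_evaluate, toy_data_block and toy_block are hypothetical, and the sketch leaves out generate_code and %>% entirely, so it only mirrors the dispatch idea, not the blockr internals.

```r
# Hypothetical generic, standing in for evaluate_block.
toy_evaluate <- function(x, ...) UseMethod("toy_evaluate")

# Entry point blocks: no incoming data, the expression is evaluated as is.
toy_evaluate.toy_data_block <- function(x, ...) {
  eval(x$expr, new.env())
}

# Fallback for other blocks: the expression sees the previous block's data.
toy_evaluate.toy_block <- function(x, data, ...) {
  eval(x$expr, list(data = data))
}

entry <- structure(
  list(expr = quote(data.frame(v = 1:3))),
  class = c("toy_data_block", "toy_block")
)

transform <- structure(
  list(expr = quote(data[data$v > 1, , drop = FALSE])),
  class = "toy_block"
)

# The first class with a matching method wins.
res <- toy_evaluate(transform, data = toy_evaluate(entry))
```

Because toy_data_block comes first in the class vector of entry, its method is chosen over the toy_block fallback, which is the same reason the ode block needs data_block among its classes.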
If you want to build a plot block and plot layer blocks, you would have to design a specific evaluate method that accounts for the + operator required by ggplot2. To learn more about how to create a plot block, you can read this article.
Demo
library(blockr)
library(pracma)
library(blockr.ggplot2)
lorenz <- function(t, y, parms) {
c(
X = parms[1] * y[1] + y[2] * y[3],
Y = parms[2] * (y[2] - y[3]),
Z = -y[1] * y[2] + parms[3] * y[2] - y[3]
)
}
new_ode_block <- function(...) {
fields <- list(
a = new_numeric_field(-8 / 3, -10, 20),
b = new_numeric_field(-10, -50, 100),
c = new_numeric_field(28, 1, 100)
)
new_block(
fields = fields,
expr = substitute(
as.data.frame(
ode45(
fun,
y0 = c(X = 1, Y = 1, Z = 1),
t0 = 0,
tfinal = 100,
parms = c(.(a), .(b), .(c))
)
),
list(fun = lorenz)
),
...,
class = c("ode_block", "data_block")
)
}
stack <- new_stack(
new_ode_block,
new_ggplot_block(
func = c("x", "y"),
default_columns = c("y.1", "y.2")
),
new_geompoint_block
)
serve_stack(stack)
Packaging new blocks: the registry
In the above example, we define the block on the fly. However, another outstanding feature of blockr is the registry, which you can see as a supermarket of blocks. From the R side, the registry is an environment that can be extended by developers who bring their own blocks packages.
To get an overview of all available blocks within the blockr core package, we call get_registry:
get_registry()
ctor description category
1 arrange_block Arrange columns transform
2 csv_block Read a csv dataset parser
3 dataset_block Choose a dataset from a package data
4 filesbrowser_block Select files on the server file system data
5 filter_block filter rows in a table transform
6 group_by_block Group by columns transform
7 head_block Select n first rows of dataset transform
8 join_block Join 2 datasets transform
9 json_block Read a json dataset parser
10 mutate_block Mutate block transform
11 rds_block Read a rds dataset parser
12 result_block Shows result of another stack as data source data
13 select_block select columns in a table transform
14 summarize_block summarize data groups transform
15 upload_block Upload files from location data
16 xpt_block Read a xpt dataset parser
classes input output
1 arrange_block, transform_block, block data.frame data.frame
2 csv_block, parser_block, transform_block, block string data.frame
3 dataset_block, data_block, block <NA> data.frame
4 filesbrowser_block, data_block, block <NA> string
5 filter_block, transform_block, block data.frame data.frame
6 group_by_block, transform_block, block data.frame data.frame
7 head_block, transform_block, block data.frame data.frame
8 join_block, transform_block, block data.frame data.frame
9 json_block, parser_block, transform_block, block string data.frame
10 mutate_block, transform_block, block data.frame data.frame
11 rds_block, parser_block, transform_block, block string data.frame
12 result_block, data_block, block <NA> data.frame
13 select_block, transform_block, block data.frame data.frame
14 summarize_block, transform_block, block data.frame data.frame
15 upload_block, data_block, block <NA> string
16 xpt_block, parser_block, transform_block, block string data.frame
package
1 blockr
2 blockr
3 blockr
4 blockr
5 blockr
6 blockr
7 blockr
8 blockr
9 blockr
10 blockr
11 blockr
12 blockr
13 blockr
14 blockr
15 blockr
16 blockr
This function returns a data frame containing information about blocks, such as their constructors (like new_ode_block), the description, the category (data, transform, plot, ...; this is user defined), classes, accepted input, returned output and package.
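The idea of a registry stored in an environment can be sketched in a few lines of base R. Everything below is hypothetical (toy_registry, toy_register and toy_list are made-up names, and the constructors are empty placeholders). It only illustrates how entries can be added to an environment and listed as a data frame, not how blockr stores its blocks.

```r
# A toy registry: an environment holding one entry per block.
toy_registry <- new.env()

toy_register <- function(name, constructor, description, category) {
  entry <- list(
    ctor = constructor,
    description = description,
    category = category
  )
  assign(name, entry, envir = toy_registry)
  invisible(name)
}

# List the registered entries as a data frame.
toy_list <- function() {
  ids <- sort(ls(toy_registry))
  field <- function(what) {
    vapply(
      ids,
      function(id) get(id, envir = toy_registry)[[what]],
      character(1),
      USE.NAMES = FALSE
    )
  }
  data.frame(
    ctor = ids,
    description = field("description"),
    category = field("category")
  )
}

# Placeholder constructors, for illustration only.
toy_register("head_block", function(...) NULL,
             "Select n first rows of dataset", "transform")
toy_register("csv_block", function(...) NULL,
             "Read a csv dataset", "parser")

overview <- toy_list()
```

Since environments are modified in place, any package can add its own entries at load time without the registry having to know about that package in advance.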
To register a block we call register_block (or register_blocks for multiple blocks):
register_my_blocks <- function() {
register_block(
constructor = new_ode_block,
name = "ode block",
description = "Computes the Lorenz attractor solutions",
classes = c("ode_block", "data_block"),
input = NA_character_,
output = "data.frame",
package = "<YOUR_PACKAGE>",
category = "data"
)
# You can register any other blocks here ...
}
where <YOUR_PACKAGE> must be replaced by your real package name. Within a zzz.R script, you can make sure that all blocks are registered when the package loads with a hook:
.onLoad <- function(libname, pkgname) {
register_my_blocks()
invisible(NULL)
}
After the registration, you can check whether the registry is updated, by looking at the ode block:
register_my_blocks()
reg <- get_registry()
reg[reg$package == "<YOUR_PACKAGE>", ]
ctor description category
11 ode_block Computes the Lorenz attractor solutions data
classes input output package
11 ode_block, data_block, block <NA> data.frame <YOUR_PACKAGE>