05 July 2017 / Luke Canavan
I was inspired to write a cookbook-style bokeh educational guide after reading Tom Auspurger’s recent Modern Pandas series. As a Bokeh developer who has written a lot of Bokeh visualizations, I’ve developed a some personal opinions on best practices for using Bokeh. A great resource for following along is the User Guide section of the Bokeh documentation.
Let’s define some Bokeh terminology before diving into things. In Bokeh, Models
represent all the parts of your visualization: plots, data sources,
axes, tools, etc. They are classes that have some defined properties
Plot models have a
plot_width), as well as
some associated event handling (i.e. changing the
Plot.background_fill_color property will cause the plot to re-render and
update the background_fill_color).
Glyphs are a subset of Models and represent Bokeh’s plotting primitives- the Lines, Circles and Rects that compose visualizations. You could think of them as the equivalent to D3’s elements.
Bokeh has two main API levels for assemble these Models and Glyphs into full-fledged visualization:
bokeh.plottingfor mid-level, with reasonable defaults for conciseness
bokeh.modelsfor low-level, very explicit development
Additionally, some libraries built on top of Bokeh offer an even higher level of abstraction such as HoloViews.
There are several valid Bokeh-development patterns, but I find the low-level,
primitives-based bokeh.models API to be ideal for developing custom, bespoke
visualizations. Under the hood, Bokeh generates a JSON blob called a
“scenegraph” that describes the visualization (a “Document” in Bokeh) and
is consumed by the BokehJS client library in order to render an output.
You can view this scenegraph via the
Document.to_json method, though it’s
generally not friendly by humans.
It’s helpful to know that all Bokeh visualizations boil down to a collection of models and attributes. In order to build up a scenegraph, you just have to build up a collection of models. There’s not any magic beyond that.
Please note, I haven’t used Bokeh much in an exploratory manner, so I don’t know how well the below workflow fits. Usually I have an idea of the shape of my data and what my intended visualization looks like.
The basis of using the low-level
bokeh.models API is building visualizations
Plot Model. A
Plot object is the foundation onto which any
Bokeh visualization is composed - it holds all of axes, grids, glyphs and
toolbar tools. Additionally, there are a few Plot-specific properties like
height, width and background color. A good reference for understanding these
You only have to specify a
attributes for it to be renderable. Creating a range-only Plot will yield an
empty rectangle when passed to the
bokeh.io.show method, which is exactly
what we expect. It’s an empty canvas without any axes, glyphs or tools!
from bokeh.models import Plot, Range1d plot = Plot(x_range=Range1d(), y_range=Range1d(), plot_height=200)
This is exciting stuff!
There are three range types and it’s helpful to pick the right one. You use the
Range1d Model when you want to set the start and end values of your range,
regardless of what data is being plotted. For example, I would use a Range1d
when I want a range to span from 0 - 100%, even if my data’s range is from 12 - 94%.
range = Range(start=0, end=100)
FactorRange Model is similar to the Range1d model, except that handles
categorical values instead of continuous ones. An example situation to use
a FactorRange is when plotting values by months of the year.
range = FactorRange(factors=["Jan", "Feb", "Mar", "Apr", "May", "Jun"])
You use a
DataRange1d Model when you want the range to reflect the extent
of your plotted data. For example, Dask.distributed uses Bokeh for its
diagnostic web interface link.
The interface takes advantage of a several DataRange1d models to plot the
progress of computation tasks and CPU/memory usage over time. The DataRange1d
inspects the min and max of the plotted values and automatically adjusts the
visible range accordingly.
range = DataRange1d()
One new requirement in Bokeh 0.12.6 is that range models correspond with an
Scale Model (which maps between screen and data space) on
the same dimension. If you have a misconfigured Range and Scale pair, you’ll
raise a validation error.
In general, I prefer to develop Bokeh visualizations in an additive way, where each element or interaction is explicitly inserted. The result is usually more verbose, but I feel the code is easier to write and read later.
I think it’s best to show a counter-example to demonstrate the model mutation
and deletion tangle I’m hoping to avoid. The core of the bokeh.plotting API is
figure method, which creates a subclassed instance of Plot with defaults
related to ranges, axes, grids, tools, etc.
from bokeh.plotting import figure plot = figure()
These defaults are very reasonable and many can be modified via arguments to
figure method. In my experience, however, they often include models I
need to remove or modify and my code starts to look like this:
from bokeh.plotting import figure plot = figure() # hide the x and y grids plot.xgrid.visible = False plot.ygrid.visible = False # change x axis defaults plot.xaxis.ticker = BasicTicker(num_desired_ticks=2) plot.xaxis.formatter = NumericalTickFormatter(format="0b") plot.xaxis.min_tick_line_width = 4 plot.xaxis.major_tick_line_width = 2
At this level of customization, it’s easier to explicitly set the property
values in the Models’
__init__ methods and not have created unwanted Models
in the first place. The code below generated the same output as above, but reads
plot = Plot() x_axis = LinearAxis( formatter=NumericalTickFormatter(format="0b"), ticker=BasicTicker(num_desired_ticks=2), min_tick_line_width=4, major_tick_line_width=2) y_axis = LinearAxis() plot.add_layout(x_axis, 'below') plot.add_layout(y_axis, 'left')
I generally only include a single tool with non-default attributes in finalized
visualizations, whether it’s a HoverTool with a custom tooltips or a TapTool
that initiates some sort of custom interaction. Therefore, instead of selecting
and mutating a tool that is added by default in via the
figure method, it’s
cleaner to explicitly create and add tools.
# Don't do this hover = plot.select_one(HoverTool) hover.tooltips = my_custom_tooltips hover.line_policy = 'nearest' pan = plot.select_one(PanTool) pan.dimension = 'x' # Do this instead hover_tool = HoverTool(tooltips=my_custom_tooltips, line_policy='nearest') pan_tool = PanTool(dimension="x") plot.add_tools(hover_tool, pan_tool)
figure method also supports adding tools like this or via the
init method (i.e. ``figure(tools=[hover_tool, pan_tool])), so it may be a
good habit to build.
Glyphs are added via the
Plot.add_glyph method, which requires creating
a ColumnDataSource instead of optionally creating one (or implicitly having
it happen internally) like the convenience glyphs methods of
The string keyword argument values are matched up against equally-named columns in the ColumnDataSource while single values are broadcast across all of the glyphs.
source = ColumnDataSource(data=dict( x=[5, 10, 15], y=[3, 2, 3], height=[1, 1, 1] )) plot.add_glyph( source, Rect(x='x', y='y', width=10, height='height', fill_color='red', line_color='blue'))
Using the bokeh.models interface trades a little extra typing for a simpler incremental approach to building interactive visualizations. Refactoring code developed in this way is easier because it’s obvious where elements of the visualization are created and added. All-in-all, I think it’s the way to go