Interactive Visualizations with plotly

So far, we have learned about two visualization libraries - matplotlib and seaborn. A third option we will discuss is the plotting library plotly.

Plotly creates interactive visualizations. It is a newer library than matplotlib, and it does not have quite as many options for statistical graphs as seaborn. However, its user base is growing, and it is convenient in Jupyter notebooks because every graph is interactive by default. Plotly also has a JavaScript interface, and can be used with plotly-dash to create web apps.

In this section, we will make the same visualizations we made in the seaborn lesson. In that lesson, we loaded data that we had cleaned from the paper:

Potts, R.O., Guy, R.H. A Predictive Algorithm for Skin Permeability: The Effects of Molecular Size and Hydrogen Bond Activity. Pharm Res 12, 1628–1633 (1995).

To visualize with plotly, we will import plotly express.

import os
import pandas as pd

import plotly.express as px
file_path = os.path.join("data", "potts_table1_clean.csv")
df = pd.read_csv(file_path)
df.head()

   Compound       log P    pi    Hd    Ha    MV   R_2  log K_oct  log K_hex  log K_hep
0  water          -6.85  0.45  0.82  0.35  10.6  0.00      -1.38        NaN        NaN
1  methanol       -6.68  0.44  0.43  0.47  21.7  0.28      -0.73      -2.42      -2.80
2  methanoicacid  -7.08  0.60  0.75  0.38  22.3  0.30      -0.54      -3.93      -3.63
3  ethanol        -6.66  0.42  0.37  0.48  31.9  0.25      -0.32      -2.24      -2.10
4  ethanoicacid   -7.01  0.65  0.61  0.45  33.4  0.27      -0.31      -3.28      -2.90

Creating Scatter Plots

First, we will create a scatter plot of log P vs pi. This is accomplished using the px.scatter command. Plotly-express works with pandas dataframes. We must pass a pandas dataframe and indicate the column names for x and y.

When working with plotly, you capture the output of this function in a variable. To display the plot, call fig.show(). This does not have to be done in the same cell as the figure creation.

fig = px.scatter(df, x='pi', y='log P')
fig.show()

You will notice in the figure above that you can hover your mouse over the data points and see information about the points. In the upper right corner of the figure you will find a set of buttons which will allow you to select different options for interacting with the graph.

Visualizing Linear Relationships

The scatter function has built-in support for trendlines. To perform a linear fit between x and y, add the argument trendline='ols'. Under the hood, plotly calls statsmodels to perform an ordinary least squares fit and adds the fitted line to the plot. If you want to perform the fit using scikit-learn, you will have to do the fit manually and add the line yourself. The trendline argument can also be set to 'lowess' for a locally weighted scatterplot smoothing line.

fig = px.scatter(
    df, x='pi', y='log P', trendline='ols', trendline_color_override='darkblue')
fig.show()

To make a figure with subplots which shows all of the variables, the data has to be in long form, similar to when we created a plot with lmplot in seaborn. Then, we add another argument to px.scatter - facet_col which will make a new plot for each new value in the variable column.
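To see what "long form" means here, the sketch below melts a small, three-row excerpt of the table (the `wide` dataframe is a hypothetical stand-in for the full data). Each non-id column is stacked into a `variable`/`value` pair, while `log P` is repeated alongside it.

```python
import pandas as pd

# Wide form: one column per descriptor (three rows copied from the table).
wide = pd.DataFrame({
    "log P": [-6.85, -6.68, -7.08],
    "pi": [0.45, 0.44, 0.60],
    "Hd": [0.82, 0.43, 0.75],
})

# Long form: one row per (log P, descriptor name, descriptor value).
long_form = wide.melt(id_vars="log P")
print(long_form)
```

The result has three rows for `pi` followed by three rows for `Hd`, which is the layout `facet_col="variable"` expects.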

# Get columns which are numbers - this is the same processing as seaborn.
df2 = df.select_dtypes(include="float")
df2_melt = df2.melt(id_vars="log P")
df2_melt.head()

   log P  variable  value
0  -6.85  pi         0.45
1  -6.68  pi         0.44
2  -7.08  pi         0.60
3  -6.66  pi         0.42
4  -7.01  pi         0.65

fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable")
fig.show()

By default, every facet shares the same x and y axis scales. In this particular case, we do not want the x-axes to share a scale, because each variable covers a different range. Call fig.update_xaxes(matches=None) after creating the figure to give each facet its own x-axis scale.

The argument facet_col_wrap can be used to specify how many columns should be in the figure. The plots will be wrapped into rows using this number of columns. Finally, we add arguments for height and width to give the plot a better size.

fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable", facet_col_wrap=2, 
                 trendline='ols', trendline_color_override='darkblue', height=800, width=600)
fig.update_xaxes(matches=None)  # let each facet use its own x-axis scale
fig.show()

Correlation Plots

We can visualize a correlation matrix as a heatmap using px.imshow.

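# numeric_only=True skips the text column "Compound" (required in pandas 2.0+)
corr = df.corr(numeric_only=True)
heatmap = px.imshow(corr)
# Restrict the heatmap to the first six variables
heatmap = px.imshow(corr.iloc[:6, :6])
heatmap.show()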
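For reference, the correlation step on its own can be sketched with pandas alone. The example below assumes a five-row excerpt of the table (so the numbers are not those of the full dataset) and selects the sub-matrix by column label with `.loc`, which is an alternative to the positional `.iloc` slice and does not depend on column order.

```python
import numpy as np
import pandas as pd

# Five rows copied from the table earlier in this lesson (illustration only).
sample = pd.DataFrame({
    "Compound": ["water", "methanol", "methanoicacid", "ethanol", "ethanoicacid"],
    "log P": [-6.85, -6.68, -7.08, -6.66, -7.01],
    "pi": [0.45, 0.44, 0.60, 0.42, 0.65],
    "MV": [10.6, 21.7, 22.3, 31.9, 33.4],
})

# Pearson correlations between numeric columns; the text column is skipped.
corr = sample.corr(numeric_only=True)

# Pick a sub-matrix by label rather than by position.
wanted = ["log P", "pi"]
subset = corr.loc[wanted, wanted]
print(subset)
```

The resulting `subset` can be passed to `px.imshow` in the same way as `corr.iloc[:6, :6]`.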

Plotly Color Schemes

The following section demonstrates using the help function to find more information about available color schemes in plotly express.

heatmap = px.imshow(corr.iloc[:6, :6], color_continuous_scale="RdBu", color_continuous_midpoint=0)

Saving Images

heatmap.write_image("correlation.png")  # save a static PNG (requires the kaleido package)
heatmap.write_html("correlation.html")  # save an interactive HTML page