Interactive Visualizations with plotly#

So far, we have learned about two visualization libraries - matplotlib and seaborn. A third option we will discuss is the plotting library plotly.

Plotly can be used to create interactive visualizations. It is a newer library than matplotlib, and it doesn’t have quite as many options for statistical graphs as seaborn. However, its user base is growing, and it works well in Jupyter notebooks because the graphs are interactive by default. Plotly also has a JavaScript interface, and it can be used with Dash to create web apps.

In this section, we will make the same visualizations we made in the seaborn lesson. In that lesson, we loaded data we had cleaned from the paper:

Potts, R.O., Guy, R.H. A Predictive Algorithm for Skin Permeability: The Effects of Molecular Size and Hydrogen Bond Activity. Pharm Res 12, 1628–1633 (1995). https://doi.org/10.1023/A:1016236932339

To visualize with plotly, we will import plotly express.

import os
import pandas as pd

import plotly.express as px
file_path = os.path.join("data", "potts_table1_clean.csv")
df = pd.read_csv(file_path)
df.head()

        Compound  log P    pi    Hd    Ha    MV   R_2  log K_oct  log K_hex  log K_hep
0          water  -6.85  0.45  0.82  0.35  10.6  0.00      -1.38        NaN        NaN
1       methanol  -6.68  0.44  0.43  0.47  21.7  0.28      -0.73      -2.42      -2.80
2  methanoicacid  -7.08  0.60  0.75  0.38  22.3  0.30      -0.54      -3.93      -3.63
3        ethanol  -6.66  0.42  0.37  0.48  31.9  0.25      -0.32      -2.24      -2.10
4   ethanoicacid  -7.01  0.65  0.61  0.45  33.4  0.27      -0.31      -3.28      -2.90

Creating Scatter Plots#

First, we will create a scatter plot of log P vs pi. This is accomplished using the px.scatter function. Plotly Express works with pandas dataframes. We pass a pandas dataframe and indicate the column names to use for x and y.

When working with plotly, you will capture the output of this function as a variable. When we want to see the plot, we use variable_name.show(). This does not have to be done in the same cell as figure creation.

fig = px.scatter(df, x='pi', y='log P')

fig.show()

You will notice in the figure above that you can hover your mouse over the data points and see information about the points. In the upper right corner of the figure you will find a set of buttons which will allow you to select different options for interacting with the graph.

Visualizing Linear Relationships#

The scatter function has a built-in option for adding a trendline. To fit a line between x and y, add the argument trendline='ols'. Under the hood, plotly calls statsmodels to perform an ordinary least squares fit and adds the fitted line to the plot, so statsmodels must be installed. If you want to perform the fit using scikit-learn, you will have to do the fit manually and add the line yourself. The trendline argument can also be set to 'lowess' for a locally weighted scatterplot smoothing line.

fig = px.scatter(
    df, x='pi', y='log P', trendline='ols', trendline_color_override='darkblue')
fig.show()

To make a figure with subplots that shows all of the variables, the data has to be in long form, similar to when we created a plot with lmplot in seaborn. Then, we add another argument to px.scatter, facet_col, which makes a new plot for each distinct value in the variable column.

# Get columns which are numbers - this is the same processing as seaborn.
df2 = df.select_dtypes(include="float")
df2_melt = df2.melt(id_vars="log P")

df2_melt.head()

   log P  variable  value
0  -6.85        pi   0.45
1  -6.68        pi   0.44
2  -7.08        pi   0.60
3  -6.66        pi   0.42
4  -7.01        pi   0.65
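To see what the melt step does, it can help to try it on a tiny frame. The sketch below is illustrative only: wide and long_df are made-up names, and the two rows are copied from the df.head() output. Every column other than the id_vars column is stacked into a pair of columns named variable and value.

```python
import pandas as pd

# Two rows and three columns copied from the df.head() output.
wide = pd.DataFrame(
    {
        "log P": [-6.85, -6.68],
        "pi": [0.45, 0.44],
        "Hd": [0.82, 0.43],
    }
)

# Keep "log P" as an identifier column; stack the remaining columns.
long_df = wide.melt(id_vars="log P")

# Each original row now appears once per stacked column (2 rows x 2 columns).
print(long_df)
```

The long frame has one row for each combination of original row and melted column, which is the layout facet_col needs.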
fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable", 
                 trendline='ols', 
                 trendline_color_override='darkblue')

fig.show()

By default, all of the subplots share the same x-axis and y-axis scales. In this particular case, each variable has its own range of values, so we do not want the x-axes to match. Call fig.update_xaxes(matches=None) after creating the figure to give each subplot its own x-axis scale.

The argument facet_col_wrap can be used to specify how many columns should be in the figure. The plots will be wrapped into rows using this number of columns. Finally, we add arguments for height and width to give the plot a better size.

fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable", facet_col_wrap=2, 
                 trendline='ols', trendline_color_override='darkblue', height=800, width=600)
fig.update_xaxes(matches=None)
fig.show()

Correlation Plots#

We can visualize a correlation matrix as a heatmap using px.imshow.

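# Compute correlations on the numeric columns only. The Compound column holds
# strings, and recent versions of pandas raise an error if it is included.
corr = df.select_dtypes(include="number").corr()
heatmap = px.imshow(corr)
heatmap.show()

# Show only the first six rows and columns of the correlation matrix.
heatmap = px.imshow(corr.iloc[:6, :6])
heatmap.show()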

Plotly Color Schemes#

The following section demonstrates using the help function to find more information about available color schemes in plotly express.

help(px.colors)
Help on package plotly.express.colors in plotly.express:

NAME
    plotly.express.colors - For a list of colors available in `plotly.express.colors`, please see

DESCRIPTION
    * the `tutorial on discrete color sequences <https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express>`_
    * the `list of built-in continuous color scales <https://plotly.com/python/builtin-colorscales/>`_
    * the `tutorial on continuous colors <https://plotly.com/python/colorscales/>`_
    
    Color scales are available within the following namespaces
    
    * cyclical
    * diverging
    * qualitative
    * sequential

PACKAGE CONTENTS


FUNCTIONS
    color_parser(colors, function)
        Takes color(s) and a function and applies the function on the color(s)
        
        In particular, this function identifies whether the given color object
        is an iterable or not and applies the given color-parsing function to
        the color or iterable of colors. If given an iterable, it will only be
        able to work with it if all items in the iterable are of the same type
        - rgb string, hex string or tuple
    
    colorscale_to_colors(colorscale)
        Extracts the colors from colorscale as a list
    
    colorscale_to_scale(colorscale)
        Extracts the interpolation scale values from colorscale as a list
    
    convert_colors_to_same_type(colors, colortype='rgb', scale=None, return_default_colors=False, num_of_defualt_colors=2)
        Converts color(s) to the specified color type
        
        Takes a single color or an iterable of colors, as well as a list of scale
        values, and outputs a 2-pair of the list of color(s) converted all to an
        rgb or tuple color type, aswell as the scale as the second element. If
        colors is a Plotly Scale name, then 'scale' will be forced to the scale
        from the respective colorscale and the colors in that colorscale will also
        be coverted to the selected colortype. If colors is None, then there is an
        option to return portion of the DEFAULT_PLOTLY_COLORS
        
        :param (str|tuple|list) colors: either a plotly scale name, an rgb or hex
            color, a color tuple or a list/tuple of colors
        :param (list) scale: see docs for validate_scale_values()
        
        :rtype (tuple) (colors_list, scale) if scale is None in the function call,
            then scale will remain None in the returned tuple
    
    convert_colorscale_to_rgb(colorscale)
        Converts the colors in a colorscale to rgb colors
        
        A colorscale is an array of arrays, each with a numeric value as the
        first item and a color as the second. This function specifically is
        converting a colorscale with tuple colors (each coordinate between 0
        and 1) into a colorscale with the colors transformed into rgb colors
    
    convert_dict_colors_to_same_type(colors_dict, colortype='rgb')
        Converts a colors in a dictioanry of colors to the specified color type
        
        :param (dict) colors_dict: a dictioanry whose values are single colors
    
    convert_to_RGB_255(colors)
        Multiplies each element of a triplet by 255
        
        Each coordinate of the color tuple is rounded to the nearest float and
        then is turned into an integer. If a number is of the form x.5, then
        if x is odd, the number rounds up to (x+1). Otherwise, it rounds down
        to just x. This is the way rounding works in Python 3 and in current
        statistical analysis to avoid rounding bias
        
        :param (list) rgb_components: grabs the three R, G and B values to be
            returned as computed in the function
    
    find_intermediate_color(lowcolor, highcolor, intermed, colortype='tuple')
        Returns the color at a given distance between two colors
        
        This function takes two color tuples, where each element is between 0
        and 1, along with a value 0 < intermed < 1 and returns a color that is
        intermed-percent from lowcolor to highcolor. If colortype is set to 'rgb',
        the function will automatically convert the rgb type to a tuple, find the
        intermediate color and return it as an rgb color.
    
    hex_to_rgb(value)
        Calculates rgb values from a hex color code.
        
        :param (string) value: Hex color string
        
        :rtype (tuple) (r_value, g_value, b_value): tuple of rgb values
    
    label_rgb(colors)
        Takes tuple (a, b, c) and returns an rgb color 'rgb(a, b, c)'
    
    make_colorscale(colors, scale=None)
        Makes a colorscale from a list of colors and a scale
        
        Takes a list of colors and scales and constructs a colorscale based
        on the colors in sequential order. If 'scale' is left empty, a linear-
        interpolated colorscale will be generated. If 'scale' is a specificed
        list, it must be the same legnth as colors and must contain all floats
        For documentation regarding to the form of the output, see
        https://plot.ly/python/reference/#mesh3d-colorscale
        
        :param (list) colors: a list of single colors
    
    n_colors(lowcolor, highcolor, n_colors, colortype='tuple')
        Splits a low and high color into a list of n_colors colors in it
        
        Accepts two color tuples and returns a list of n_colors colors
        which form the intermediate colors between lowcolor and highcolor
        from linearly interpolating through RGB space. If colortype is 'rgb'
        the function will return a list of colors in the same form.
    
    named_colorscales()
        Returns lowercased names of built-in continuous colorscales.
    
    unconvert_from_RGB_255(colors)
        Return a tuple where each element gets divided by 255
        
        Takes a (list of) color tuple(s) where each element is between 0 and
        255. Returns the same tuples where each tuple element is normalized to
        a value between 0 and 1
    
    unlabel_rgb(colors)
        Takes rgb color(s) 'rgb(a, b, c)' and returns tuple(s) (a, b, c)
        
        This function takes either an 'rgb(a, b, c)' color or a list of
        such colors and returns the color tuples in tuple(s) (a, b, c)
    
    validate_colors(colors, colortype='tuple')
        Validates color(s) and returns a list of color(s) of a specified type
    
    validate_colors_dict(colors, colortype='tuple')
        Validates dictioanry of color(s)
    
    validate_colorscale(colorscale)
        Validate the structure, scale values and colors of colorscale.
    
    validate_scale_values(scale)
        Validates scale values from a colorscale
        
        :param (list) scale: a strictly increasing list of floats that begins
            with 0 and ends with 1. Its usage derives from a colorscale which is
            a list of two-lists (a list with two elements) of the form
            [value, color] which are used to determine how interpolation weighting
            works between the colors in the colorscale. Therefore scale is just
            the extraction of these values from the two-lists in order

DATA
    DEFAULT_PLOTLY_COLORS = ['rgb(31, 119, 180)', 'rgb(255, 127, 14)', 'rg...
    PLOTLY_SCALES = {'Blackbody': [[0, 'rgb(0,0,0)'], [0.2, 'rgb(230,0,0)'...
    __all__ = ['named_colorscales', 'cyclical', 'diverging', 'sequential',...

FILE
    /home/janash/miniconda3/envs/python-scripting-2/lib/python3.9/site-packages/plotly/express/colors/__init__.py
help(px.colors.diverging)
Help on module _plotly_utils.colors.diverging in _plotly_utils.colors:

NAME
    _plotly_utils.colors.diverging

DESCRIPTION
    Diverging color scales are appropriate for continuous data that has a natural midpoint other otherwise informative special value, such as 0 altitude, or the boiling point
    of a liquid. The color scales in this module are mostly meant to be passed in as the `color_continuous_scale` argument to various functions, and to be used with the `color_continuous_midpoint` argument.

FUNCTIONS
    swatches(template=None)
        Parameters
        ----------
        template : str or dict or plotly.graph_objects.layout.Template instance
            The figure template name or definition.
        
        Returns
        -------
        fig : graph_objects.Figure containing the displayed image
            A `Figure` object. This figure demonstrates the color scales and
            sequences in this module, as stacked bar charts.

DATA
    __all__ = ['swatches']

FILE
    /home/janash/miniconda3/envs/python-scripting-2/lib/python3.9/site-packages/_plotly_utils/colors/diverging.py
px.colors.diverging.swatches()
heatmap = px.imshow(corr.iloc[:6, :6], color_continuous_scale="RdBu", color_continuous_midpoint=0)
heatmap.show()

Saving Images#

heatmap.write_image("correlation.png")   # to save png
heatmap.write_html("correlation.html")  # to save html