Interactive Visualizations with plotly
============================

So far, we have learned about two visualization libraries - [matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/). A third option we will discuss is the plotting library [plotly](https://plotly.com/python/). 

Plotly creates interactive visualizations. It is a newer library than matplotlib, and it offers fewer built-in statistical plots than seaborn. However, it is widely used and works well in Jupyter notebooks because every figure is interactive by default. Plotly also has a JavaScript interface, and it can be combined with Dash (the `dash` package) to build web apps.

In this section, we will be making the same visualizations we made in the seaborn lesson. In that lesson, we loaded data we cleaned from the paper

> Potts, R.O., Guy, R.H. A Predictive Algorithm for Skin Permeability: The Effects of Molecular Size and Hydrogen Bond Activity. Pharm Res 12, 1628–1633 (1995). https://doi.org/10.1023/A:1016236932339

To visualize with plotly, we will import `plotly.express`, which is conventionally aliased as `px`.

In [1]:
import os
import pandas as pd

import plotly.express as px

In [2]:
file_path = os.path.join("data", "potts_table1_clean.csv")
df = pd.read_csv(file_path)
df.head()

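|   | Compound | log P | pi | Hd | Ha | MV | R_2 | log K_oct | log K_hex | log K_hep |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | water | -6.85 | 0.45 | 0.82 | 0.35 | 10.6 | 0.0 | -1.38 | NaN | NaN |
| 1 | methanol | -6.68 | 0.44 | 0.43 | 0.47 | 21.7 | 0.28 | -0.73 | -2.42 | -2.8 |
| 2 | methanoicacid | -7.08 | 0.6 | 0.75 | 0.38 | 22.3 | 0.3 | -0.54 | -3.93 | -3.63 |
| 3 | ethanol | -6.66 | 0.42 | 0.37 | 0.48 | 31.9 | 0.25 | -0.32 | -2.24 | -2.1 |
| 4 | ethanoicacid | -7.01 | 0.65 | 0.61 | 0.45 | 33.4 | 0.27 | -0.31 | -3.28 | -2.9 |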


## Creating Scatter Plots

First, we will create a scatter plot of `log P` vs `pi`. This is accomplished using the `px.scatter` function. Plotly Express works directly with pandas dataframes: we pass the dataframe and give the column names to use for `x` and `y`.

When working with plotly, you will capture the output of this function as a variable. When we want to see the plot, we use `variable_name.show()`. This does not have to be done in the same cell as figure creation.

In [3]:
fig = px.scatter(df, x='pi', y='log P')

fig.show()

You will notice in the figure above that you can hover your mouse over the data points and see information about the points. In the upper right corner of the figure you will find a set of buttons which will allow you to select different options for interacting with the graph.


## Visualizing Linear Relationships

The scatter function has a built-in option for adding a trendline. If you would like a linear fit between `x` and `y`, add the argument `trendline='ols'`. Under the hood, plotly calls `statsmodels` (which must be installed) to perform an ordinary least squares fit and adds the fitted line to the plot. If you want to perform the fit with another library such as scikit-learn, you will have to do the fit and add the line yourself. The trendline argument can also be set to `'lowess'` for a locally weighted scatterplot smoothing line.

In [4]:
fig = px.scatter(
    df, x='pi', y='log P', trendline='ols', trendline_color_override='darkblue')
fig.show()

To make a figure with subplots which shows all of the variables, the data has to be in long form, similar to when we created a plot with `lmplot` in seaborn. Then, we add another argument to `px.scatter` - `facet_col` which will make a new plot for each new value in the `variable` column.

In [5]:
# Get columns which are numbers - this is the same processing as seaborn.
df2 = df.select_dtypes(include="float")
df2_melt = df2.melt(id_vars="log P")

df2_melt.head()

|   | log P | variable | value |
|---|---|---|---|
| 0 | -6.85 | pi | 0.45 |
| 1 | -6.68 | pi | 0.44 |
| 2 | -7.08 | pi | 0.6 |
| 3 | -6.66 | pi | 0.42 |
| 4 | -7.01 | pi | 0.65 |
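
To see what `melt` does, it can help to try it on a very small table. In this sketch, the two-row dataframe `wide` is made up from values in the table above. Every column other than the `id_vars` column is stacked into a `variable`/`value` pair.

```python
import pandas as pd

# Tiny wide-form table: one row per compound, one column per descriptor.
wide = pd.DataFrame({
    "log P": [-6.85, -6.68],
    "pi": [0.45, 0.44],
    "MV": [10.6, 21.7],
})

# Keep "log P" as the identifier; stack the other columns into long form.
long_df = wide.melt(id_vars="log P")
print(long_df)
```

The result has one row per (compound, descriptor) pair, so two rows and two descriptor columns become four rows. This is the shape `facet_col="variable"` expects.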


In [6]:
fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable", 
                 trendline='ols', 
                 trendline_color_override='darkblue')

fig.show()

By default, all of the subplots share the same x and y axis scales. In this particular case, the variables have very different ranges, so we do not want the x-axes to share a scale. Calling `fig.update_xaxes(matches=None)` after figure creation lets each subplot have its own x-axis scale.

The argument `facet_col_wrap` specifies how many columns should be in the figure. The plots are wrapped into rows using this number of columns. Finally, we add `height` and `width` arguments (in pixels) to give the plot a better size.

In [7]:
fig = px.scatter(df2_melt, x="value", y="log P", facet_col="variable", facet_col_wrap=2, 
                 trendline='ols', trendline_color_override='darkblue', height=800, width=600)
fig.update_xaxes(matches=None)
fig.show()

## Correlation Plots

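We can visualize a correlation matrix as a heatmap using `px.imshow`. We first compute the matrix with the dataframe's `corr` method, then pass it to `px.imshow`. The second cell below plots only the first six rows and columns of the matrix.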

In [8]:
corr = df.corr(numeric_only=True)  # skip non-numeric columns such as Compound
heatmap = px.imshow(corr)
heatmap.show()

In [9]:
heatmap = px.imshow(corr.iloc[:6, :6])
heatmap.show()

## Plotly Color Schemes

The following section demonstrates using the help function to find more information about available color schemes in plotly express.

In [10]:
help(px.colors)

Help on package plotly.express.colors in plotly.express:

NAME
    plotly.express.colors - For a list of colors available in `plotly.express.colors`, please see

DESCRIPTION
    * the `tutorial on discrete color sequences <https://plotly.com/python/discrete-color/#color-sequences-in-plotly-express>`_
    * the `list of built-in continuous color scales <https://plotly.com/python/builtin-colorscales/>`_
    * the `tutorial on continuous colors <https://plotly.com/python/colorscales/>`_
    
    Color scales are available within the following namespaces
    
    * cyclical
    * diverging
    * qualitative
    * sequential

PACKAGE CONTENTS


FUNCTIONS
    color_parser(colors, function)
        Takes color(s) and a function and applies the function on the color(s)
        
        In particular, this function identifies whether the given color object
        is an iterable or not and applies the given color-parsing function to
        the color or iterable of colors. If given an iterable,

In [11]:
help(px.colors.diverging)

Help on module _plotly_utils.colors.diverging in _plotly_utils.colors:

NAME
    _plotly_utils.colors.diverging

DESCRIPTION
    Diverging color scales are appropriate for continuous data that has a natural midpoint other otherwise informative special value, such as 0 altitude, or the boiling point
    of a liquid. The color scales in this module are mostly meant to be passed in as the `color_continuous_scale` argument to various functions, and to be used with the `color_continuous_midpoint` argument.

FUNCTIONS
    swatches(template=None)
        Parameters
        ----------
        template : str or dict or plotly.graph_objects.layout.Template instance
            The figure template name or definition.
        
        Returns
        -------
        fig : graph_objects.Figure containing the displayed image
            A `Figure` object. This figure demonstrates the color scales and
            sequences in this module, as stacked bar charts.

DATA
    __all__ = ['swatches']

FILE


In [12]:
px.colors.diverging.swatches()

In [13]:
heatmap = px.imshow(corr.iloc[:6, :6], color_continuous_scale="RdBu", color_continuous_midpoint=0)
heatmap.show()

## Saving Images

In [14]:
heatmap.write_image("correlation.png")   # to save png
heatmap.write_html("correlation.html")  # to save html