Scatter Plot | Choropleth | Heatmap | Surface Plot
go.Scatter() | go.Choropleth() | go.Heatmap() | go.Surface()

Introduction to plotly

So far in this tutorial we have been using seaborn and pandas, two mature libraries designed around matplotlib. Both focus on building "static" visualizations: visualizations that have no moving parts. In other words, every plot we've built thus far could appear in a dead-tree journal article.

The web unlocks a lot of possibilities when it comes to interactivity and animations. There are a number of plotting libraries available which try to provide these features.

In this section we will examine plotly, an open-source plotting library that is one of the most popular of them.

In [1]:
import pandas as pd
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
reviews.head()
Out[1]:
country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
2 US Tart and snappy, the flavors of lime flesh and... NaN 87 14.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris Rainstorm
3 US Pineapple rind, lemon pith and orange blossom ... Reserve Late Harvest 87 13.0 Michigan Lake Michigan Shore NaN Alexander Peartree NaN St. Julian 2013 Reserve Late Harvest Riesling ... Riesling St. Julian
4 US Much like the regular bottling from 2012, this... Vintner's Reserve Wild Child Block 87 65.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir Sweet Cheeks

plotly provides both online and offline modes. The latter injects the plotly source code directly into the notebook; the former does not. I recommend using plotly in offline mode the vast majority of the time, and it's the only mode that works on Kaggle (which disables network access in Python).

In [2]:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

We'll start by creating a basic scatter plot.

In [3]:
import plotly.graph_objs as go

iplot([go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])

This chart is fully interactive. We can use the toolbar on the top-right to perform various operations on the data: zooming and panning, for example. When we hover over a data point, we get a tooltip. We can even save the plot as a PNG image.

This chart also demonstrates the disadvantage of this fancier plotting library. In order to keep performance reasonable, we had to limit ourselves to the first 1000 points in the dataset. While this was necessary anyway (to avoid too much overplotting) it's important to note that in general, interactive graphics are much, much more resource-intensive than static ones. It's easier to "max out" how many points of data you can show.
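
Taking the first 1000 rows is the simplest way to cap the point count, but it only reflects whatever comes first in the file. One alternative, sketched here on synthetic data rather than the wine reviews, is to draw a random sample with pandas:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large dataset (not the wine reviews).
rng = np.random.RandomState(0)
big = pd.DataFrame({
    'points': rng.randint(80, 101, size=50000),
    'price': rng.uniform(5, 200, size=50000),
})

# A random sample is drawn from the whole frame, whereas head(1000)
# only ever sees the first rows. random_state makes the draw repeatable.
subset = big.sample(n=1000, random_state=0)

print(len(big), len(subset))
```

The columns of subset can then be passed to go.Scatter exactly as before. plotly also provides a WebGL-backed go.Scattergl trace that may cope with more points; whether it helps on this dataset is untested here.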

Notice the "shape" of the plotly API. iplot takes a list of plot objects and composes them for you, plotting the combined end result. This makes it easy to stack plots.

As another example, here's a KDE-style plot (plotly's Histogram2dContour, which draws contours over a 2D histogram) layered with a scatter plot of the same data.

In [4]:
iplot([go.Histogram2dContour(x=reviews.head(500)['points'],
                             y=reviews.head(500)['price'],
                             # A plain dict avoids the deprecated go.Contours class.
                             contours=dict(coloring='heatmap')),
       go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])


plotly exposes several different APIs, ranging in complexity from low-level to high-level. iplot is the highest-level API, and hence, the most convenient one for general-purpose use.

Personally, I've always found the Surface plot to be plotly's most impressive feature (albeit one of the hardest to get right):

In [5]:
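# Count the wines at each (points, price) combination.
df = reviews.assign(n=0).groupby(['points', 'price'])['n'].count().reset_index()
# Keep only wines priced under 100.
df = df[df["price"] < 100]
# Pivot into a price-by-points grid of counts; missing combinations become 0.
v = df.pivot(index='price', columns='points', values='n').fillna(0).values.tolist()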
In [6]:
iplot([go.Surface(z=v)])
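
The reshaping step is the part that is easy to get wrong, so here is the same idea on a toy frame small enough to check by hand. The data is made up, and groupby(...).size() is swapped in for the assign/count idiom above as a shorter equivalent.

```python
import pandas as pd

# Made-up data: five wines, two distinct scores, two distinct prices.
toy = pd.DataFrame({
    'points': [87, 87, 88, 88, 88],
    'price': [10.0, 10.0, 10.0, 20.0, 20.0],
})

# Count how many rows share each (points, price) pair.
counts = toy.groupby(['points', 'price']).size().reset_index(name='n')

# Rows become prices, columns become points, cells hold the counts;
# combinations that never occur are filled with 0.
grid = counts.pivot(index='price', columns='points', values='n').fillna(0)

# Surface expects z as a 2D grid: one inner list per row (here, per price).
z = grid.values.tolist()
print(grid)
```

When only z is given, the surface axes show row and column positions rather than actual prices and scores. Passing x=list(grid.columns) and y=list(grid.index) to go.Surface should label them with the real values.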

On Kaggle, plotly is often used to make choropleths. Choropleths are a type of map, where all of the entities (countries, US states, etc.) are colored according to some variable in the dataset. plotly is one of the most convenient plotting libraries available for making them.

In [7]:
df = reviews['country'].replace("US", "United States").value_counts()

iplot([go.Choropleth(
    locationmode='country names',
    locations=df.index.values,
    text=df.index,
    z=df.values
)])

Overall, plotly is a powerful, richly interactive data visualization library. It allows us to generate plots with more "pizazz" than standard pandas or seaborn output.

The tradeoff is that while pandas and seaborn are well-established, plotly is still new. As a result, plotly documentation is much harder to find and interpret; the official documentation on the plotly website uses a mix of different features for plotting, which makes it harder to use than it has to be.

Additionally, it's important to recognize when interactivity is useful, and when it is not. The most effective plots do not need hovers or tooltips to get their work done. As a result, plotly, though extremely attractive, is rarely more useful than an equivalent plot in pandas or seaborn.

Exercises

For the following exercise, try forking and running this notebook, and then reproducing the chart that follows. Hint: Attack on the x-axis, Defense on the y-axis.

In [8]:
import pandas as pd
pokemon = pd.read_csv("../input/pokemon/Pokemon.csv")
pokemon.head(3)
Out[8]:
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
In [12]:
import plotly.graph_objs as go

iplot([go.Scatter(x=pokemon['Attack'], y=pokemon['Defense'], mode='markers')])

Conclusion

In this section we looked at plotly, an interactive plotting library that produces very attractive-looking charts. It is one of a number of alternatives to matplotlib-based tools that provide first-class interactivity (bokeh is another one worth mentioning).

In the next section we'll look at another plotting library, plotnine, which is designed around a peculiar but powerful idea called the grammar of graphics.

Click here to go to the next section, "Grammar of graphics with plotnine".