| Scatter Plot | Choropleth | Heatmap | Surface Plot |
| --- | --- | --- | --- |
| go.Scatter() | go.Choropleth() | go.Heatmap() | go.Surface() |
So far in this tutorial we have been using seaborn and pandas, two mature libraries designed around matplotlib. These libraries all focus on building "static" visualizations: visualizations that have no moving parts. In other words, all of the plots we've built thus far could appear in a dead-tree journal article.
The web unlocks a lot of possibilities when it comes to interactivity and animations. There are a number of plotting libraries available which try to provide these features.
In this section we will examine plotly, one of the most popular open-source libraries of this kind.
import pandas as pd
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
reviews.head()
plotly provides both online and offline modes. The latter injects the plotly source code directly into the notebook; the former does not. I recommend using plotly in offline mode the vast majority of the time, and it's the only mode that works on Kaggle (which disables network access in Python).
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
We'll start by creating a basic scatter plot.
import plotly.graph_objs as go
iplot([go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])
This chart is fully interactive. We can use the toolbar on the top-right to perform various operations on the data: zooming and panning, for example. When we hover over a data point, we get a tooltip. We can even save the plot as a PNG image.
This chart also demonstrates the main disadvantage of this fancier plotting library. In order to keep performance reasonable, we had to limit ourselves to the first 1000 points in the dataset. While this was necessary anyway (to avoid too much overplotting), it's important to note that, in general, interactive graphics are much, much more resource-intensive than static ones. It's easier to "max out" how many points of data you can show.
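Taking the first 1000 rows is the simplest way to cap the point count, but a random sample is often more representative. Here is a minimal sketch of that alternative using pandas' `DataFrame.sample`; the `frame` DataFrame and the `MAX_POINTS` constant are made-up stand-ins, not names from the notebook.

```python
import pandas as pd

# Made-up stand-in for a large DataFrame such as `reviews`.
frame = pd.DataFrame({"points": range(5000), "price": range(5000)})

# Hypothetical cap on how many markers we are willing to draw.
MAX_POINTS = 1000

# Draw a reproducible random sample, never asking for more rows than exist.
subset = frame.sample(n=min(MAX_POINTS, len(frame)), random_state=0)

print(len(subset))
```

The `subset` columns can then be passed to `go.Scatter` exactly as the `head(1000)` columns were above.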
Notice the "shape" of the plotly API. iplot takes a list of plot objects and composes them for you, plotting the combined end result. This makes it easy to stack plots.
As another example, here's a KDE plot (what plotly refers to as a Histogram2dContour) and scatter plot of the same data.
iplot([go.Histogram2dContour(x=reviews.head(500)['points'],
                             y=reviews.head(500)['price'],
                             contours=dict(coloring='heatmap')),
       go.Scatter(x=reviews.head(1000)['points'], y=reviews.head(1000)['price'], mode='markers')])
plotly exposes several different APIs, ranging in complexity from low-level to high-level. iplot is the highest-level API, and hence, the most convenient one for general-purpose use.
Personally, I've always found the Surface plot to be plotly's most impressive feature (albeit one of the hardest to get right):
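# Count how many reviews fall on each (points, price) pair.
df = reviews.assign(n=0).groupby(['points', 'price'])['n'].count().reset_index()
# Keep only wines priced below 100 so the surface stays readable.
df = df[df["price"] < 100]
# Pivot into a price-by-points grid of counts; missing pairs become 0.
v = df.pivot(index='price', columns='points', values='n').fillna(0).values.tolist()
iplot([go.Surface(z=v)])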
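The data wrangling is the hard part here: `go.Surface` wants a two-dimensional grid of heights, not a list of rows. The sketch below runs the same count-then-pivot steps on a tiny made-up DataFrame (`toy` is not part of the notebook) so the shape of the result is easy to see.

```python
import pandas as pd

# Five made-up (points, price) observations standing in for the reviews.
toy = pd.DataFrame({
    "points": [85, 85, 86, 86, 86],
    "price": [10, 10, 10, 20, 20],
})

# Step 1: count the observations for each (points, price) pair.
counts = toy.assign(n=0).groupby(["points", "price"])["n"].count().reset_index()

# Step 2: pivot into a grid with one row per price and one column per
# points value; pairs that never occur are filled with 0.
grid = counts.pivot(index="price", columns="points", values="n").fillna(0)

# Step 3: convert to a nested list, the form passed as z to go.Surface.
z = grid.values.tolist()

print(z)  # [[2.0, 1.0], [0.0, 2.0]]
```

Each inner list is one row of the surface (a price), and each position within it is one column (a points value), so the height at a grid cell is the number of reviews with that price and score.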
On Kaggle, plotly is often used to make choropleths. Choropleths are a type of map, where all of the entities (countries, US states, etc.) are colored according to some variable in the dataset. plotly is one of the most convenient plotting libraries available for making them.
df = reviews['country'].replace("US", "United States").value_counts()
iplot([go.Choropleth(
    locationmode='country names',
    locations=df.index.values,
    text=df.index,
    z=df.values
)])
Overall, plotly is a powerful, richly interactive data visualization library. It allows us to generate plots with more "pizazz" than standard pandas or seaborn output.
The tradeoff is that while pandas and seaborn are well-established, plotly is still new. As a result, plotly documentation is much harder to find and interpret; the official documentation on the plotly website uses a mix of different features for plotting, which makes it harder to use than it has to be.
Additionally, it's important to recognize when interactivity is useful, and when it is not. The most effective plots do not need to use hovers or tooltips to get their work done. As a result plotly, though extremely attractive, is rarely more useful than an equivalent plot in pandas or seaborn.
For the following exercise, try forking and running this notebook, and then reproducing the chart that follows. Hint: Attack on the x-axis, Defense on the y-axis.
import pandas as pd
pokemon = pd.read_csv("../input/pokemon/Pokemon.csv")
pokemon.head(3)
import plotly.graph_objs as go
iplot([go.Scatter(x=pokemon['Attack'], y=pokemon['Defense'], mode='markers')])
In this section we looked at plotly, an interactive plotting library that produces very attractive-looking charts. It is one of a number of alternatives to matplotlib-based tools that provide first-class interactivity (bokeh is another one worth mentioning).
In the next section we'll look at another plotting library, plotnine, which is designed around a peculiar but powerful idea called the grammar of graphics.