Maps allow us to transform data in a DataFrame or Series one value at a time for an entire column. However, often we want to group our data, and then do something specific to the group the data is in. We do this with the groupby operation.
In these exercises we'll apply groupwise analysis to our dataset.
Run the code cell below to load the data before running the exercises.
import pandas as pd
from learntools.advanced_pandas.grouping_and_sorting import *
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)
Check your answers in each exercise using the check_qN function (replacing N with the number of the exercise). For example here's how you would check an incorrect answer to exercise 1:
check_q1(pd.DataFrame())
If you get stuck, use the answer_qN function to see the code with the correct answer.
For the first set of questions, running the check_qN on the correct answer returns True.
For the second set of questions, using this function to check a correct answer will present an informative graph!
reviews.head()
Exercise 1: Who are the most common wine reviewers in the dataset? Create a Series whose index is the taster_twitter_handle category from the dataset, and whose values count how many reviews each person wrote.
common_wine_reviewers = reviews.groupby('taster_twitter_handle').taster_twitter_handle.count()
common_wine_reviewers
print(check_q1(common_wine_reviewers))
print('-------')
print(answer_q1())
Exercise 2: What is the best wine I can buy for a given amount of money? Create a Series whose index is wine prices and whose values is the maximum number of points a wine costing that much was given in a review. Sort the valeus by price, ascending (so that 4.0 dollars is at the top and 3300.0 dollars is at the bottom).
best_wine = reviews.groupby('price').points.max().sort_index()
best_wine
print(check_q2(best_wine))
print('-------')
print(answer_q2())
Exercise 3: What are the minimum and maximum prices for each variety of wine? Create a DataFrame whose index is the variety category from the dataset and whose values are the min and max values thereof.
wine_price_extremes = reviews.groupby(['variety']).price.agg([min, max])
wine_price_extremes
print(check_q3(wine_price_extremes))
print('-------')
print(answer_q3())
The rest of the exercises are visual.
Exercise 4: Are there significant differences in the average scores assigned by the various reviewers? Create a Series whose index is reviewers and whose values is the average review score given out by that reviewer. Hint: you will need the taster_name and points columns.
reviewer_mean_ratings = reviews.groupby('taster_name').points.mean()
reviewer_mean_ratings
print(check_q4(reviewer_mean_ratings))
print('-------')
print(answer_q4())
Exercise 5: What are the most expensive wine varieties? Create a DataFrame whose index is wine varieties and whose values are columns with the min and the max price of wines of this variety. Sort in descending order based on min first, max second.
wine_price_range = reviews.groupby(['variety']).price.agg([min, max]).sort_values(by=['min', 'max'], ascending=False)
wine_price_range
print(check_q5(wine_price_range))
print('-------')
print(answer_q5())
Exercise 6: What combination of countries and varieties are most common? Create a Series whose index is a MultiIndexof {country, variety} pairs. For example, a pinot noir produced in the US should map to {"US", "Pinot Noir"}. Sort the values in the Series in descending order based on wine count.
Hint: first run reviews['n'] = 0. Then groupby the dataset and run something on the column n. You won't need reset_index.
reviews['n'] = 0
country_variety_pairs = reviews.groupby(['country', 'variety']).n.count().sort_values(ascending=False)
country_variety_pairs
print(check_q6(country_variety_pairs))
print('-------')
print(answer_q6())
Move on to Data types and missing data workbook.
This is part of the Learn Pandas series.