Renaming and combining workbook

Introduction

This is the worbook part of the "Renaming and combining" section of the Advanced Pandas tutorial. For the reference section, click here.

Renaming is covered in its own section in the "Essential Basic Functionality" (http://archive.is/e20r8#60.7%) section of the extensive official documentation. Combining is covered by the "Merge, join, concatenate" (http://archive.is/f5FsP) section there.

In [1]:
import pandas as pd

from learntools.advanced_pandas.renaming_and_combining import *

pd.set_option('max_rows', 5)
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

Checking Answers

Check your answers in each exercise using the check_qN function (replacing N with the number of the exercise). For example here's how you would check an incorrect answer to exercise 1:

In [2]:
check_q1(pd.DataFrame())
Out[2]:
False

If you get stuck, use the answer_qN function to see the code with the correct answer.

For the first set of questions, running the check_qN on the correct answer returns True.

For the second set of questions, using this function to check a correct answer will present an informative graph!

Exercises

Look at your data by running the cell below:

In [3]:
reviews.head()
Out[3]:
country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
2 US Tart and snappy, the flavors of lime flesh and... NaN 87 14.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Rainstorm 2013 Pinot Gris (Willamette Valley) Pinot Gris Rainstorm
3 US Pineapple rind, lemon pith and orange blossom ... Reserve Late Harvest 87 13.0 Michigan Lake Michigan Shore NaN Alexander Peartree NaN St. Julian 2013 Reserve Late Harvest Riesling ... Riesling St. Julian
4 US Much like the regular bottling from 2012, this... Vintner's Reserve Wild Child Block 87 65.0 Oregon Willamette Valley Willamette Valley Paul Gregutt @paulgwine Sweet Cheeks 2012 Vintner's Reserve Wild Child... Pinot Noir Sweet Cheeks

Exercise 1: region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Rename these columns to region and locale.

In [4]:
reviews.rename(columns={'region_1': 'region', 'region_2': 'locale'})
Out[4]:
country description designation points price province region locale taster_name taster_twitter_handle title variety winery
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
... ... ... ... ... ... ... ... ... ... ... ... ... ...
129969 France A dry style of Pinot Gris, this is crisp with ... NaN 90 32.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris Domaine Marcel Deiss
129970 France Big, rich and off-dry, this is powered by inte... Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer Domaine Schoffit

129971 rows × 13 columns

In [5]:
print(check_q1(reviews.rename(columns={'region_1': 'region', 'region_2': 'locale'})))
print('-------')
print(answer_q1())
True
-------
reviews.rename(columns={'region_1': 'region', 'region_2': 'locale'})
None

Exercise 2: Set the index name in the dataset to wines.

In [6]:
reviews.rename_axis("wines", axis='rows')
Out[6]:
country description designation points price province region_1 region_2 taster_name taster_twitter_handle title variety winery
wines
0 Italy Aromas include tropical fruit, broom, brimston... Vulkà Bianco 87 NaN Sicily & Sardinia Etna NaN Kerin O’Keefe @kerinokeefe Nicosia 2013 Vulkà Bianco (Etna) White Blend Nicosia
1 Portugal This is ripe and fruity, a wine that is smooth... Avidagos 87 15.0 Douro NaN NaN Roger Voss @vossroger Quinta dos Avidagos 2011 Avidagos Red (Douro) Portuguese Red Quinta dos Avidagos
... ... ... ... ... ... ... ... ... ... ... ... ... ...
129969 France A dry style of Pinot Gris, this is crisp with ... NaN 90 32.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Marcel Deiss 2012 Pinot Gris (Alsace) Pinot Gris Domaine Marcel Deiss
129970 France Big, rich and off-dry, this is powered by inte... Lieu-dit Harth Cuvée Caroline 90 21.0 Alsace Alsace NaN Roger Voss @vossroger Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car... Gewürztraminer Domaine Schoffit

129971 rows × 13 columns

In [7]:
print(check_q2(reviews.rename_axis("wines", axis='rows')))
print('-------')
print(answer_q2())
True
-------
reviews.rename_axis("wines", axis="rows")
None

Exercise 3: The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on Reddit.com. Create a DataFrame of products mentioned on either subreddit. Use the following data:

In [8]:
gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"
In [9]:
pd.concat([gaming_products, movie_products])
Out[9]:
name category amazon_link total_mentions subreddit_mentions subreddit
0 BOOMco Halo Covenant Needler Blaster Toys & Games https://www.amazon.com/BOOMco-Halo-Covenant-Ne... 4.0 4 r/gaming
1 Raspberry PI 3 Model B 1.2GHz 64-bit quad-core... Electronics https://www.amazon.com/Raspberry-Model-A1-2GHz... 19.0 3 r/gaming
... ... ... ... ... ... ...
301 Apocalypto [Blu-ray] Movies & TV https://www.amazon.com/Apocalypto-Blu-ray-Rudy... 1.0 1 r/movies
302 Cinelinx: A Card Game for People Who Love Movi... Toys & Games https://www.amazon.com/Cinelinx-Card-Game-Peop... 1.0 1 r/movies

796 rows × 6 columns

In [10]:
print(check_q3(pd.concat([gaming_products, movie_products])))
print('-------')
print(answer_q3())
True
-------
pd.concat([gaming_products, movie_products])
None

Exercise 4: The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one. Use the following data:

In [14]:
powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
In [16]:
powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
Out[16]:
MeetPath Federation Date MeetCountry MeetState MeetTown MeetName Name Sex Equipment Age Division BodyweightKg WeightClassKg Squat4Kg BestSquatKg Bench4Kg BestBenchKg Deadlift4Kg BestDeadliftKg TotalKg Place Wilks
MeetID
0 365strong/1601 365Strong 2016-10-29 USA NC Charlotte 2016 Junior & Senior National Powerlifting Cha... Angie Belk Terry F Wraps 47.0 Mst 45-49 59.60 60 NaN 47.63 NaN 20.41 NaN 70.31 138.35 1 155.05
0 365strong/1601 365Strong 2016-10-29 USA NC Charlotte 2016 Junior & Senior National Powerlifting Cha... Dawn Bogart F Single-ply 42.0 Mst 40-44 58.51 60 NaN 142.88 NaN 95.25 NaN 163.29 401.42 1 456.38
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8481 xpc/2017-finals XPC 2017-03-03 USA OH Columbus 2017 XPC Finals Jeff Bumanglag M Multi-ply NaN Elite 126.73 140 NaN NaN NaN NaN NaN 320.00 320.00 3 181.85
8481 xpc/2017-finals XPC 2017-03-03 USA OH Columbus 2017 XPC Finals Shane Hammock M Multi-ply NaN Elite 129.46 140 NaN NaN NaN NaN NaN 362.50 362.50 2 205.18

386414 rows × 23 columns

In [18]:
print(check_q4(powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))))
print('-------')
print(answer_q4())
True
-------
powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
None