Introduction

Welcome to the Learn Pandas track. These hands-on exercises are targeted for someone who has worked with Pandas a little before. Each page has a list of relevant resources you can use if you get stumped. The top item in each list has been custom-made to help you with the exercises on that page.

The first step in most data analytics projects is reading the data file. In this section, you'll create Series and DataFrame objects, both by hand and by reading data files.

Relevant Resources

Set Up

Run the code cell below to load libraries you will need (including coad to check your answers).

In [1]:
import pandas as pd
pd.set_option('max_rows', 5)
from learntools.advanced_pandas.creating_reading_writing import *

Checking Answers

You can check your answers in each of the exercises that follow using the check_qN function provided in the code cell above (replacing N with the number of the exercise). For example here's how you would check an incorrect answer to exercise 1:

In [2]:
check_q1(pd.DataFrame())
Out[2]:
False

For the questions that follow, if you use check_qN on your answer, and your answer is right, a simple True value will be returned.

If you get stuck, you may run the print(answer_qN()) function to print the answer outright.

Exercises

Exercise 1: Create a DataFrame that looks like this:

In [3]:
pd.DataFrame({'Apples': [30], 'Bananas': [21]})
Out[3]:
Apples Bananas
0 30 21
In [4]:
print(check_q1(pd.DataFrame({'Apples': [30], 'Bananas': [21]})))
print('-------')
print(answer_q1())
True
-------
pd.DataFrame({'Apples': [30], 'Bananas': [21]})
None

Exercise 2: Create the following DataFrame:

In [5]:
pd.DataFrame({'Apples': [35, 41], 
              'Bananas': [21, 34]},
             index=['2017 Sales', '2018 Sales'])
Out[5]:
Apples Bananas
2017 Sales 35 21
2018 Sales 41 34
In [6]:
print(check_q2(pd.DataFrame({'Apples': [35, 41], 
              'Bananas': [21, 34]},
             index=['2017 Sales', '2018 Sales'])))
print('-------')
print(answer_q2())
True
-------
pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales']
)
None

Exercise 3: Create a Series that looks like this:

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
In [7]:
pd.Series(['4 cups', '1 cup', '2 large', '1 can'], index=['Flour', 'Milk', 'Eggs', 'Spam'], name='Dinner')
Out[7]:
Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object
In [8]:
print(check_q3(pd.Series(['4 cups', '1 cup', '2 large', '1 can'], index=['Flour', 'Milk', 'Eggs', 'Spam'], name='Dinner')))
print('-------')
print(answer_q3())
True
-------
pd.Series(['4 cups', '1 cup', '2 large', '1 can'], 
index=['Flour', 'Milk', 'Eggs', 'Spam'], 
name='Dinner')
None

Exercise 4: Read the following csv dataset on wine reviews into the a DataFrame:

The filepath to the CSV file is ../input/wine-reviews/winemag-data_first150k.csv.

In [9]:
pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)
Out[9]:
country description designation points price province region_1 region_2 variety winery
0 US This tremendous 100% varietal wine hails from ... Martha's Vineyard 96 235.0 California Napa Valley Napa Cabernet Sauvignon Heitz
1 Spain Ripe aromas of fig, blackberry and cassis are ... Carodorum Selección Especial Reserva 96 110.0 Northern Spain Toro NaN Tinta de Toro Bodega Carmen Rodríguez
... ... ... ... ... ... ... ... ... ... ...
150928 France A perfect salmon shade, with scents of peaches... Grand Brut Rosé 90 52.0 Champagne Champagne NaN Champagne Blend Gosset
150929 Italy More Pinot Grigios should taste like this. A r... NaN 90 15.0 Northeastern Italy Alto Adige NaN Pinot Grigio Alois Lageder

150930 rows × 10 columns

In [10]:
print(check_q4(pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)))
print('-------')
print(answer_q4())
True
-------
pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv", index_col=0)
None

Exercise 6**: Suppose we have the following DataFrame:

In [11]:
q6_df = pd.DataFrame({'Cows': [12, 20], 'Goats': [22, 19]}, index=['Year 1', 'Year 2'])

Save this DataFrame to disc as a csv file with the name cows_and_goats.csv.

In [12]:
q6_df.to_csv("cows_and_goats.csv")
In [13]:
print(check_q6(q6_df.to_csv("cows_and_goats.csv")))
print('-------')
print(answer_q6())
True
-------
q6_df.to_csv("cows_and_goats.csv")
None

Exercise 7: This exercise is optional. Read the following SQL data into a DataFrame:

The filepath is ../input/pitchfork-data/database.sqlite. Hint: use the sqlite3 library. The name of the table is artists.

In [14]:
import sqlite3
conn = sqlite3.connect("../input/pitchfork-data/database.sqlite")
pd.read_sql_query("SELECT * FROM artists", conn)
Out[14]:
reviewid artist
0 22703 massive attack
1 22721 krallice
... ... ...
18829 2413 don caballero
18830 3723 neil hamburger

18831 rows × 2 columns

In [15]:
print(check_q7(pd.read_sql_query("SELECT * FROM artists", conn)))
print('-------')
print(answer_q7())
True
-------
import sqlite3
conn = sqlite3.connect("../input/pitchfork-data/database.sqlite")
pd.read_sql_query("SELECT * FROM artists", conn)
None

Keep going

Move on to the indexing, selecting and assigning workbook


This is part of the Learn Pandas series.