Introduction to the Research Environment¶

The research environment is powered by IPython notebooks, which allow one to perform a great deal of data analysis and statistical validation. We'll demonstrate a few simple techniques here.

Code Cells vs. Text Cells¶

As you can see, each cell can be either code or text. To select between them, choose from the 'Cell Type' dropdown menu on the top left.

Executing a Command¶

A code cell will be evaluated when you press play, or when you press the shortcut, shift-enter. Evaluating a cell evaluates each line of code in sequence, and prints the results of the last line below the cell.

2 + 2

4

Sometimes there is no result to be printed, as is the case with assignment.

X = 2

Remember that only the result from the last line is printed.

2 + 2
3 + 3

6

However, you can print whichever lines you want using print. The parenthesized form below works in both Python 2 and Python 3.

print(2 + 2)
3 + 3

4

6

Knowing When a Cell is Running¶

While a cell is running, [*] is displayed on its left. Before a cell has been executed, [ ] is displayed. Once it has run, a number such as [5] is displayed, indicating the order in which the cell was run during the execution of the notebook. Try running the cell below and watch the indicator change.

#Take some time to run something
c = 0
for i in range(10000000):
    c = c + i
c

49999995000000

Importing Libraries¶

The vast majority of the time, you'll want to use functions from pre-built libraries. You can't import every library on Quantopian due to security issues, but you can import most of the common scientific ones. Here I import numpy and pandas, the two most common and useful libraries in quant finance. I recommend copying this import statement to every new notebook.

Notice that you can rename libraries to whatever you want after importing. The as statement allows this. Here we use np and pd as aliases for numpy and pandas. This is a very common aliasing and will be found in most code snippets around the web. The point behind this is to allow you to type fewer characters when you are frequently accessing these libraries.

import numpy as np
import pandas as pd

# This is a plotting library for pretty pictures.
import matplotlib.pyplot as plt
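
To make the aliasing concrete, here is a minimal sketch showing that an alias is just a second name for the same module object; nothing is copied or modified by the as keyword.

```python
import numpy
import numpy as np

# Both names refer to the same module object.
same_module = np is numpy
print(same_module)

# The alias only shortens what we type; the functions are identical.
same_function = np.mean is numpy.mean
print(same_function)
```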

Tab Autocomplete¶

Pressing tab will give you a list of IPython's best guesses for what you might want to type next. This is incredibly valuable and will save you a lot of time. If there is only one possible option for what you could type next, IPython will fill that in for you. Try pressing tab frequently. It will seldom fill in anything you don't want, because a list is shown whenever there is ambiguity. This is a great way to see what functions are available in a library.
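
Outside an interactive session, where tab completion is not available, the built-in dir function gives a similar listing. The sketch below is one way to do it; the prefix filtering is our own illustration of what completion does, not IPython's actual implementation.

```python
import numpy as np

# List the public names in np.random, similar to what tab completion shows.
names = [name for name in dir(np.random) if not name.startswith('_')]

# Narrow the list to names starting with 'be', as if we had typed np.random.be<tab>.
matches = [name for name in names if name.startswith('be')]
print(matches)
```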

Try placing your cursor after the . and pressing tab.

np.random.beta

<function beta>

Getting Documentation Help¶

Placing a question mark after a function and executing that line of code will give you the documentation IPython has for that function. It's often best to do this in a new cell, as you avoid re-executing other code and running into bugs.

np.random.normal?
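
The ? syntax only works in IPython. In a plain Python script, the same documentation should be reachable through the built-in help function or the __doc__ attribute. A short sketch:

```python
import numpy as np

# The docstring that IPython displays for `np.random.normal?`.
doc = np.random.normal.__doc__

# Print just the first few lines to keep the output short.
first_lines = doc.strip().splitlines()[:3]
print("\n".join(first_lines))

# help(np.random.normal) would print the full documentation.
```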

Sampling¶

We'll sample some random data using a function from numpy.

# Sample 100 points with a mean of 0 and an std of 1. This is a standard normal distribution.
X = np.random.normal(0, 1, 100)
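
Because the sample is random, rerunning the cell gives different numbers each time. If reproducible results are needed, one option is to draw from a seeded generator. The seed value 42 and the names X_seeded and X_repeat below are arbitrary choices for illustration.

```python
import numpy as np

# A generator with a fixed seed produces the same draws on every run.
rng = np.random.RandomState(42)
X_seeded = rng.normal(0, 1, 100)

# Re-creating the generator with the same seed reproduces the sample exactly.
rng_again = np.random.RandomState(42)
X_repeat = rng_again.normal(0, 1, 100)

print(np.array_equal(X_seeded, X_repeat))
```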

Plotting¶

We can use the plotting library we imported as follows.

plt.plot(X)

[<matplotlib.lines.Line2D at 0x7f442f459dd0>]

Squelching Line Output¶

You might have noticed the annoying line of the form [<matplotlib.lines.Line2D at 0x7f72fdbc1710>] before the plots. This is because the .plot function actually produces output. When we don't want that output displayed, we can suppress it by ending the line with a semicolon, as follows.

plt.plot(X);

Adding Axis Labels¶

No self-respecting quant leaves a graph without labeled axes. Here are some commands to help with that.

X = np.random.normal(0, 1, 100)
X2 = np.random.normal(0, 1, 100)

plt.plot(X);
plt.plot(X2);
plt.xlabel('Time') # The data we generated is unitless, but don't forget units in general.
plt.ylabel('Returns')
plt.legend(['X', 'X2']);

Generating Statistics¶

Let's use numpy to take some simple statistics.

np.mean(X)

-0.026898970513545093

np.std(X)

0.99233783955549493
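
One detail worth knowing: np.std divides by N by default (the population standard deviation), while the sample standard deviation divides by N - 1. The ddof argument switches between the two. A small sketch on made-up numbers:

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 2.0, 3.0, 4.0])  # made-up data for illustration

pop_std = np.std(values)             # ddof=0: divide by N
sample_std = np.std(values, ddof=1)  # ddof=1: divide by N - 1

# pandas defaults to ddof=1, so Series.std() matches the sample version.
series_std = pd.Series(values).std()

print("{0:.4f} {1:.4f} {2:.4f}".format(pop_std, sample_std, series_std))
```

With 100 points, as in our sample above, the two versions differ by roughly half a percent, so the choice matters most for small samples.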

Getting Real Pricing Data¶

Randomly sampled data can be great for testing ideas, but let's get some real data. We can use get_pricing to do that. You can use the ? syntax as discussed above to get more information on get_pricing's arguments.

data = get_pricing('MSFT', start_date='2012-1-1', end_date='2015-6-1')

data

Our data is now a dataframe. You can see the datetime index and the columns with different pricing data.

This is a pandas dataframe, so we can index into it to get just the price column like this. For more info on pandas, see the pandas documentation.

X = data['price']

X

2012-01-03 00:00:00+00:00    24.319
2012-01-04 00:00:00+00:00    24.826
2012-01-05 00:00:00+00:00    25.080
2012-01-06 00:00:00+00:00    25.488
2012-01-09 00:00:00+00:00    25.143
2012-01-10 00:00:00+00:00    25.234
2012-01-11 00:00:00+00:00    25.125
2012-01-12 00:00:00+00:00    25.379
2012-01-13 00:00:00+00:00    25.606
2012-01-17 00:00:00+00:00    25.615
2012-01-18 00:00:00+00:00    25.588
2012-01-19 00:00:00+00:00    25.497
2012-01-20 00:00:00+00:00    26.929
2012-01-23 00:00:00+00:00    26.947
2012-01-24 00:00:00+00:00    26.594
2012-01-25 00:00:00+00:00    26.802
2012-01-26 00:00:00+00:00    26.748
2012-01-27 00:00:00+00:00    26.485
2012-01-30 00:00:00+00:00    26.838
2012-01-31 00:00:00+00:00    26.775
2012-02-01 00:00:00+00:00    27.092
2012-02-02 00:00:00+00:00    27.142
2012-02-03 00:00:00+00:00    27.391
2012-02-06 00:00:00+00:00    27.373
2012-02-07 00:00:00+00:00    27.518
2012-02-08 00:00:00+00:00    27.790
2012-02-09 00:00:00+00:00    27.890
2012-02-10 00:00:00+00:00    27.622
2012-02-13 00:00:00+00:00    27.699
2012-02-14 00:00:00+00:00    27.599
                              ...  
2015-04-20 00:00:00+00:00    42.623
2015-04-21 00:00:00+00:00    42.365
2015-04-22 00:00:00+00:00    42.702
2015-04-23 00:00:00+00:00    43.070
2015-04-24 00:00:00+00:00    47.561
2015-04-27 00:00:00+00:00    47.720
2015-04-28 00:00:00+00:00    48.843
2015-04-29 00:00:00+00:00    48.733
2015-04-30 00:00:00+00:00    48.346
2015-05-01 00:00:00+00:00    48.336
2015-05-04 00:00:00+00:00    47.929
2015-05-05 00:00:00+00:00    47.303
2015-05-06 00:00:00+00:00    45.991
2015-05-07 00:00:00+00:00    46.398
2015-05-08 00:00:00+00:00    47.412
2015-05-11 00:00:00+00:00    47.064
2015-05-12 00:00:00+00:00    47.054
2015-05-13 00:00:00+00:00    47.322
2015-05-14 00:00:00+00:00    48.405
2015-05-15 00:00:00+00:00    47.988
2015-05-18 00:00:00+00:00    47.700
2015-05-19 00:00:00+00:00    47.580
2015-05-20 00:00:00+00:00    47.580
2015-05-21 00:00:00+00:00    47.420
2015-05-22 00:00:00+00:00    46.900
2015-05-26 00:00:00+00:00    46.600
2015-05-27 00:00:00+00:00    47.620
2015-05-28 00:00:00+00:00    47.450
2015-05-29 00:00:00+00:00    46.860
2015-06-01 00:00:00+00:00    47.240
Freq: C, Name: price, dtype: float64
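
Since get_pricing is only available on the platform, here is a sketch of the same column indexing on a tiny hand-built dataframe. The three rows are copied from the first rows of the MSFT output in this notebook; the names small, price, and subset are ours.

```python
import pandas as pd

# First three rows of the MSFT data shown in this notebook, typed in by hand.
index = pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05'], utc=True)
small = pd.DataFrame(
    {
        'open_price': [24.065, 24.309, 24.817],
        'close_price': [24.319, 24.826, 25.080],
        'price': [24.319, 24.826, 25.080],
    },
    index=index,
)

# Single brackets with one column name return a Series.
price = small['price']

# Double brackets with a list of names return a smaller DataFrame.
subset = small[['open_price', 'price']]

print(type(price).__name__)
print(list(subset.columns))
```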

Because there is now also date information in our data, we provide two series to .plot. X.index gives us the datetime index, and X.values gives us the pricing values. These are used as the X and Y coordinates to make a graph.

plt.plot(X.index, X.values)
plt.ylabel('Price')
plt.legend(['MSFT']);

We can get statistics again on real data.

np.mean(X)

34.49160093348889

np.std(X)

7.309055602383863

Getting Returns from Prices¶

We can use the pct_change function to get returns. Notice how we drop the first element after doing this, as it will be NaN: the first price has no previous price to compare against, so its percent change is undefined.

R = X.pct_change()[1:]

R

2012-01-04 00:00:00+00:00    0.020848
2012-01-05 00:00:00+00:00    0.010231
2012-01-06 00:00:00+00:00    0.016268
2012-01-09 00:00:00+00:00   -0.013536
2012-01-10 00:00:00+00:00    0.003619
2012-01-11 00:00:00+00:00   -0.004320
2012-01-12 00:00:00+00:00    0.010109
2012-01-13 00:00:00+00:00    0.008944
2012-01-17 00:00:00+00:00    0.000351
2012-01-18 00:00:00+00:00   -0.001054
2012-01-19 00:00:00+00:00   -0.003556
2012-01-20 00:00:00+00:00    0.056163
2012-01-23 00:00:00+00:00    0.000668
2012-01-24 00:00:00+00:00   -0.013100
2012-01-25 00:00:00+00:00    0.007821
2012-01-26 00:00:00+00:00   -0.002015
2012-01-27 00:00:00+00:00   -0.009833
2012-01-30 00:00:00+00:00    0.013328
2012-01-31 00:00:00+00:00   -0.002347
2012-02-01 00:00:00+00:00    0.011839
2012-02-02 00:00:00+00:00    0.001846
2012-02-03 00:00:00+00:00    0.009174
2012-02-06 00:00:00+00:00   -0.000657
2012-02-07 00:00:00+00:00    0.005297
2012-02-08 00:00:00+00:00    0.009884
2012-02-09 00:00:00+00:00    0.003598
2012-02-10 00:00:00+00:00   -0.009609
2012-02-13 00:00:00+00:00    0.002788
2012-02-14 00:00:00+00:00   -0.003610
2012-02-15 00:00:00+00:00   -0.006594
                               ...   
2015-04-20 00:00:00+00:00    0.030761
2015-04-21 00:00:00+00:00   -0.006053
2015-04-22 00:00:00+00:00    0.007955
2015-04-23 00:00:00+00:00    0.008618
2015-04-24 00:00:00+00:00    0.104272
2015-04-27 00:00:00+00:00    0.003343
2015-04-28 00:00:00+00:00    0.023533
2015-04-29 00:00:00+00:00   -0.002252
2015-04-30 00:00:00+00:00   -0.007941
2015-05-01 00:00:00+00:00   -0.000207
2015-05-04 00:00:00+00:00   -0.008420
2015-05-05 00:00:00+00:00   -0.013061
2015-05-06 00:00:00+00:00   -0.027736
2015-05-07 00:00:00+00:00    0.008850
2015-05-08 00:00:00+00:00    0.021854
2015-05-11 00:00:00+00:00   -0.007340
2015-05-12 00:00:00+00:00   -0.000212
2015-05-13 00:00:00+00:00    0.005696
2015-05-14 00:00:00+00:00    0.022886
2015-05-15 00:00:00+00:00   -0.008615
2015-05-18 00:00:00+00:00   -0.006002
2015-05-19 00:00:00+00:00   -0.002516
2015-05-20 00:00:00+00:00    0.000000
2015-05-21 00:00:00+00:00   -0.003363
2015-05-22 00:00:00+00:00   -0.010966
2015-05-26 00:00:00+00:00   -0.006397
2015-05-27 00:00:00+00:00    0.021888
2015-05-28 00:00:00+00:00   -0.003570
2015-05-29 00:00:00+00:00   -0.012434
2015-06-01 00:00:00+00:00    0.008109
Freq: C, Name: price, dtype: float64
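
To see what pct_change is doing, the sketch below reproduces it by hand on the first three MSFT prices from the series printed earlier. It also shows dropna as an alternative to slicing with [1:]; which one to use is a matter of taste here.

```python
import pandas as pd

# First three MSFT prices from the series printed earlier.
prices = pd.Series([24.319, 24.826, 25.080])

# pct_change computes (today - yesterday) / yesterday.
returns = prices.pct_change()

# The same calculation written out with shift.
manual = prices / prices.shift(1) - 1

# dropna() is an alternative to slicing with [1:].
returns_clean = returns.dropna()

print(returns_clean.round(6).tolist())
```

The two values printed should agree with the first two entries of R above.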

We can plot the returns distribution as a histogram.

plt.hist(R, bins=20)
plt.xlabel('Return')
plt.ylabel('Frequency')
plt.legend(['MSFT Returns']);

Get statistics again.

np.mean(R)

0.000879089143363588

np.std(R)

0.014347860964324364

Now let's go backwards and generate data out of a normal distribution using the statistics we estimated from Microsoft's returns. We'll see that we have good reason to suspect Microsoft's returns may not be normal, as the resulting normal distribution looks far different.

plt.hist(np.random.normal(np.mean(R), np.std(R), 10000), bins=20)
plt.xlabel('Return')
plt.ylabel('Frequency')
plt.legend(['Normally Distributed Returns']);
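
One way to put a number on how different two distributions look is excess kurtosis, which is near 0 for normal data and positive for distributions with fatter tails. Because get_pricing is only available on the platform, the sketch below uses simulated data: a Student's t sample with 5 degrees of freedom (our assumption, not an estimate from MSFT) stands in for fat-tailed returns. The excess_kurtosis helper is our own, not a numpy function.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2 - 3.0

rng = np.random.RandomState(0)  # arbitrary seed for reproducibility
normal_sample = rng.normal(0, 1, 100000)
fat_tailed_sample = rng.standard_t(5, 100000)  # stand-in for fat-tailed returns

k_normal = excess_kurtosis(normal_sample)
k_fat = excess_kurtosis(fat_tailed_sample)
print("normal: {0:.3f}, fat-tailed: {1:.3f}".format(k_normal, k_fat))
```

Calling excess_kurtosis(R) would be the analogous check on the real returns.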

Generating a Moving Average¶

pandas has some nice tools to allow us to generate rolling statistics. Here's an example. Notice how there's no moving average for the first 59 days, as a 60-day window needs 60 observations before it can produce a value.

# Take the average of the last 60 days at each timepoint.
# Series.rolling replaces the deprecated pd.rolling_mean.
MAVG = X.rolling(window=60).mean()
plt.plot(X.index, X.values)
plt.plot(MAVG.index, MAVG.values)
plt.ylabel('Price')
plt.legend(['MSFT', '60-day MAVG']);
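
The missing values at the start of a rolling statistic are easier to see on a tiny example. The sketch below uses the Series.rolling form on a made-up five-element series with a window of 3, and shows min_periods as one option for getting values before a full window is available.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data for illustration

# With window=3, the first two entries are NaN: not enough history yet.
mavg = s.rolling(window=3).mean()

# min_periods=1 averages whatever history is available instead.
mavg_partial = s.rolling(window=3, min_periods=1).mean()

print(mavg.tolist())
print(mavg_partial.tolist())
```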

This presentation is for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation for any security; nor does it constitute an offer to provide investment advisory or other services by Quantopian, Inc. ("Quantopian"). Nothing contained herein constitutes investment advice or offers any opinion with respect to the suitability of any security, and any views expressed herein should not be taken as advice to buy, sell, or hold any security or as an endorsement of any security or company. In preparing the information contained herein, Quantopian, Inc. has not taken into account the investment needs, objectives, and financial circumstances of any particular investor. Any views expressed and data illustrated herein were prepared based upon information, believed to be reliable, available to Quantopian, Inc. at the time of publication. Quantopian makes no guarantees as to their accuracy or completeness. All information is subject to change and may quickly become unreliable for various reasons, including changes in market conditions or economic circumstances.

Output of the data cell in the Getting Real Pricing Data section:

	open_price	high	low	close_price	volume	price
2012-01-03 00:00:00+00:00	24.065	24.436	23.920	24.319	60891291.0	24.319
2012-01-04 00:00:00+00:00	24.309	24.899	24.273	24.826	76534029.0	24.826
2012-01-05 00:00:00+00:00	24.817	25.133	24.736	25.080	53479335.0	25.080
2012-01-06 00:00:00+00:00	24.953	25.551	24.949	25.488	91671771.0	25.488
2012-01-09 00:00:00+00:00	25.424	25.470	25.125	25.143	56352965.0	25.143
2012-01-10 00:00:00+00:00	25.316	25.515	25.152	25.234	54223945.0	25.234
2012-01-11 00:00:00+00:00	24.862	25.361	24.808	25.125	62855941.0	25.125
2012-01-12 00:00:00+00:00	25.261	25.397	25.057	25.379	46186121.0	25.379
2012-01-13 00:00:00+00:00	25.316	25.606	25.189	25.606	55326851.0	25.606
2012-01-17 00:00:00+00:00	25.742	25.968	25.533	25.615	66537766.0	25.615
2012-01-18 00:00:00+00:00	25.660	25.742	25.352	25.588	60912302.0	25.588
2012-01-19 00:00:00+00:00	25.524	25.773	25.406	25.497	66169544.0	25.497
2012-01-20 00:00:00+00:00	26.122	26.956	26.059	26.929	157989713.0	26.929
2012-01-23 00:00:00+00:00	26.784	27.147	26.603	26.947	70185739.0	26.947
2012-01-24 00:00:00+00:00	26.711	26.802	26.449	26.594	48606276.0	26.594
2012-01-25 00:00:00+00:00	26.349	26.875	26.349	26.802	55871304.0	26.802
2012-01-26 00:00:00+00:00	26.838	26.920	26.648	26.748	46229333.0	26.748
2012-01-27 00:00:00+00:00	26.693	26.766	26.440	26.485	41452599.0	26.485
2012-01-30 00:00:00+00:00	26.258	26.847	26.131	26.838	46520829.0	26.838
2012-01-31 00:00:00+00:00	26.884	26.920	26.494	26.775	40812893.0	26.775
2012-02-01 00:00:00+00:00	27.002	27.237	26.974	27.092	64067228.0	27.092
2012-02-02 00:00:00+00:00	27.101	27.346	26.929	27.142	49959760.0	27.142
2012-02-03 00:00:00+00:00	27.319	27.554	27.273	27.391	38728694.0	27.391
2012-02-06 00:00:00+00:00	27.228	27.391	27.165	27.373	26067704.0	27.373
2012-02-07 00:00:00+00:00	27.328	27.631	27.237	27.518	36751791.0	27.518
2012-02-08 00:00:00+00:00	27.428	27.799	27.391	27.790	47382414.0	27.790
2012-02-09 00:00:00+00:00	27.808	27.917	27.627	27.890	45580004.0	27.890
2012-02-10 00:00:00+00:00	27.772	27.917	27.518	27.622	38526519.0	27.622
2012-02-13 00:00:00+00:00	27.763	27.890	27.582	27.699	31447419.0	27.699
2012-02-14 00:00:00+00:00	27.672	27.791	27.234	27.599	50770567.0	27.599
...	...	...	...	...	...	...
2015-04-20 00:00:00+00:00	41.461	42.891	41.411	42.623	38568282.0	42.623
2015-04-21 00:00:00+00:00	42.722	42.871	42.255	42.365	22952534.0	42.365
2015-04-22 00:00:00+00:00	42.394	42.852	42.275	42.702	21182014.0	42.702
2015-04-23 00:00:00+00:00	42.613	43.328	42.524	43.070	37178490.0	43.070
2015-04-24 00:00:00+00:00	45.365	47.829	45.355	47.561	114396800.0	47.561
2015-04-27 00:00:00+00:00	46.925	47.819	46.915	47.720	52844498.0	47.720
2015-04-28 00:00:00+00:00	47.471	48.892	47.392	48.843	53637731.0	48.843
2015-04-29 00:00:00+00:00	48.405	48.992	48.187	48.733	43326023.0	48.733
2015-04-30 00:00:00+00:00	48.386	49.220	48.286	48.346	56865088.0	48.346
2015-05-01 00:00:00+00:00	48.266	48.559	48.087	48.336	32389494.0	48.336
2015-05-04 00:00:00+00:00	48.058	48.554	47.869	47.929	30328340.0	47.929
2015-05-05 00:00:00+00:00	47.511	47.849	47.005	47.303	45944460.0	47.303
2015-05-06 00:00:00+00:00	47.263	47.462	45.723	45.991	47535797.0	45.991
2015-05-07 00:00:00+00:00	45.971	46.781	45.862	46.398	27047479.0	46.398
2015-05-08 00:00:00+00:00	47.243	47.670	47.213	47.412	27844465.0	47.412
2015-05-11 00:00:00+00:00	47.243	47.601	47.064	47.064	17346165.0	47.064
2015-05-12 00:00:00+00:00	46.547	47.372	46.120	47.054	24031639.0	47.054
2015-05-13 00:00:00+00:00	47.879	48.008	47.263	47.322	28548931.0	47.322
2015-05-14 00:00:00+00:00	47.720	48.505	47.720	48.405	26847365.0	48.405
2015-05-15 00:00:00+00:00	48.554	48.589	47.740	47.988	23411783.0	47.988
2015-05-18 00:00:00+00:00	47.670	47.909	47.303	47.700	20390841.0	47.700
2015-05-19 00:00:00+00:00	47.560	47.810	47.180	47.580	22462876.0	47.580
2015-05-20 00:00:00+00:00	47.390	47.930	47.270	47.580	18413349.0	47.580
2015-05-21 00:00:00+00:00	47.280	47.600	47.005	47.420	18679997.0	47.420
2015-05-22 00:00:00+00:00	47.300	47.350	46.820	46.900	21500514.0	46.900
2015-05-26 00:00:00+00:00	46.830	46.880	46.190	46.600	25219693.0	46.600
2015-05-27 00:00:00+00:00	46.820	47.770	46.620	47.620	22291914.0	47.620
2015-05-28 00:00:00+00:00	47.500	48.020	47.390	47.450	16527053.0	47.450
2015-05-29 00:00:00+00:00	47.430	47.570	46.590	46.860	25462536.0	46.860
2015-06-01 00:00:00+00:00	47.060	47.770	46.620	47.240	24322867.0	47.240