10. More Reshaping

Continuing from Friday.

import pandas as pd
import seaborn as sns

# make plots look nicer, increase font size, and use colorblind compatible colors
sns.set_theme(font_scale=2,palette='colorblind')
arabica_data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv'

coffee_df = pd.read_csv(arabica_data_url)

# compute the total number of bags per country of origin
bag_total_df = coffee_df.groupby('Country.of.Origin')['Number.of.Bags'].sum()

# subset the summary Series for countries with over 15000 total and store as a list
high_prod_countries = list(bag_total_df[bag_total_df>15000].index)

# a lambda function that checks if a string c is one of the
#  countries in high_prod_countries
high_prod = lambda c: c in high_prod_countries

# add a column that indicates that the country is a high producer
coffee_df['high_production'] = coffee_df['Country.of.Origin'].apply(high_prod)



# filter based on production level threshold
high_prod_coffee_df = coffee_df[coffee_df['high_production']]

What happened when we filtered the data?

coffee_df.shape, high_prod_coffee_df.shape
((1311, 45), (732, 45))

We have many fewer rows.
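
The lambda and apply step above can also be written with the isin method. Here is a minimal sketch on a small made-up DataFrame (toy_df, its values, and the threshold of 25 are assumptions for illustration, not the coffee data):

```python
import pandas as pd

# toy data standing in for coffee_df (values are made up for illustration)
toy_df = pd.DataFrame({
    'Country.of.Origin': ['A', 'A', 'B', 'C', 'C'],
    'Number.of.Bags': [10, 20, 5, 30, 40],
})

# total bags per country, as in the groupby above
toy_totals = toy_df.groupby('Country.of.Origin')['Number.of.Bags'].sum()

# countries whose total is over a (hypothetical) threshold of 25
toy_high = list(toy_totals[toy_totals > 25].index)

# version 1: apply a lambda to each element, as in the notes
via_apply = toy_df['Country.of.Origin'].apply(lambda c: c in toy_high)

# version 2: isin does the same membership check for the whole column
via_isin = toy_df['Country.of.Origin'].isin(toy_high)

# both give the same boolean Series
print(via_apply.equals(via_isin))

# filtering with the boolean Series keeps only rows from those countries
print(toy_df[via_isin].shape)
```

Either version gives a boolean column that we can use for filtering; isin is shorter, while the lambda version generalizes to checks that are not simple membership.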

Now that we’ve filtered the data, let’s practice reshaping it to be tidy again.

# melt the score columns into long (tidy) format, keeping the attribute columns
scores_of_interest = ['Balance','Aroma','Body','Aftertaste']
attrs_of_interest = ['Country.of.Origin','Color']
high_prod_coffee_df_melted = high_prod_coffee_df.melt(
    id_vars = attrs_of_interest,
    value_vars = scores_of_interest,
    var_name = 'Score')

What happened?

high_prod_coffee_df_melted.shape
(2928, 4)

Now the DataFrame is 4 times as long, because the list we passed to value_vars has 4 items. It has 4 columns: the length of the list we passed to id_vars (2) plus 2 more, one for the variable names and one for the values.

len(scores_of_interest)
4
len(scores_of_interest)*len(high_prod_coffee_df)
2928
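
To check this arithmetic on something small enough to read in full, here is a sketch with a made-up wide DataFrame (wide_df and its column names are assumptions for illustration):

```python
import pandas as pd

# toy wide-format data (made-up values): one row per item, one column per score
wide_df = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'color': ['Green', 'Green', 'Blue'],
    'Aroma': [7.5, 8.0, 7.0],
    'Body': [7.0, 7.5, 8.0],
})

id_cols = ['country', 'color']
score_cols = ['Aroma', 'Body']

long_df = wide_df.melt(id_vars=id_cols, value_vars=score_cols, var_name='Score')

# rows: one per (original row, score column) pair
expected_rows = len(wide_df) * len(score_cols)
# columns: the id columns plus the variable column and the value column
expected_cols = len(id_cols) + 2

print(long_df.shape, (expected_rows, expected_cols))
```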

We can see the column names and what they contain here:

high_prod_coffee_df_melted.head()
Country.of.Origin Color Score value
0 Guatemala NaN Balance 8.42
1 Brazil Bluish-Green Balance 8.33
2 Mexico Green Balance 8.17
3 Brazil Green Balance 8.00
4 Brazil Green Balance 8.00

Note that we passed a value to var_name so that the column is named “Score”. We could also leave that parameter out:

high_prod_coffee_df.melt(
    id_vars = attrs_of_interest,
    value_vars = scores_of_interest)
Country.of.Origin Color variable value
0 Guatemala NaN Balance 8.42
1 Brazil Bluish-Green Balance 8.33
2 Mexico Green Balance 8.17
3 Brazil Green Balance 8.00
4 Brazil Green Balance 8.00
... ... ... ... ...
2923 Mexico Green Aftertaste 6.42
2924 Mexico Green Aftertaste 6.83
2925 Brazil Green Aftertaste 6.83
2926 Mexico None Aftertaste 6.25
2927 Guatemala Green Aftertaste 6.67

2928 rows × 4 columns

Then we have variable and value as the column names.

Try it yourself

How could you rename the value column?
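
One possible approach, sketched on a made-up DataFrame (toy_df and the name 'Rating' are assumptions for illustration): melt has a value_name parameter that works like var_name, or we can rename the column after melting.

```python
import pandas as pd

toy_df = pd.DataFrame({
    'country': ['A', 'B'],
    'Aroma': [7.5, 8.0],
    'Body': [7.0, 7.5],
})

# option 1: name both new columns while melting
melted_named = toy_df.melt(id_vars=['country'], value_vars=['Aroma', 'Body'],
                           var_name='Score', value_name='Rating')

# option 2: melt with the default name, then rename afterwards
melted_renamed = toy_df.melt(id_vars=['country'], value_vars=['Aroma', 'Body'],
                             var_name='Score').rename(columns={'value': 'Rating'})

print(list(melted_named.columns))
```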

The head has only ‘Balance’ in the ‘Score’ column; we could use sample to pick a random subset of the rows instead and see different values.

high_prod_coffee_df_melted.sample(5)
Country.of.Origin Color Score value
1276 Mexico Green Aroma 7.42
1549 Colombia Green Body 7.83
2641 Brazil Green Aftertaste 7.42
1244 Guatemala Green Aroma 7.50
852 Brazil Green Aroma 7.25

What does this let us do?

For one thing, it makes plotting easier, because seaborn is organized around tidy data.

sns.displot(data= high_prod_coffee_df_melted,
           x='value',hue='Country.of.Origin',
           col = 'Score', col_wrap=2, kind='kde',aspect =1.5)
<seaborn.axisgrid.FacetGrid at 0x7f9468e1ae80>
../_images/2021-09-29_18_1.png
sns.displot(data= high_prod_coffee_df_melted,
           x='value',hue='Color',
           col = 'Score', col_wrap=2, kind='kde',aspect =1.5)
<seaborn.axisgrid.FacetGrid at 0x7f9429ef4a60>
../_images/2021-09-29_19_1.png
sns.displot(data= high_prod_coffee_df_melted,
           x='value',hue='Country.of.Origin',
           col = 'Score', row='Color', kind='kde',aspect =1.5)
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/distributions.py:316: UserWarning: Dataset has 0 variance; skipping density estimate. Pass `warn_singular=False` to disable this warning.
  warnings.warn(msg, UserWarning)
/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/site-packages/seaborn/distributions.py:316: UserWarning: Dataset has 0 variance; skipping density estimate. Pass `warn_singular=False` to disable this warning.
  warnings.warn(msg, UserWarning)
<seaborn.axisgrid.FacetGrid at 0x7f942ca1e5e0>
../_images/2021-09-29_20_2.png

10.1. Unpacking JSON

rhodyprog4ds_gh_events_url = 'https://api.github.com/orgs/rhodyprog4ds/events'
course_gh_df = pd.read_json(rhodyprog4ds_gh_events_url)
course_gh_df.head()
id type actor repo payload public created_at org
0 23017405497 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... {'push_id': 10511497820, 'size': 1, 'distinct_... True 2022-07-21 23:02:34+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
1 23017380330 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... {'push_id': 10511484308, 'size': 1, 'distinct_... True 2022-07-21 23:00:14+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
2 23017297788 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... {'push_id': 10511442158, 'size': 1, 'distinct_... True 2022-07-21 22:52:08+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
3 23017256816 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... {'push_id': 10511421441, 'size': 1, 'distinct_... True 2022-07-21 22:48:25+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
4 23017147996 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... {'push_id': 10511370784, 'size': 1, 'distinct_... True 2022-07-21 22:40:28+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...

We want to transform each of those dictionary-like values into a row in a DataFrame.

type(course_gh_df['actor'])
pandas.core.series.Series

Recall that base Python types can be used as functions to cast an object from one type to another.

5
5
type(5)
int
str(5)
'5'
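
A few more casts of the same kind, as a small sketch (the variable names are only for illustration):

```python
# each call below uses a built-in type as a function to build a new object
as_text = str(5)        # int -> str
as_int = int('5')       # str -> int
as_float = float(5)     # int -> float
as_list = list('abc')   # str -> list of characters

print(type(as_text), type(as_int), type(as_float), type(as_list))
```

pd.Series can be used the same way: calling it on a dictionary builds a Series from that dictionary.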

To unpack one column, we can cast each element of the column to a Series and then stack them back together.

First, let’s look at one row of one column

course_gh_df['actor'][0]
{'id': 10656079,
 'login': 'brownsarahm',
 'display_login': 'brownsarahm',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/brownsarahm',
 'avatar_url': 'https://avatars.githubusercontent.com/u/10656079?'}

Now let’s cast it to a Series

pd.Series(course_gh_df['actor'][0])
id                                                        10656079
login                                                  brownsarahm
display_login                                          brownsarahm
gravatar_id                                                       
url                       https://api.github.com/users/brownsarahm
avatar_url       https://avatars.githubusercontent.com/u/10656079?
dtype: object

What we want is to do this over and over and stack them.

The apply method does this for us, in one compact step.

course_gh_df['actor'].apply(pd.Series)
id login display_login gravatar_id url avatar_url
0 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
1 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282?
2 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
3 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282?
4 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
5 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
6 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
7 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282?
8 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
9 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282?
10 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
11 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
12 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
13 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
14 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079?
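
Because the API output changes over time, it can help to see the same step on a tiny made-up DataFrame. This is a sketch under assumptions: events_df and its values are invented, and pd.json_normalize is shown only as one alternative that builds a similar table from a list of dictionaries.

```python
import pandas as pd

# toy stand-in for a column of dictionaries (made-up values)
events_df = pd.DataFrame({
    'type': ['PushEvent', 'CreateEvent'],
    'actor': [{'id': 1, 'login': 'user_a'}, {'id': 2, 'login': 'user_b'}],
})

# cast each dictionary to a Series; pandas stacks the results into a DataFrame
actor_df = events_df['actor'].apply(pd.Series)

# pd.json_normalize takes a list of dictionaries and builds a similar table
actor_df_alt = pd.json_normalize(events_df['actor'].tolist())

print(actor_df)
print(list(actor_df.columns) == list(actor_df_alt.columns))
```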

How can we do this for all of the columns and put them back together after?

First, let’s make a list of the columns we need to convert.

js_cols = ['actor','repo','payload','org']

When we use .apply(pd.Series) we get a DataFrame.

type(course_gh_df['actor'].apply(pd.Series))
pandas.core.frame.DataFrame

pd.concat takes a list of DataFrames and puts them together in one DataFrame.

To illustrate, it’s nice to make small DataFrames.

df1 = pd.DataFrame([[1,2,3],[3,4,7]], columns = ['A','B','t'])
df2 = pd.DataFrame([[10,20,30],[30,40,70]], columns = ['AA','BB','t'])
df1
A B t
0 1 2 3
1 3 4 7
df2
AA BB t
0 10 20 30
1 30 40 70

If we use concat with the default settings, it stacks them vertically and aligns any columns that have the same name.

pd.concat([df1,df2])
A B t AA BB
0 1.0 2.0 3 NaN NaN
1 3.0 4.0 7 NaN NaN
0 NaN NaN 30 10.0 20.0
1 NaN NaN 70 30.0 40.0

So, since the original DataFrames were both 2 rows with 3 columns each, with one column name appearing in both, we end up with a new DataFrame with shape (4,5) and it fills with NaN in the top right and the bottom left.

pd.concat([df1,df2]).shape
(4, 5)

We can use the axis parameter to tell it how to combine them. The default is axis=0, which stacks the rows vertically, but axis=1 will combine them side by side, along the columns.

pd.concat([df1,df2], axis =1)
A B t AA BB t
0 1 2 3 10 20 30
1 3 4 7 30 40 70

So now we get no NaN values, because both DataFrames have the same number of rows and the same index.

df1.index == df2.index
array([ True,  True])

and we have a total of 6 columns and 2 rows.

pd.concat([df1,df2], axis =1).shape
(2, 6)
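
The matching index matters. As a sketch of what happens otherwise, here are two made-up DataFrames (left and right are assumptions for illustration) with the same number of rows but different index labels:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 3]}, index=[0, 1])
# same number of rows, but a different index
right = pd.DataFrame({'AA': [10, 30]}, index=[1, 2])

# axis=1 aligns on the index, so labels found in only one DataFrame get NaN
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)
print(side_by_side.shape)

# resetting the index first makes the rows line up by position instead
aligned = pd.concat([left, right.reset_index(drop=True)], axis=1)
print(aligned.shape)
```

So concat with axis=1 matches rows by index label, not by position.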

Back to our GitHub data: we want to make a list of DataFrames, where each DataFrame corresponds to one of the columns in the original DataFrame, but unpacked. Then we stack them horizontally (axis=1). Because each DataFrame in the list is based on the same original DataFrame, they again have the same index.

pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
         axis=1)
id login display_login gravatar_id url avatar_url id name url push_id ... commits ref_type master_branch description pusher_type id login gravatar_id url avatar_url
0 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051150e+10 ... [{'sha': '433500586ca5565bef504ff4c27f320bf466... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
1 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051148e+10 ... [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
2 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051144e+10 ... [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
3 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051142e+10 ... [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
4 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051137e+10 ... [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
5 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051130e+10 ... [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
6 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051100e+10 ... [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
7 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 337917892 rhodyprog4ds/rhodyprog4ds.github.io https://api.github.com/repos/rhodyprog4ds/rhod... 1.029240e+10 ... [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
8 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 337917892 rhodyprog4ds/rhodyprog4ds.github.io https://api.github.com/repos/rhodyprog4ds/rhod... 1.029240e+10 ... [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
9 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
10 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
11 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN repository main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
12 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874933e+09 ... [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
13 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874923e+09 ... [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
14 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874895e+09 ... [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?

15 rows × 25 columns

Try it Yourself

Examine the list of DataFrames to see what structure they share and do not share.

In this case we get the same 15 rows, because that’s what the API gave us, and our 4 columns from js_cols turned into 25 columns.

pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
         axis=1).shape
(15, 25)

If we had used the default, we’d end up with 60 rows (15*4) and only 18 columns, because there are subfield names that are shared across the original columns (e.g., most have an id).

pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
         axis=0).shape
(60, 18)
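
The same row and column arithmetic can be checked by hand on a small example. This is a sketch with made-up data (toy_events and toy_js_cols are assumptions for illustration):

```python
import pandas as pd

toy_events = pd.DataFrame({
    'actor': [{'id': 1, 'login': 'a'}, {'id': 2, 'login': 'b'}, {'id': 1, 'login': 'a'}],
    'repo': [{'id': 10, 'name': 'r1'}, {'id': 10, 'name': 'r1'}, {'id': 11, 'name': 'r2'}],
})
toy_js_cols = ['actor', 'repo']

unpacked = [toy_events[c].apply(pd.Series) for c in toy_js_cols]

# axis=1: rows stay the same, column counts add up (id appears twice)
wide_shape = pd.concat(unpacked, axis=1).shape
# axis=0: row counts add up, columns are the union of names (id, login, name)
tall_shape = pd.concat(unpacked, axis=0).shape

print(wide_shape, tall_shape)
```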

We might want to rename the new columns so that they have the original column name prepended to the new name. This will help us distinguish between the different id columns.

pandas has a rename method for this, and this is another job for lambdas.

pd.concat([course_gh_df[cur_col].apply(pd.Series).rename(columns = lambda c: cur_col + '_' +c)
           for cur_col in js_cols],
         axis=1)
actor_id actor_login actor_display_login actor_gravatar_id actor_url actor_avatar_url repo_id repo_name repo_url payload_push_id ... payload_commits payload_ref_type payload_master_branch payload_description payload_pusher_type org_id org_login org_gravatar_id org_url org_avatar_url
0 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051150e+10 ... [{'sha': '433500586ca5565bef504ff4c27f320bf466... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
1 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051148e+10 ... [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
2 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051144e+10 ... [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
3 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051142e+10 ... [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
4 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051137e+10 ... [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
5 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051130e+10 ... [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
6 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 400283911 rhodyprog4ds/BrownFall21 https://api.github.com/repos/rhodyprog4ds/Brow... 1.051100e+10 ... [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
7 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 337917892 rhodyprog4ds/rhodyprog4ds.github.io https://api.github.com/repos/rhodyprog4ds/rhod... 1.029240e+10 ... [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
8 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 337917892 rhodyprog4ds/rhodyprog4ds.github.io https://api.github.com/repos/rhodyprog4ds/rhod... 1.029240e+10 ... [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
9 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
10 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
11 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 501809605 rhodyprog4ds/jupyterlite https://api.github.com/repos/rhodyprog4ds/jupy... NaN ... NaN repository main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
12 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874933e+09 ... [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
13 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874923e+09 ... [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
14 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? 404401191 rhodyprog4ds/.github https://api.github.com/repos/rhodyprog4ds/.github 9.874895e+09 ... [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?

15 rows × 25 columns

The rename method’s columns parameter can take a lambda defined inline. That is helpful here, because we want the function to take one parameter (the current column name) and do the same thing to all of the columns within a single DataFrame, but to prepend a different thing for each DataFrame.
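
Here is the renaming step on its own, as a sketch with a made-up DataFrame (toy_actor and prefix are assumptions for illustration). For this specific pattern pandas also has an add_prefix method, shown for comparison:

```python
import pandas as pd

toy_actor = pd.DataFrame({'id': [1, 2], 'login': ['a', 'b']})
prefix = 'actor'

# rename calls the function once per column name and uses the return value
renamed = toy_actor.rename(columns=lambda c: prefix + '_' + c)

# add_prefix is a built-in shortcut for this particular pattern
prefixed = toy_actor.add_prefix(prefix + '_')

print(list(renamed.columns))
```

The lambda version is more flexible, since the function can do anything with the column name, not only prepend to it.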

So now, we have the unpacked columns with good column names, but we lost the columns that were originally good.

How can we append the new columns to the old ones? First we can make a DataFrame that’s just the columns not on the list that we’re going to expand.

course_gf_df_good = course_gh_df[[col for col in
          course_gh_df.columns if col not in js_cols]]
course_gf_df_good
id type public created_at
0 23017405497 PushEvent True 2022-07-21 23:02:34+00:00
1 23017380330 PushEvent True 2022-07-21 23:00:14+00:00
2 23017297788 PushEvent True 2022-07-21 22:52:08+00:00
3 23017256816 PushEvent True 2022-07-21 22:48:25+00:00
4 23017147996 PushEvent True 2022-07-21 22:40:28+00:00
5 23017000728 PushEvent True 2022-07-21 22:28:48+00:00
6 23016409837 PushEvent True 2022-07-21 21:41:28+00:00
7 22594628501 PushEvent True 2022-06-29 00:36:34+00:00
8 22594620942 PushEvent True 2022-06-29 00:35:39+00:00
9 22258790417 CreateEvent True 2022-06-09 21:15:41+00:00
10 22258731281 CreateEvent True 2022-06-09 21:11:34+00:00
11 22258730912 CreateEvent True 2022-06-09 21:11:33+00:00
12 21774885552 PushEvent True 2022-05-13 12:54:30+00:00
13 21774865161 PushEvent True 2022-05-13 12:53:22+00:00
14 21774810879 PushEvent True 2022-05-13 12:50:21+00:00

Then we can prepend that DataFrame to the list that we pass to concat. Because + joins two lists, we have to wrap the DataFrame in a list of its own first, then use + to combine them.

pd.concat([course_gf_df_good]+[course_gh_df[col].apply(pd.Series).rename(
  columns= lambda i_col: col + '_' + i_col )
      for col in js_cols],axis=1)
id type public created_at actor_id actor_login actor_display_login actor_gravatar_id actor_url actor_avatar_url ... payload_commits payload_ref_type payload_master_branch payload_description payload_pusher_type org_id org_login org_gravatar_id org_url org_avatar_url
0 23017405497 PushEvent True 2022-07-21 23:02:34+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '433500586ca5565bef504ff4c27f320bf466... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
1 23017380330 PushEvent True 2022-07-21 23:00:14+00:00 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? ... [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
2 23017297788 PushEvent True 2022-07-21 22:52:08+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
3 23017256816 PushEvent True 2022-07-21 22:48:25+00:00 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? ... [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
4 23017147996 PushEvent True 2022-07-21 22:40:28+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
5 23017000728 PushEvent True 2022-07-21 22:28:48+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
6 23016409837 PushEvent True 2022-07-21 21:41:28+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
7 22594628501 PushEvent True 2022-06-29 00:36:34+00:00 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? ... [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
8 22594620942 PushEvent True 2022-06-29 00:35:39+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
9 22258790417 CreateEvent True 2022-06-09 21:15:41+00:00 41898282 github-actions[bot] github-actions https://api.github.com/users/github-actions[bot] https://avatars.githubusercontent.com/u/41898282? ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
10 22258731281 CreateEvent True 2022-06-09 21:11:34+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... NaN branch main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
11 22258730912 CreateEvent True 2022-06-09 21:11:33+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... NaN repository main NaN user 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
12 21774885552 PushEvent True 2022-05-13 12:54:30+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
13 21774865161 PushEvent True 2022-05-13 12:53:22+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?
14 21774810879 PushEvent True 2022-05-13 12:50:21+00:00 10656079 brownsarahm brownsarahm https://api.github.com/users/brownsarahm https://avatars.githubusercontent.com/u/10656079? ... [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... NaN NaN NaN NaN 69595187 rhodyprog4ds https://api.github.com/orgs/rhodyprog4ds https://avatars.githubusercontent.com/u/69595187?

15 rows × 29 columns
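The same pattern can be tried on a tiny table, which makes it easier to check by eye. This is only a sketch: `toy_df`, `nested_cols`, and the values in them are made up for illustration and are not part of the GitHub data above, and `add_prefix` is used here as a shorthand for the `rename` with a lambda.

```python
import pandas as pd

# hypothetical toy data: one flat column and one column of dictionaries
toy_df = pd.DataFrame({
    'id': [1, 2],
    'actor': [{'id': 10, 'login': 'a'}, {'id': 11, 'login': 'b'}],
})
nested_cols = ['actor']

# keep only the columns that do not need expanding
flat_part = toy_df[[c for c in toy_df.columns if c not in nested_cols]]

# expand each dictionary column into its own DataFrame, prefixing the new names
expanded_parts = [toy_df[c].apply(pd.Series).add_prefix(c + '_')
                  for c in nested_cols]

# wrap the flat part in a list so that + joins two lists, then glue side by side
toy_flat = pd.concat([flat_part] + expanded_parts, axis=1)
print(toy_flat.columns.tolist())
```

The result should have the original `id` column first, followed by one column per dictionary key.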

To see how the list math works, we can try + on two small lists:

['a'] + ['b','c','d']
['a', 'b', 'c', 'd']

This results in one list, but without the [] around the first item we get a TypeError:

'a' + ['b','c','d']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 'a' + ['b','c','d']

TypeError: can only concatenate str (not "list") to str

List methods such as extend, unlike +, mutate the list in place and return None, so

orig_list = ['a']
new_items = ['b','c','d']
orig_list.extend(new_items)

outputs nothing, because extend returned None; instead, it changed the original variable.

orig_list
['a', 'b', 'c', 'd']
type(orig_list.extend(new_items))
NoneType

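The type of the returned value is NoneType. Note that running this cell also extends orig_list a second time.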
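To put the two behaviors side by side, here is a small sketch with made-up variable names (`base`, `extra`) that are not used elsewhere in these notes. It shows that + builds a new list and leaves its inputs alone, while extend changes the list it is called on.

```python
base = ['a']
extra = ['b', 'c']

# + returns a brand new list; base is not changed
combined = base + extra
base_after_plus = list(base)  # copy of base at this point, for comparison

# extend returns None and changes base itself
result = base.extend(extra)

print(combined, base_after_plus, result, base)
```

This is why the concat call uses +: it needs the combined list as a value to pass along, not a change to an existing variable.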

10.2. Questions After Class#

All clarifying questions today

10.2.1. How does Axis work?#

The notes above are expanded a lot, which should help. You can see more examples in the Tidy Data Explanation and on the Cheat Sheet.

The axis parameter appears in a lot of pandas functions. You can see it in most of the statistics we used last week as well: column-wise operations are the default, but we can do row-wise ones too.

For more on concatenation, see the Pandas user guide or API docs sections on it.
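As a quick illustration of axis, the sketch below uses a small made-up DataFrame (`toy_df` is not from the course data). It computes a mean along each axis and concatenates along each axis, so the shapes can be compared.

```python
import pandas as pd

# hypothetical toy data
toy_df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# default (axis=0): one result per column
col_means = toy_df.mean()
# axis=1: one result per row
row_means = toy_df.mean(axis=1)

# axis=0 stacks rows, so the result is longer
stacked = pd.concat([toy_df, toy_df], axis=0)
# axis=1 adds columns, so the result is wider
side_by_side = pd.concat([toy_df, toy_df.add_prefix('copy_')], axis=1)

print(col_means.shape, row_means.shape, stacked.shape, side_by_side.shape)
```

One way to remember it: axis=0 works down the rows and axis=1 works across the columns.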

10.2.2. How does melt work?#

The notes are expanded a lot. Also see Tidying data.

For the concept, you can also see the original Tidy Data paper.

For the pandas method, see its docs.
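For a version small enough to trace by hand, the sketch below melts a two-row table. The data is invented for illustration (the `country` values and scores are not from the coffee dataset), but the call uses the same id_vars, value_vars, and var_name parameters as in class.

```python
import pandas as pd

# hypothetical wide data: one row per country, one column per score
wide_df = pd.DataFrame({
    'country': ['X', 'Y'],
    'Aroma': [7.5, 8.0],
    'Body': [7.0, 7.8],
})

# id_vars stay as columns; each value_vars column becomes rows
long_df = wide_df.melt(id_vars=['country'],
                       value_vars=['Aroma', 'Body'],
                       var_name='Score')
print(long_df)
```

The result has 2 rows times 2 value columns, so 4 rows, and the columns are country, Score, and value.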

10.2.3. What about the NaNs that are still left?#

Those are NaN in the data because the events are of different types, and different types of events have different information available about them.

If we group by event type and then look at, for example, the payload columns, we see that the NaNs are explained by that. (Remember, count tells how many are not NaN.)