More Reshaping
10. More Reshaping
Continuing from Friday.
import pandas as pd
import seaborn as sns
# make plots look nicer, increase font size, and use colorblind compatible colors
sns.set_theme(font_scale=2,palette='colorblind')
arabica_data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/arabica_data_cleaned.csv'
coffee_df = pd.read_csv(arabica_data_url)
# compute the total number of bags per country
bag_total_df = coffee_df.groupby('Country.of.Origin')['Number.of.Bags'].sum()
# subset the summary Series for countries with over 15000 total and store as a list
high_prod_countries = list(bag_total_df[bag_total_df>15000].index)
# a lambda function that checks if a string c is one of the
# countries in high_prod_countries
high_prod = lambda c: c in high_prod_countries
# add a column that indicates that the country is a high producer
coffee_df['high_production'] = coffee_df['Country.of.Origin'].apply(high_prod)
# filter based on production level threshold
high_prod_coffee_df = coffee_df[coffee_df['high_production']]
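The same filter can also be written without a helper lambda by using Series.isin, which tests membership for every row at once and returns a boolean mask. This is a minimal sketch on a tiny made-up DataFrame (the countries and bag counts are invented), assuming the same column names as the coffee data and the same 15000-bag threshold.

```python
import pandas as pd

# toy stand-in for coffee_df; the values are made up for illustration
toy_df = pd.DataFrame({
    'Country.of.Origin': ['Brazil', 'Brazil', 'Kenya', 'Mexico'],
    'Number.of.Bags': [10000, 9000, 300, 16000],
})

# total number of bags per country
bag_totals = toy_df.groupby('Country.of.Origin')['Number.of.Bags'].sum()

# countries whose total is over the threshold
high_prod_countries = list(bag_totals[bag_totals > 15000].index)

# isin returns a boolean Series, so it works directly as a column and as a row mask
toy_df['high_production'] = toy_df['Country.of.Origin'].isin(high_prod_countries)
high_prod_toy_df = toy_df[toy_df['high_production']]
print(high_prod_toy_df)
```

Using isin avoids calling a Python function once per row, which is usually both shorter and faster than apply with a lambda.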
What happened when we filtered the data?
coffee_df.shape, high_prod_coffee_df.shape
((1311, 45), (732, 45))
We have many fewer rows.
Now that we’ve filtered the data, let’s practice reshaping it to be tidy again.
# the score columns to unpivot and the columns to keep as identifiers
scores_of_interest = ['Balance','Aroma','Body','Aftertaste']
attrs_of_interest = ['Country.of.Origin','Color']
high_prod_coffee_df_melted = high_prod_coffee_df.melt(
id_vars = attrs_of_interest,
value_vars = scores_of_interest,
var_name = 'Score')
What happened?
high_prod_coffee_df_melted.shape
(2928, 4)
Now the DataFrame is 4 times as long, because we passed 4 columns to value_vars: each original row becomes one row per score. It has 4 columns: the 2 columns we passed to id_vars, plus 2 more (the column of former column names, here named Score, and value).
len(scores_of_interest)
4
len(scores_of_interest)*len(high_prod_coffee_df)
2928
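The same arithmetic can be checked on a small made-up table. This sketch (the countries and scores are invented) melts 3 rows with 2 value columns and confirms that the result has rows times value columns rows and id columns plus 2 columns.

```python
import pandas as pd

# a small wide table: 3 rows, 2 id columns, 2 score columns (made-up values)
wide = pd.DataFrame({
    'Country.of.Origin': ['Brazil', 'Mexico', 'Kenya'],
    'Color': ['Green', 'Green', 'Bluish-Green'],
    'Aroma': [7.5, 7.4, 8.0],
    'Body': [7.8, 7.2, 7.9],
})

id_cols = ['Country.of.Origin', 'Color']
score_cols = ['Aroma', 'Body']

long = wide.melt(id_vars=id_cols, value_vars=score_cols, var_name='Score')

# each original row appears once per value column
assert len(long) == len(wide) * len(score_cols)
# columns: the id columns, plus one for the former column names and one for the values
assert long.shape[1] == len(id_cols) + 2
print(long)
```

melt stacks one value column at a time, so all of the Aroma rows come before all of the Body rows.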
We can see the column names and what they contain here:
high_prod_coffee_df_melted.head()
| | Country.of.Origin | Color | Score | value |
|---|---|---|---|---|
| 0 | Guatemala | NaN | Balance | 8.42 |
| 1 | Brazil | Bluish-Green | Balance | 8.33 |
| 2 | Mexico | Green | Balance | 8.17 |
| 3 | Brazil | Green | Balance | 8.00 |
| 4 | Brazil | Green | Balance | 8.00 |
Note that we passed a value to var_name so that the column of former column names is called “Score”. We could also leave it out:
high_prod_coffee_df.melt(
id_vars = attrs_of_interest,
value_vars = scores_of_interest)
| | Country.of.Origin | Color | variable | value |
|---|---|---|---|---|
| 0 | Guatemala | NaN | Balance | 8.42 |
| 1 | Brazil | Bluish-Green | Balance | 8.33 |
| 2 | Mexico | Green | Balance | 8.17 |
| 3 | Brazil | Green | Balance | 8.00 |
| 4 | Brazil | Green | Balance | 8.00 |
| ... | ... | ... | ... | ... |
| 2923 | Mexico | Green | Aftertaste | 6.42 |
| 2924 | Mexico | Green | Aftertaste | 6.83 |
| 2925 | Brazil | Green | Aftertaste | 6.83 |
| 2926 | Mexico | None | Aftertaste | 6.25 |
| 2927 | Guatemala | Green | Aftertaste | 6.67 |
2928 rows × 4 columns
Then pandas uses the default names, variable and value, for those two columns.
Try it yourself
How could you rename the value column?
The head shows only ‘Balance’ in the ‘Score’ column, because melt places all of the Balance rows first. To see different values, we can use sample to pick a random subset of the rows instead.
high_prod_coffee_df_melted.sample(5)
| | Country.of.Origin | Color | Score | value |
|---|---|---|---|---|
| 1276 | Mexico | Green | Aroma | 7.42 |
| 1549 | Colombia | Green | Body | 7.83 |
| 2641 | Brazil | Green | Aftertaste | 7.42 |
| 1244 | Guatemala | Green | Aroma | 7.50 |
| 852 | Brazil | Green | Aroma | 7.25 |
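Because sample picks rows at random, its output changes every time the cell runs. Passing random_state, a parameter of DataFrame.sample, makes the draw repeatable. A minimal sketch on a made-up long-format table:

```python
import pandas as pd

# made-up long-format scores
long = pd.DataFrame({
    'Score': ['Balance'] * 4 + ['Aroma'] * 4,
    'value': [8.4, 8.3, 8.2, 8.0, 7.4, 7.5, 7.3, 7.6],
})

# the same random_state gives the same 5 rows on every call
first = long.sample(5, random_state=42)
second = long.sample(5, random_state=42)

assert first.equals(second)
print(first)
```

Without random_state, the two calls could return different rows.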
What does this let us do?
One benefit is that it makes plotting easier, because seaborn is organized around tidy data.
sns.displot(data= high_prod_coffee_df_melted,
x='value',hue='Country.of.Origin',
col = 'Score', col_wrap=2, kind='kde',aspect =1.5)
sns.displot(data= high_prod_coffee_df_melted,
x='value',hue='Color',
col = 'Score', col_wrap=2, kind='kde',aspect =1.5)
sns.displot(data= high_prod_coffee_df_melted,
x='value',hue='Country.of.Origin',
col = 'Score', row='Color', kind='kde',aspect =1.5)
UserWarning: Dataset has 0 variance; skipping density estimate. Pass `warn_singular=False` to disable this warning.
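The warning means that in some facet the values do not vary (for example, a Color and Country combination with a single coffee, or several with identical scores), so no density curve can be drawn there. A sketch for finding such groups with pandas alone, on made-up data shaped like high_prod_coffee_df_melted (the rows are invented):

```python
import pandas as pd

# made-up long-format data with the same columns as the melted coffee data
long = pd.DataFrame({
    'Country.of.Origin': ['Brazil', 'Brazil', 'Mexico', 'Mexico', 'Kenya'],
    'Color': ['Green', 'Green', 'Green', 'Green', 'Blue-Green'],
    'Score': ['Aroma'] * 5,
    'value': [7.5, 7.8, 7.2, 7.2, 8.0],
})

# count distinct values per row/column/hue combination used in the plot;
# a single distinct value means zero variance.
# note: groupby drops rows whose key is NaN (e.g. a missing Color) by default
n_distinct = long.groupby(['Color', 'Country.of.Origin', 'Score'])['value'].nunique()
constant_groups = n_distinct[n_distinct <= 1]
print(constant_groups)
```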
10.1. Unpacking JSONs
rhodyprog4ds_gh_events_url = 'https://api.github.com/orgs/rhodyprog4ds/events'
course_gh_df = pd.read_json(rhodyprog4ds_gh_events_url)
course_gh_df.head()
| | id | type | actor | repo | payload | public | created_at | org |
|---|---|---|---|---|---|---|---|---|
| 0 | 23017405497 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 10511497820, 'size': 1, 'distinct_... | True | 2022-07-21 23:02:34+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
| 1 | 23017380330 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 10511484308, 'size': 1, 'distinct_... | True | 2022-07-21 23:00:14+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
| 2 | 23017297788 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 10511442158, 'size': 1, 'distinct_... | True | 2022-07-21 22:52:08+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
| 3 | 23017256816 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 10511421441, 'size': 1, 'distinct_... | True | 2022-07-21 22:48:25+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
| 4 | 23017147996 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 400283911, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 10511370784, 'size': 1, 'distinct_... | True | 2022-07-21 22:40:28+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
We want to transform each of those dictionary-like entries into a row of a DataFrame, with one column per key.
type(course_gh_df['actor'])
pandas.core.series.Series
Recall that base Python types can be used as functions to cast an object from one type to another.
5
5
type(5)
int
str(5)
'5'
To unpack one column, we can cast each element of the column to a Series and then stack the results back together.
First, let’s look at one value in one column:
course_gh_df['actor'][0]
{'id': 10656079,
'login': 'brownsarahm',
'display_login': 'brownsarahm',
'gravatar_id': '',
'url': 'https://api.github.com/users/brownsarahm',
'avatar_url': 'https://avatars.githubusercontent.com/u/10656079?'}
Now let’s cast it to a Series
pd.Series(course_gh_df['actor'][0])
id 10656079
login brownsarahm
display_login brownsarahm
gravatar_id
url https://api.github.com/users/brownsarahm
avatar_url https://avatars.githubusercontent.com/u/10656079?
dtype: object
We want to do this for every row and stack the results. The apply method does this for us in one compact step: since pd.Series returns a Series for each element, apply combines them into a DataFrame.
course_gh_df['actor'].apply(pd.Series)
| | id | login | display_login | gravatar_id | url | avatar_url |
|---|---|---|---|---|---|---|
| 0 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 1 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? |
| 2 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 3 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? |
| 4 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 5 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 6 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 7 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? |
| 8 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 9 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? |
| 10 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 11 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 12 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 13 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
| 14 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? |
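apply(pd.Series) is easy to read, but it builds one Series per row, which can be slow on large data. Two commonly used alternatives are to pass the list of dictionaries to the DataFrame constructor, or to pd.json_normalize, which also flattens nested dictionaries into dotted column names. A sketch on a made-up column of dictionaries loosely shaped like actor (the keys and values here are invented):

```python
import pandas as pd

# made-up column of dictionaries, loosely shaped like the 'actor' column
actor = pd.Series([
    {'id': 1, 'login': 'alice', 'meta': {'site': 'a.example'}},
    {'id': 2, 'login': 'bob', 'meta': {'site': 'b.example'}},
])

# option 1: one row per dictionary, keys become columns; keep the original index
actor_df = pd.DataFrame(actor.tolist(), index=actor.index)

# option 2: json_normalize also flattens the nested 'meta' dictionary into 'meta.site'
actor_flat = pd.json_normalize(actor.tolist())
actor_flat.index = actor.index

print(actor_df.columns.tolist())
print(actor_flat.columns.tolist())
```

Keeping the original index matters if the pieces will later be combined side by side with other columns from the same DataFrame.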
How can we do this for all of the columns and put them back together after?
First, let’s make a list of the columns we need to convert.
js_cols = ['actor','repo','payload','org']
When we use .apply(pd.Series), we get a DataFrame.
type(course_gh_df['actor'].apply(pd.Series))
pandas.core.frame.DataFrame
pd.concat takes a list of DataFrames and puts them together into one DataFrame.
To illustrate, it helps to make small DataFrames.
df1 = pd.DataFrame([[1,2,3],[3,4,7]], columns = ['A','B','t'])
df2 = pd.DataFrame([[10,20,30],[30,40,70]], columns = ['AA','BB','t'])
df1
| | A | B | t |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 3 | 4 | 7 |
df2
| | AA | BB | t |
|---|---|---|---|
| 0 | 10 | 20 | 30 |
| 1 | 30 | 40 | 70 |
If we use concat with the default settings, it stacks them vertically and aligns any columns that have the same name.
pd.concat([df1,df2])
| | A | B | t | AA | BB |
|---|---|---|---|---|---|
| 0 | 1.0 | 2.0 | 3 | NaN | NaN |
| 1 | 3.0 | 4.0 | 7 | NaN | NaN |
| 0 | NaN | NaN | 30 | 10.0 | 20.0 |
| 1 | NaN | NaN | 70 | 30.0 | 40.0 |
So, since the original DataFrames each had 2 rows and 3 columns, with one column name (t) appearing in both, we end up with a new DataFrame of shape (4, 5), filled with NaN in the top right and the bottom left.
pd.concat([df1,df2]).shape
(4, 5)
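Notice that the stacked result repeats the index labels 0, 1, 0, 1, because concat keeps each input’s original index. If those labels carry no meaning, the ignore_index=True option of pd.concat renumbers the rows. A sketch with the same two small DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [3, 4, 7]], columns=['A', 'B', 't'])
df2 = pd.DataFrame([[10, 20, 30], [30, 40, 70]], columns=['AA', 'BB', 't'])

# default: original index labels are kept, so they repeat
stacked = pd.concat([df1, df2])

# ignore_index=True: rows get fresh labels 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```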
We can use the axis parameter to tell it how to combine them. The default is axis=0, which stacks them vertically; axis=1 places them side by side, matching up rows by their index.
pd.concat([df1,df2], axis =1)
| | A | B | t | AA | BB | t |
|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 10 | 20 | 30 |
| 1 | 3 | 4 | 7 | 30 | 40 | 70 |
So now we get no NaN values, because both DataFrames have the same number of rows and the same index.
df1.index == df2.index
array([ True, True])
And we have a total of 6 columns (with t appearing twice) and 2 rows.
pd.concat([df1,df2], axis =1).shape
(2, 6)
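The side-by-side result has no NaN only because the two indexes match. If they do not, axis=1 still aligns rows by index label and fills the gaps with NaN. A sketch where df2 is given the labels 1 and 2 instead of 0 and 1 (labels chosen only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [3, 4, 7]], columns=['A', 'B', 't'])
# same values as before, but with a shifted index
df2 = pd.DataFrame([[10, 20, 30], [30, 40, 70]], columns=['AA', 'BB', 't'], index=[1, 2])

side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

Label 0 exists only in df1 and label 2 only in df2, so each of those rows is missing the other DataFrame’s 3 columns.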
Back to our GitHub data: we want to make a list of DataFrames, where each one is one of the JSON columns of the original DataFrame unpacked, and then stack them horizontally (axis=1). Because every DataFrame in the list is built from the same original DataFrame, they again share the same index.
pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
axis=1)
| | id | login | display_login | gravatar_id | url | avatar_url | id | name | url | push_id | ... | commits | ref_type | master_branch | description | pusher_type | id | login | gravatar_id | url | avatar_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051150e+10 | ... | [{'sha': '433500586ca5565bef504ff4c27f320bf466... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 1 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051148e+10 | ... | [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 2 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051144e+10 | ... | [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 3 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051142e+10 | ... | [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 4 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051137e+10 | ... | [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 5 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051130e+10 | ... | [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 6 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051100e+10 | ... | [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 7 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 337917892 | rhodyprog4ds/rhodyprog4ds.github.io | https://api.github.com/repos/rhodyprog4ds/rhod... | 1.029240e+10 | ... | [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 8 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 337917892 | rhodyprog4ds/rhodyprog4ds.github.io | https://api.github.com/repos/rhodyprog4ds/rhod... | 1.029240e+10 | ... | [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 9 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 10 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 11 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | repository | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 12 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874933e+09 | ... | [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 13 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874923e+09 | ... | [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 14 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874895e+09 | ... | [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
15 rows × 25 columns
Try it Yourself
Examine the list of DataFrames to see what structure they share and do not share.
In this case we get the same 15 rows, because that’s what the API gave us, and our 4 columns from js_cols turned into 25 columns.
pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
axis=1).shape
(15, 25)
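If we want to keep track of which original column each field came from without renaming anything, pd.concat also accepts a keys parameter: with axis=1, each key labels the block of columns from one DataFrame, producing two-level (MultiIndex) column names. A sketch on made-up data with two dictionary columns (the values are invented):

```python
import pandas as pd

# made-up table with two dictionary columns that both contain an 'id' key
toy = pd.DataFrame({
    'actor': [{'id': 1, 'login': 'alice'}, {'id': 2, 'login': 'bob'}],
    'repo': [{'id': 10, 'name': 'r1'}, {'id': 11, 'name': 'r2'}],
})
cols = ['actor', 'repo']

# unpack each dictionary column, keeping the original index
pieces = [pd.DataFrame(toy[c].tolist(), index=toy.index) for c in cols]

# keys adds an outer column level naming the source column
combined = pd.concat(pieces, axis=1, keys=cols)

print(combined.columns.tolist())
# select every field from one original column, then a single field
print(combined['repo']['id'])
```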
If we had used the default (axis=0), we would end up with 60 rows (15*4) and only 18 columns, because some subfield names are shared across the original columns (e.g., most have an id).
pd.concat([course_gh_df[cur_col].apply(pd.Series) for cur_col in js_cols],
axis=0).shape
(60, 18)
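The row and column arithmetic can be checked on a tiny made-up example with two dictionary columns that both have an id key. For brevity, this sketch builds each piece with pd.DataFrame(col.tolist(), index=...) instead of apply(pd.Series); for flat dictionaries like these, both give the same columns.

```python
import pandas as pd

# made-up table with two dictionary columns; both dictionaries have an 'id' key
toy = pd.DataFrame({
    'actor': [{'id': 1, 'login': 'alice'}, {'id': 2, 'login': 'bob'}, {'id': 1, 'login': 'alice'}],
    'repo': [{'id': 10, 'name': 'r1'}, {'id': 10, 'name': 'r1'}, {'id': 11, 'name': 'r2'}],
})
cols = ['actor', 'repo']
pieces = [pd.DataFrame(toy[c].tolist(), index=toy.index) for c in cols]

# side by side: 3 rows, 2 + 2 columns, with 'id' appearing twice
wide = pd.concat(pieces, axis=1)
# stacked: 3 * 2 rows, and the shared 'id' subfield becomes a single column
tall = pd.concat(pieces, axis=0)

print(wide.shape, wide.columns.tolist())
print(tall.shape, tall.columns.tolist())
```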
We might want to rename the new columns so that each one has the original column name prepended to it. This will help us distinguish between the different id columns.
pandas has a rename method for this, and this is another job for lambdas.
pd.concat([course_gh_df[cur_col].apply(pd.Series).rename(columns = lambda c: cur_col + '_' +c)
for cur_col in js_cols],
axis=1)
| | actor_id | actor_login | actor_display_login | actor_gravatar_id | actor_url | actor_avatar_url | repo_id | repo_name | repo_url | payload_push_id | ... | payload_commits | payload_ref_type | payload_master_branch | payload_description | payload_pusher_type | org_id | org_login | org_gravatar_id | org_url | org_avatar_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051150e+10 | ... | [{'sha': '433500586ca5565bef504ff4c27f320bf466... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 1 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051148e+10 | ... | [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 2 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051144e+10 | ... | [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 3 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051142e+10 | ... | [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 4 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051137e+10 | ... | [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 5 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051130e+10 | ... | [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 6 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 400283911 | rhodyprog4ds/BrownFall21 | https://api.github.com/repos/rhodyprog4ds/Brow... | 1.051100e+10 | ... | [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 7 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 337917892 | rhodyprog4ds/rhodyprog4ds.github.io | https://api.github.com/repos/rhodyprog4ds/rhod... | 1.029240e+10 | ... | [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 8 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 337917892 | rhodyprog4ds/rhodyprog4ds.github.io | https://api.github.com/repos/rhodyprog4ds/rhod... | 1.029240e+10 | ... | [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 9 | 41898282 | github-actions[bot] | github-actions | | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 10 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 11 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 501809605 | rhodyprog4ds/jupyterlite | https://api.github.com/repos/rhodyprog4ds/jupy... | NaN | ... | NaN | repository | main | NaN | user | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 12 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874933e+09 | ... | [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 13 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874923e+09 | ... | [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
| 14 | 10656079 | brownsarahm | brownsarahm | | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | 404401191 | rhodyprog4ds/.github | https://api.github.com/repos/rhodyprog4ds/.github | 9.874895e+09 | ... | [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
15 rows × 25 columns
The rename method’s columns parameter can take a function, including a lambda defined inline. This is helpful because we want a function that takes one parameter (the current column name) and does the same thing to every column within a single DataFrame, but prepends a different string for each DataFrame. Since rename runs inside each step of the list comprehension, each lambda uses that step’s value of cur_col.
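pandas also provides DataFrame.add_prefix, which prepends a fixed string to every column label and can stand in for the rename-plus-lambda step when all we want is a prefix. A sketch on made-up data with two dictionary columns (the values are invented):

```python
import pandas as pd

# made-up table with two dictionary columns that share 'id' and 'login' keys
toy = pd.DataFrame({
    'actor': [{'id': 1, 'login': 'alice'}, {'id': 2, 'login': 'bob'}],
    'org': [{'id': 7, 'login': 'club'}, {'id': 7, 'login': 'club'}],
})
js_cols = ['actor', 'org']

# unpack each column and prefix its labels with the source column name plus '_'
renamed = pd.concat(
    [pd.DataFrame(toy[c].tolist(), index=toy.index).add_prefix(c + '_') for c in js_cols],
    axis=1,
)
print(renamed.columns.tolist())
```

rename with a lambda stays the more flexible option when the new names need more than a prefix.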
15 rows × 25 columns
So now, we have the unpacked columns with good column names, but we lost the columns that were originally good.
How can we put the new columns together with the old ones? First, we make a DataFrame with only the columns that are not in the list we are going to expand (js_cols).
# keep only the columns that are not in js_cols (the ones we are about to expand)
course_gf_df_good = course_gh_df[[col for col in course_gh_df.columns
                                  if col not in js_cols]]
course_gf_df_good
| | id | type | public | created_at |
---|---|---|---|---|
0 | 23017405497 | PushEvent | True | 2022-07-21 23:02:34+00:00 |
1 | 23017380330 | PushEvent | True | 2022-07-21 23:00:14+00:00 |
2 | 23017297788 | PushEvent | True | 2022-07-21 22:52:08+00:00 |
3 | 23017256816 | PushEvent | True | 2022-07-21 22:48:25+00:00 |
4 | 23017147996 | PushEvent | True | 2022-07-21 22:40:28+00:00 |
5 | 23017000728 | PushEvent | True | 2022-07-21 22:28:48+00:00 |
6 | 23016409837 | PushEvent | True | 2022-07-21 21:41:28+00:00 |
7 | 22594628501 | PushEvent | True | 2022-06-29 00:36:34+00:00 |
8 | 22594620942 | PushEvent | True | 2022-06-29 00:35:39+00:00 |
9 | 22258790417 | CreateEvent | True | 2022-06-09 21:15:41+00:00 |
10 | 22258731281 | CreateEvent | True | 2022-06-09 21:11:34+00:00 |
11 | 22258730912 | CreateEvent | True | 2022-06-09 21:11:33+00:00 |
12 | 21774885552 | PushEvent | True | 2022-05-13 12:54:30+00:00 |
13 | 21774865161 | PushEvent | True | 2022-05-13 12:53:22+00:00 |
14 | 21774810879 | PushEvent | True | 2022-05-13 12:50:21+00:00 |
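The same column selection can also be written with DataFrame.drop, which names the columns to remove instead of the ones to keep. Here is a minimal sketch on a small made-up DataFrame; events and its columns are hypothetical stand-ins for course_gh_df and js_cols, not the real GitHub data.

```python
import pandas as pd

# hypothetical stand-in for course_gh_df: 'actor' and 'repo' hold dicts, like the JSON columns
events = pd.DataFrame({
    'id': [1, 2],
    'type': ['PushEvent', 'CreateEvent'],
    'actor': [{'id': 10, 'login': 'a'}, {'id': 11, 'login': 'b'}],
    'repo': [{'id': 5, 'name': 'r1'}, {'id': 6, 'name': 'r2'}],
})
js_cols = ['actor', 'repo']  # the columns we plan to expand

# list comprehension: name the columns to keep
keep_a = events[[col for col in events.columns if col not in js_cols]]
# drop: name the columns to remove; the result has the same columns
keep_b = events.drop(columns=js_cols)

print(list(keep_a.columns))   # ['id', 'type']
print(keep_a.equals(keep_b))  # True
```

Either form leaves the original DataFrame unchanged and returns a new one.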
Then we can prepend course_gf_df_good to the list of expanded DataFrames that we pass to concat. To do that, we put it in a list of its own first, then use + to join the two lists.
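# expand each JSON column into its own DataFrame with prefixed column names,
# then put the kept columns first and combine everything side by side (axis=1)
pd.concat([course_gf_df_good] +
          [course_gh_df[col].apply(pd.Series).rename(columns=lambda i_col: col + '_' + i_col)
           for col in js_cols],
          axis=1)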
id | type | public | created_at | actor_id | actor_login | actor_display_login | actor_gravatar_id | actor_url | actor_avatar_url | ... | payload_commits | payload_ref_type | payload_master_branch | payload_description | payload_pusher_type | org_id | org_login | org_gravatar_id | org_url | org_avatar_url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23017405497 | PushEvent | True | 2022-07-21 23:02:34+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '433500586ca5565bef504ff4c27f320bf466... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
1 | 23017380330 | PushEvent | True | 2022-07-21 23:00:14+00:00 | 41898282 | github-actions[bot] | github-actions | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | ... | [{'sha': '90aeafc7245cb0fe0bd2622014d176527612... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
2 | 23017297788 | PushEvent | True | 2022-07-21 22:52:08+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': 'ddd8f010d5da187edb15d8bbad1aebe20360... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
3 | 23017256816 | PushEvent | True | 2022-07-21 22:48:25+00:00 | 41898282 | github-actions[bot] | github-actions | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | ... | [{'sha': '201d9ae167e019db90cd647ca8b25ec3b9ff... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
4 | 23017147996 | PushEvent | True | 2022-07-21 22:40:28+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '7dd816ec4eca8c85bb6e0c3fb296b1559230... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
5 | 23017000728 | PushEvent | True | 2022-07-21 22:28:48+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '8ba204abebd86331f83a40f4c7b2962f650d... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
6 | 23016409837 | PushEvent | True | 2022-07-21 21:41:28+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '2f4229efc77bf4ed85286c02205523e6c4f5... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
7 | 22594628501 | PushEvent | True | 2022-06-29 00:36:34+00:00 | 41898282 | github-actions[bot] | github-actions | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | ... | [{'sha': '89f1e98e001a74542a4be7bf34222cb422b7... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
8 | 22594620942 | PushEvent | True | 2022-06-29 00:35:39+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '6af6f8dd48787d9ef93186ad9155b471f38a... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
9 | 22258790417 | CreateEvent | True | 2022-06-09 21:15:41+00:00 | 41898282 | github-actions[bot] | github-actions | https://api.github.com/users/github-actions[bot] | https://avatars.githubusercontent.com/u/41898282? | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
10 | 22258731281 | CreateEvent | True | 2022-06-09 21:11:34+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | NaN | branch | main | NaN | user | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
11 | 22258730912 | CreateEvent | True | 2022-06-09 21:11:33+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | NaN | repository | main | NaN | user | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
12 | 21774885552 | PushEvent | True | 2022-05-13 12:54:30+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '2135ed9bc8bfe199e5d41313054b96646eca... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
13 | 21774865161 | PushEvent | True | 2022-05-13 12:53:22+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '095cdd63ca1bcecfb3744b332130509ccc51... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? | ||
14 | 21774810879 | PushEvent | True | 2022-05-13 12:50:21+00:00 | 10656079 | brownsarahm | brownsarahm | https://api.github.com/users/brownsarahm | https://avatars.githubusercontent.com/u/10656079? | ... | [{'sha': '169f607d53821cc7dfda9a89be3b2d31bd2c... | NaN | NaN | NaN | NaN | 69595187 | rhodyprog4ds | https://api.github.com/orgs/rhodyprog4ds | https://avatars.githubusercontent.com/u/69595187? |
15 rows × 29 columns
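To try the whole expand-and-concat pattern on a small scale, here is a minimal sketch on made-up data. The events DataFrame, its dict keys, and the js_cols values are hypothetical stand-ins for course_gh_df and its JSON columns.

```python
import pandas as pd

# hypothetical toy data: two columns hold dicts, like the GitHub events data
events = pd.DataFrame({
    'id': [1, 2],
    'type': ['PushEvent', 'CreateEvent'],
    'actor': [{'id': 10, 'login': 'a'}, {'id': 11, 'login': 'b'}],
    'payload': [{'size': 1}, {'ref_type': 'branch'}],
})
js_cols = ['actor', 'payload']

# the columns that are already fine as they are
good_df = events.drop(columns=js_cols)

# one DataFrame per dict column; each new column is prefixed with the old column name
expanded = [events[col].apply(pd.Series).rename(columns=lambda k: col + '_' + k)
            for col in js_cols]

# [good_df] + expanded is one list; axis=1 places the frames side by side
wide_df = pd.concat([good_df] + expanded, axis=1)
print(wide_df)
```

A row whose dict lacks a key gets NaN in that column, which is the same reason the combined table above has NaNs. pandas also has pd.json_normalize for flattening nested dicts; the notes use apply(pd.Series) instead.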
To see how this list addition works:
['a'] + ['b','c','d']
['a', 'b', 'c', 'd']
The + joins the two lists into a single list. Without the [] around 'a', however, we get a TypeError, because Python cannot add a str and a list:
'a' + ['b','c','d']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [39], in <cell line: 1>()
----> 1 'a' + ['b','c','d']
TypeError: can only concatenate str (not "list") to str
List methods such as extend (and append or sort) work differently: they modify the list in place and return None, so
orig_list = ['a']
new_items = ['b','c','d']
orig_list.extend(new_items)
outputs nothing, because extend returns None; instead, it changes the original variable:
orig_list
['a', 'b', 'c', 'd']
type(orig_list.extend(new_items))
NoneType
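shows that extend returns None. Note that this call ran extend a second time, so orig_list now holds ['a', 'b', 'c', 'd', 'b', 'c', 'd'].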
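To contrast the two behaviors in one place, here is a small sketch (the names first, rest, combined, and result are just illustrative):

```python
# + builds a brand-new list and leaves both inputs unchanged
first = ['a']
rest = ['b', 'c', 'd']
combined = first + rest
print(combined)  # ['a', 'b', 'c', 'd']
print(first)     # ['a']

# extend changes the list in place and returns None
result = first.extend(rest)
print(result)    # None
print(first)     # ['a', 'b', 'c', 'd']

# a str cannot be added to a list; wrap it in [] first
try:
    'a' + rest
except TypeError as err:
    print('TypeError:', err)
```

This is why the concat call above uses [course_gf_df_good] + [...]: both sides of + must be lists.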
10.2. Questions After Class
All of today's questions were clarifying questions.
10.2.1. How does Axis work?
The notes above have been expanded a lot, which should help. You can see more examples in the Tidy Data Explanation and on the Cheat Sheet.
The axis parameter appears in many pandas functions. You saw it in most of the statistics methods we used last week, too: working column by column (axis=0) is the default, but passing axis=1 makes them work row by row instead.
For more on concatenation, see the pandas user guide or API docs sections on it.
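A small sketch of what axis changes, on a made-up table of two cups (the cup labels and scores are invented for illustration):

```python
import pandas as pd

# made-up scores for two cups
scores = pd.DataFrame({'Aroma': [7.5, 8.0], 'Body': [7.0, 8.5]},
                      index=['cup1', 'cup2'])

# axis=0 (the default): collapse the rows, one mean per column
col_means = scores.mean()        # Aroma 7.75, Body 7.75
# axis=1: collapse the columns, one mean per row
row_means = scores.mean(axis=1)  # cup1 7.25, cup2 8.25

# concat uses axis too: 0 stacks frames vertically, 1 places them side by side
extra = pd.DataFrame({'Acidity': [7.25, 7.75]}, index=['cup1', 'cup2'])
side_by_side = pd.concat([scores, extra], axis=1)  # 2 rows, 3 columns
stacked = pd.concat([scores, scores], axis=0)      # 4 rows, 2 columns
```

With axis=1, concat lines the frames up by their index labels, so rows that share a label end up in the same row.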
10.2.2. How does melt work?
The notes above have been expanded a lot. Also see Tidying data.
For the concept, you can also see the original Tidy Data paper.
For the pandas method, see its docs.
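A minimal melt sketch on a made-up two-row version of the coffee data (the countries and scores are invented): each column in value_vars becomes a block of rows, and the id_vars columns are repeated alongside them.

```python
import pandas as pd

# hypothetical mini coffee table: one row per coffee, one column per score
wide = pd.DataFrame({
    'Country.of.Origin': ['Ethiopia', 'Brazil'],
    'Aroma': [8.5, 7.5],
    'Body': [8.0, 7.25],
})

# each score column turns into rows; the country is repeated for each score
long = wide.melt(id_vars=['Country.of.Origin'],
                 value_vars=['Aroma', 'Body'],
                 var_name='Score')
print(long)
```

The result has 2 × 2 = 4 rows (rows × value_vars) and 1 + 2 = 3 columns (the id_vars plus Score and value), the same arithmetic as the coffee example earlier.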
10.2.3. What about the NaNs that are still left?
Those values are NaN because the events are of different types, and different types of events have different information available about them.
If we group by event type and then look at, for example, the payload columns, we see that the event type explains the NaNs. (Remember, count tells how many values are not NaN.)
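A sketch of the idea on a few made-up rows (the values are invented; the column names mirror the payload columns in the table above):

```python
import numpy as np
import pandas as pd

# made-up events: PushEvents carry commits, CreateEvents carry a ref_type
events = pd.DataFrame({
    'type': ['PushEvent', 'PushEvent', 'CreateEvent'],
    'payload_commits': [['c1'], ['c2', 'c3'], np.nan],
    'payload_ref_type': [np.nan, np.nan, 'branch'],
})

# count gives the number of non-NaN values in each column, per event type
counts = events.groupby('type')[['payload_commits', 'payload_ref_type']].count()
print(counts)
```

Each payload column is only filled in for one event type, so a count of 0 for the other type shows that those NaNs come from the event type, not from missing data.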