{ "cells": [ { "cell_type": "markdown", "id": "8dba16a2", "metadata": {}, "source": [ "# Building Datasets from Multiple Sources\n", "\n", "focus this week is on how to programmatically combine sources of data\n", "\n", "We will start by looking at combining multiple tabular data formats and see how to get data from other sources. \n", "\n", "\n", "\n", "## Self- assessment, what's your plan\n", "\n", "Take a few minutes to think about the following questions and make a few notes\n", "for yourself whereever you need them to be (a planner, calendar, etc).\n", "Share one takeaway or question you have below when you're done.\n", "\n", "1. What achievements have you earned?\n", "1. Does BrightSpace seem accurate to what you've done?\n", "1. If not, e-mail brownsarahm with [CSC310] or [DSP310] in the subject and specific details about what you think is missing and why\n", "1. Are you on track to earn the grade you want in this class?\n", "1. If not, what will you need to do (respond more in class, submit more\n", " assignments, use your portfolio to catch up) to get back on track?\n", "1. If you are on track and you want to earn above a B, take a minute to think\n", "about your portfolio. (tip: post an idea as an issue to get early feedback and help shaping your idea)\n", "\n", "## Logistics\n", "\n", "Check your earned achievements. See the [instructions](gradecheck) that are now saved for future reference on the left side.\n", "\n", "\n", "## Merges" ] }, { "cell_type": "code", "execution_count": 1, "id": "14a0c74d", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "331ab0f0", "metadata": {}, "source": [ "I created a folder with [datasets we use in class](https://github.com/rhodyprog4ds/rhodyds/tree/main/data)" ] }, { "cell_type": "code", "execution_count": 2, "id": "10d33a0b", "metadata": {}, "outputs": [], "source": [ "course_data_url = 'https://raw.githubusercontent.com/rhodyprog4ds/rhodyds/main/data/'" ] }, { "cell_type": "markdown", "id": "7a31dfcc", "metadata": {}, "source": [ " We can load in two data sets of player information." ] }, { "cell_type": "code", "execution_count": 3, "id": "ba2efd5b", "metadata": {}, "outputs": [], "source": [ "df_18 = pd.read_csv(course_data_url+ '2018-players.csv')\n", "df_19 = pd.read_csv(course_data_url+ '2019-players.csv')" ] }, { "cell_type": "markdown", "id": "810d1b21", "metadata": {}, "source": [ "and take a peek at each" ] }, { "cell_type": "code", "execution_count": 4, "id": "cab99358", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_IDPLAYER_IDSEASON
016106127612026952018
1161061276116277832018
\n", "
" ], "text/plain": [ " TEAM_ID PLAYER_ID SEASON\n", "0 1610612761 202695 2018\n", "1 1610612761 1627783 2018" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18.head(2)" ] }, { "cell_type": "markdown", "id": "f9ac8866", "metadata": {}, "source": [ "```{important}\n", "Remember `columns` is an attribute, so it does not need `()`\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "id": "4e7b14fe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['PLAYER_NAME', 'TEAM_ID', 'PLAYER_ID', 'SEASON'], dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_19.columns" ] }, { "cell_type": "markdown", "id": "2e7e71c4", "metadata": {}, "source": [ "Let's make note of the shape of each" ] }, { "cell_type": "code", "execution_count": 6, "id": "e3a0d995", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((748, 3), (626, 4))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18.shape, df_19.shape" ] }, { "cell_type": "markdown", "id": "4a737616", "metadata": {}, "source": [ "## What if we want to analyze them together?\n", "\n", "We can combine them vertically:" ] }, { "cell_type": "code", "execution_count": 7, "id": "55af8abf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1374, 4)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat([df_18,df_19]).shape" ] }, { "cell_type": "markdown", "id": "d2b26b67", "metadata": {}, "source": [ "Note that this has the maximum number of columns (because both had some overlapping\n", " columns) and the total number of rows.\n", "\n", "\n", "## How can we find which players changed teams?\n", "\n", "To do this we want to have one player column and a column with each year's team.\n", "\n", "We can use a merge to do that.\n", "\n", "````{margin}\n", "```{note}\n", "the `head` here is only to help us see more at a time\n", "```\n", "````" ] }, { "cell_type": "code", "execution_count": 8, "id": "92612200", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_IDPLAYER_IDSEASONPLAYER_NAME
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [TEAM_ID, PLAYER_ID, SEASON, PLAYER_NAME]\n", "Index: []" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df_18,df_19,).head(2)" ] }, { "cell_type": "markdown", "id": "af9383d1", "metadata": {}, "source": [ "if we merge them without any parameters, it tries to merge on all shared columns. We want to merge them using the `PLAYER_ID` column though, we would say hat we are \"merging on player ID\" and we use the `on` parameter to do it" ] }, { "cell_type": "code", "execution_count": 9, "id": "0ba6118e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_xPLAYER_IDSEASON_xPLAYER_NAMETEAM_ID_ySEASON_y
016106127612026952018Kawhi Leonard16106127462019
1161061276116277832018Pascal Siakam16106127612019
\n", "
" ], "text/plain": [ " TEAM_ID_x PLAYER_ID SEASON_x PLAYER_NAME TEAM_ID_y SEASON_y\n", "0 1610612761 202695 2018 Kawhi Leonard 1610612746 2019\n", "1 1610612761 1627783 2018 Pascal Siakam 1610612761 2019" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df_18,df_19, on = 'PLAYER_ID').head(2)" ] }, { "cell_type": "markdown", "id": "03d1c7d3", "metadata": {}, "source": [ "Since there are other columns that appear in both DataFrames, they get a suffis, which by default is `x` or `y`, we can specify them though." ] }, { "cell_type": "code", "execution_count": 10, "id": "fdcf9bff", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_2018PLAYER_IDSEASON_2018PLAYER_NAMETEAM_ID_2019SEASON_2019
016106127612026952018Kawhi Leonard16106127462019
1161061276116277832018Pascal Siakam16106127612019
\n", "
" ], "text/plain": [ " TEAM_ID_2018 PLAYER_ID SEASON_2018 PLAYER_NAME TEAM_ID_2019 \\\n", "0 1610612761 202695 2018 Kawhi Leonard 1610612746 \n", "1 1610612761 1627783 2018 Pascal Siakam 1610612761 \n", "\n", " SEASON_2019 \n", "0 2019 \n", "1 2019 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df_18,df_19, on = 'PLAYER_ID',\n", " suffixes=('_2018','_2019')).head(2)" ] }, { "cell_type": "markdown", "id": "bae3759c", "metadata": {}, "source": [ "We also told it what to append to any column names that match, but are not the same across both datasets." ] }, { "cell_type": "code", "execution_count": 11, "id": "25cc8f5e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_IDPLAYER_IDSEASON
016106127612026952018
\n", "
" ], "text/plain": [ " TEAM_ID PLAYER_ID SEASON\n", "0 1610612761 202695 2018" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18.head(1)" ] }, { "cell_type": "code", "execution_count": 12, "id": "a439cea8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYER_NAMETEAM_IDPLAYER_IDSEASON
0Royce O'Neale161061276216262202019
\n", "
" ], "text/plain": [ " PLAYER_NAME TEAM_ID PLAYER_ID SEASON\n", "0 Royce O'Neale 1610612762 1626220 2019" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_19.head(1)" ] }, { "cell_type": "code", "execution_count": 13, "id": "f35d1d51", "metadata": {}, "outputs": [], "source": [ "df1819 = pd.merge(df_18,df_19, on = 'PLAYER_ID',\n", " suffixes=('_2018','_2019'))" ] }, { "cell_type": "code", "execution_count": 14, "id": "4a7dc765", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(538, 6)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1819.shape" ] }, { "cell_type": "markdown", "id": "223ef056", "metadata": {}, "source": [ "By default, this uses an *inner* merge, so we get the players that are in both datasets only." ] }, { "cell_type": "code", "execution_count": 15, "id": "fc013868", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_2018PLAYER_IDSEASON_2018PLAYER_NAMETEAM_ID_2019SEASON_2019
016106127612026952018Kawhi Leonard16106127462019
1161061276116277832018Pascal Siakam16106127612019
216106127612011882018Marc Gasol16106127612019
316106127632011882018Marc Gasol16106127612019
416106127612019802018Danny Green16106127472019
\n", "
" ], "text/plain": [ " TEAM_ID_2018 PLAYER_ID SEASON_2018 PLAYER_NAME TEAM_ID_2019 \\\n", "0 1610612761 202695 2018 Kawhi Leonard 1610612746 \n", "1 1610612761 1627783 2018 Pascal Siakam 1610612761 \n", "2 1610612761 201188 2018 Marc Gasol 1610612761 \n", "3 1610612763 201188 2018 Marc Gasol 1610612761 \n", "4 1610612761 201980 2018 Danny Green 1610612747 \n", "\n", " SEASON_2019 \n", "0 2019 \n", "1 2019 \n", "2 2019 \n", "3 2019 \n", "4 2019 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1819.head()" ] }, { "cell_type": "markdown", "id": "353f0257", "metadata": {}, "source": [ "Some players still appear twice, because they were in one of the datsets twice, this\n", "happens when a player plays for two team in one season.\n", "\n", "## Which players played in 2018, but not 2019?\n", "\n", "We have different types of merges, inner is both, out is either. Left and right keep all the rows of one dataFrame. We can use left with `df_18` as the left DataFrame to see which players played only in 18." ] }, { "cell_type": "code", "execution_count": 16, "id": "f2d7c973", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_2018PLAYER_IDSEASON_2018PLAYER_NAMETEAM_ID_2019SEASON_2019
749161061275216292462018NaNNaNNaN
750161061274816291592018NaNNaNNaN
751161061276216291632018NaNNaNNaN
752161061274316291502018NaNNaNNaN
753161061273816291672018NaNNaNNaN
\n", "
" ], "text/plain": [ " TEAM_ID_2018 PLAYER_ID SEASON_2018 PLAYER_NAME TEAM_ID_2019 \\\n", "749 1610612752 1629246 2018 NaN NaN \n", "750 1610612748 1629159 2018 NaN NaN \n", "751 1610612762 1629163 2018 NaN NaN \n", "752 1610612743 1629150 2018 NaN NaN \n", "753 1610612738 1629167 2018 NaN NaN \n", "\n", " SEASON_2019 \n", "749 NaN \n", "750 NaN \n", "751 NaN \n", "752 NaN \n", "753 NaN " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df_18,df_19,on='PLAYER_ID', how='left',suffixes=('_2018','_2019')).tail()" ] }, { "cell_type": "markdown", "id": "f7bec7a6", "metadata": {}, "source": [ "To pick out those rows:" ] }, { "cell_type": "code", "execution_count": 17, "id": "6751eaa7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_2018PLAYER_IDSEASON_2018PLAYER_NAMETEAM_ID_2019SEASON_2019
916106127612023912018NaNNaNNaN
1116106127612019752018NaNNaNNaN
1816106127441011062018NaNNaNNaN
23161061274427332018NaNNaNNaN
2416106127442019732018NaNNaNNaN
.....................
749161061275216292462018NaNNaNNaN
750161061274816291592018NaNNaNNaN
751161061276216291632018NaNNaNNaN
752161061274316291502018NaNNaNNaN
753161061273816291672018NaNNaNNaN
\n", "

216 rows × 6 columns

\n", "
" ], "text/plain": [ " TEAM_ID_2018 PLAYER_ID SEASON_2018 PLAYER_NAME TEAM_ID_2019 \\\n", "9 1610612761 202391 2018 NaN NaN \n", "11 1610612761 201975 2018 NaN NaN \n", "18 1610612744 101106 2018 NaN NaN \n", "23 1610612744 2733 2018 NaN NaN \n", "24 1610612744 201973 2018 NaN NaN \n", ".. ... ... ... ... ... \n", "749 1610612752 1629246 2018 NaN NaN \n", "750 1610612748 1629159 2018 NaN NaN \n", "751 1610612762 1629163 2018 NaN NaN \n", "752 1610612743 1629150 2018 NaN NaN \n", "753 1610612738 1629167 2018 NaN NaN \n", "\n", " SEASON_2019 \n", "9 NaN \n", "11 NaN \n", "18 NaN \n", "23 NaN \n", "24 NaN \n", ".. ... \n", "749 NaN \n", "750 NaN \n", "751 NaN \n", "752 NaN \n", "753 NaN \n", "\n", "[216 rows x 6 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18_only = pd.merge(df_18,df_19,on='PLAYER_ID', how='left', suffixes=('_2018','_2019'))\n", "df_18_only[df_18_only['SEASON_2019'].isna()]" ] }, { "cell_type": "code", "execution_count": 18, "id": "f4994850", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_ID_2018PLAYER_IDSEASON_2018PLAYER_NAMETEAM_ID_2019SEASON_2019
916106127612023912018NaNNaNNaN
1116106127612019752018NaNNaNNaN
1816106127441011062018NaNNaNNaN
23161061274427332018NaNNaNNaN
2416106127442019732018NaNNaNNaN
\n", "
" ], "text/plain": [ " TEAM_ID_2018 PLAYER_ID SEASON_2018 PLAYER_NAME TEAM_ID_2019 \\\n", "9 1610612761 202391 2018 NaN NaN \n", "11 1610612761 201975 2018 NaN NaN \n", "18 1610612744 101106 2018 NaN NaN \n", "23 1610612744 2733 2018 NaN NaN \n", "24 1610612744 201973 2018 NaN NaN \n", "\n", " SEASON_2019 \n", "9 NaN \n", "11 NaN \n", "18 NaN \n", "23 NaN \n", "24 NaN " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18_left = pd.merge(df_18,df_19.drop(columns=['SEASON'])\n", " ,on='PLAYER_ID', how='left', suffixes=('_2018','_2019'))\n", "df_18_only = df_18_only[df_18_only['TEAM_ID_2019'].isna()]\n", "df_18_only.head()" ] }, { "cell_type": "code", "execution_count": 19, "id": "9aba64e4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "178" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_18_only, _ = df_18_only.drop_duplicates(subset=['PLAYER_ID']).shape\n", "n_18_only" ] }, { "cell_type": "markdown", "id": "9aa66a32", "metadata": {}, "source": [ "## Which players played for the same team both seasons?" ] }, { "cell_type": "code", "execution_count": 20, "id": "ee7c2c48", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_IDPLAYER_IDSEASON_xPLAYER_NAMESEASON_y
0161061276116277832018Pascal Siakam2019
116106127612011882018Marc Gasol2019
216106127612007682018Kyle Lowry2019
3161061276116278322018Fred VanVleet2019
416106127612015862018Serge Ibaka2019
\n", "
" ], "text/plain": [ " TEAM_ID PLAYER_ID SEASON_x PLAYER_NAME SEASON_y\n", "0 1610612761 1627783 2018 Pascal Siakam 2019\n", "1 1610612761 201188 2018 Marc Gasol 2019\n", "2 1610612761 200768 2018 Kyle Lowry 2019\n", "3 1610612761 1627832 2018 Fred VanVleet 2019\n", "4 1610612761 201586 2018 Serge Ibaka 2019" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_same_team = pd.merge(df_18,df_19, on = ['PLAYER_ID','TEAM_ID'],)\n", "df_same_team.head()" ] }, { "cell_type": "markdown", "id": "c5d0a55b", "metadata": {}, "source": [ "In this case, the suffix only applies to season, but they're not telling us much, so we can clean it up using `drop` before we merge." ] }, { "cell_type": "code", "execution_count": 21, "id": "970190c9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TEAM_IDPLAYER_IDPLAYER_NAME
016106127611627783Pascal Siakam
11610612761201188Marc Gasol
21610612761200768Kyle Lowry
316106127611627832Fred VanVleet
41610612761201586Serge Ibaka
\n", "
" ], "text/plain": [ " TEAM_ID PLAYER_ID PLAYER_NAME\n", "0 1610612761 1627783 Pascal Siakam\n", "1 1610612761 201188 Marc Gasol\n", "2 1610612761 200768 Kyle Lowry\n", "3 1610612761 1627832 Fred VanVleet\n", "4 1610612761 201586 Serge Ibaka" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18_only_clean = pd.merge(df_18.drop(columns='SEASON'),df_19.drop(columns='SEASON'), on = ['PLAYER_ID','TEAM_ID'],)\n", "df_18_only_clean.head()" ] }, { "cell_type": "code", "execution_count": 22, "id": "8f2802f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(263, 3)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18_only_clean.shape" ] }, { "cell_type": "code", "execution_count": 23, "id": "f039703e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(263, 3)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_18_only_clean.drop_duplicates(subset=['PLAYER_ID']).shape" ] }, { "cell_type": "markdown", "id": "37a7475e", "metadata": {}, "source": [ "We do not need to drop the duplicates in this case becaue we merged on two columns and there were no actual duplicates in the original dataset." ] }, { "cell_type": "markdown", "id": "3505e302", "metadata": {}, "source": [ "## Visualizing merges" ] }, { "cell_type": "code", "execution_count": 24, "id": "676afd96", "metadata": {}, "outputs": [], "source": [ "left = pd.DataFrame(\n", " {\n", " \"key\": [\"K0\", \"K1\", \"K2\", \"K4\"],\n", " \"A\": [\"A0\", \"A1\", \"A2\", \"A4\"],\n", " \"B\": [\"B0\", \"B1\", \"B2\", \"B4\"],\n", " }\n", ")\n", "\n", "\n", "right = pd.DataFrame(\n", " {\n", " \"key\": [\"K0\", \"K1\", \"K2\", \"K3\"],\n", " \"C\": [\"C0\", \"C1\", \"C2\", \"C3\"],\n", " \"D\": [\"D0\", \"D1\", \"D2\", \"D3\"],\n", " }\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "id": "2134f10e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyAB
0K0A0B0
1K1A1B1
2K2A2B2
3K4A4B4
\n", "
" ], "text/plain": [ " key A B\n", "0 K0 A0 B0\n", "1 K1 A1 B1\n", "2 K2 A2 B2\n", "3 K4 A4 B4" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left" ] }, { "cell_type": "code", "execution_count": 26, "id": "2a4630c6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyCD
0K0C0D0
1K1C1D1
2K2C2D2
3K3C3D3
\n", "
" ], "text/plain": [ " key C D\n", "0 K0 C0 D0\n", "1 K1 C1 D1\n", "2 K2 C2 D2\n", "3 K3 C3 D3" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right" ] }, { "cell_type": "code", "execution_count": 27, "id": "db26d2ab", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='inner',)" ] }, { "cell_type": "code", "execution_count": 28, "id": "f7960896", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
3K4A4B4NaNNaN
4K3NaNNaNC3D3
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2\n", "3 K4 A4 B4 NaN NaN\n", "4 K3 NaN NaN C3 D3" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='outer')" ] }, { "cell_type": "code", "execution_count": 29, "id": "ef693fcd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
3K4A4B4NaNNaN
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2\n", "3 K4 A4 B4 NaN NaN" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='left')" ] }, { "cell_type": "code", "execution_count": 30, "id": "b5e5ef82", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyABCD
0K0A0B0C0D0
1K1A1B1C1D1
2K2A2B2C2D2
3K3NaNNaNC3D3
\n", "
" ], "text/plain": [ " key A B C D\n", "0 K0 A0 B0 C0 D0\n", "1 K1 A1 B1 C1 D1\n", "2 K2 A2 B2 C2 D2\n", "3 K3 NaN NaN C3 D3" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='right')" ] }, { "cell_type": "markdown", "id": "bd6928c8", "metadata": {}, "source": [ " We can merge on multiple columns too/" ] }, { "cell_type": "code", "execution_count": 31, "id": "ae44c24f", "metadata": {}, "outputs": [], "source": [ "left = pd.DataFrame(\n", " {\n", " \"key1\": [\"K0\", \"K0\", \"K1\", \"K2\"],\n", " \"key2\": [\"K0\", \"K1\", \"K0\", \"K1\"],\n", " \"A\": [\"A0\", \"A1\", \"A2\", \"A3\"],\n", " \"B\": [\"B0\", \"B1\", \"B2\", \"B3\"],\n", " }\n", ")\n", "\n", "\n", "right = pd.DataFrame(\n", " {\n", " \"key1\": [\"K0\", \"K1\", \"K1\", \"K2\"],\n", " \"key2\": [\"K0\", \"K0\", \"K0\", \"K0\"],\n", " \"C\": [\"C0\", \"C1\", \"C2\", \"C3\"],\n", " \"D\": [\"D0\", \"D1\", \"D2\", \"D3\"],\n", " }\n", ")" ] }, { "cell_type": "code", "execution_count": 32, "id": "dd65da6e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2AB
0K0K0A0B0
1K0K1A1B1
2K1K0A2B2
3K2K1A3B3
\n", "
" ], "text/plain": [ " key1 key2 A B\n", "0 K0 K0 A0 B0\n", "1 K0 K1 A1 B1\n", "2 K1 K0 A2 B2\n", "3 K2 K1 A3 B3" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left" ] }, { "cell_type": "code", "execution_count": 33, "id": "25cc9549", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2CD
0K0K0C0D0
1K1K0C1D1
2K1K0C2D2
3K2K0C3D3
\n", "
" ], "text/plain": [ " key1 key2 C D\n", "0 K0 K0 C0 D0\n", "1 K1 K0 C1 D1\n", "2 K1 K0 C2 D2\n", "3 K2 K0 C3 D3" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right" ] }, { "cell_type": "code", "execution_count": 34, "id": "6a5ba4a3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K1K0A2B2C1D1
2K1K0A2B2C2D2
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K1 K0 A2 B2 C1 D1\n", "2 K1 K0 A2 B2 C2 D2" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on=[\"key1\", \"key2\"])" ] }, { "cell_type": "code", "execution_count": 35, "id": "1c3705d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2ABCD
0K0K0A0B0C0D0
1K0K1A1B1NaNNaN
2K1K0A2B2C1D1
3K1K0A2B2C2D2
4K2K1A3B3NaNNaN
5K2K0NaNNaNC3D3
\n", "
" ], "text/plain": [ " key1 key2 A B C D\n", "0 K0 K0 A0 B0 C0 D0\n", "1 K0 K1 A1 B1 NaN NaN\n", "2 K1 K0 A2 B2 C1 D1\n", "3 K1 K0 A2 B2 C2 D2\n", "4 K2 K1 A3 B3 NaN NaN\n", "5 K2 K0 NaN NaN C3 D3" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on=[\"key1\", \"key2\"],how='outer')" ] }, { "cell_type": "code", "execution_count": 36, "id": "b727354a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key1key2_xABkey2_yCD
0K0K0A0B0K0C0D0
1K0K1A1B1K0C0D0
2K1K0A2B2K0C1D1
3K1K0A2B2K0C2D2
4K2K1A3B3K0C3D3
\n", "
" ], "text/plain": [ " key1 key2_x A B key2_y C D\n", "0 K0 K0 A0 B0 K0 C0 D0\n", "1 K0 K1 A1 B1 K0 C0 D0\n", "2 K1 K0 A2 B2 K0 C1 D1\n", "3 K1 K0 A2 B2 K0 C2 D2\n", "4 K2 K1 A3 B3 K0 C3 D3" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right,on = 'key1' )" ] }, { "cell_type": "code", "execution_count": null, "id": "afe59900", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 37, "id": "5547fa50", "metadata": {}, "outputs": [], "source": [ "left = pd.DataFrame({\"A\": [1, 2], \"B\": [2, 2]})\n", "\n", "right = pd.DataFrame({\"A\": [4, 5, 6], \"B\": [2, 2, 2]})\n" ] }, { "cell_type": "code", "execution_count": 38, "id": "3a9b0b5d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
012
122
\n", "
" ], "text/plain": [ " A B\n", "0 1 2\n", "1 2 2" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left" ] }, { "cell_type": "code", "execution_count": 39, "id": "908c67e0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
042
152
262
\n", "
" ], "text/plain": [ " A B\n", "0 4 2\n", "1 5 2\n", "2 6 2" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right" ] }, { "cell_type": "code", "execution_count": 40, "id": "f0c8734a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A_xBA_y
0124
1125
2126
3224
4225
5226
\n", "
" ], "text/plain": [ " A_x B A_y\n", "0 1 2 4\n", "1 1 2 5\n", "2 1 2 6\n", "3 2 2 4\n", "4 2 2 5\n", "5 2 2 6" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on=\"B\", how=\"outer\")" ] }, { "cell_type": "code", "execution_count": null, "id": "875ce932", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "8a80198e", "metadata": {}, "source": [ "## Questions After Class\n", "\n", "```{important}\n", "several questions were easiest to answer within the narrative of the notes above.\n", "```\n", "\n", "### How do I represent NaN as a variable? (Ex. df1819['TEAM_ID_y'==NaN]...something like that?)\n", "\n", "For that specific case, you can use the `isna` method as I did above, but if you need a NaN constant otherwise pandas provides" ] }, { "cell_type": "code", "execution_count": 41, "id": "56321717", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.NA" ] }, { "cell_type": "markdown", "id": "d0e38245", "metadata": {}, "source": [ "and numpy provides" ] }, { "cell_type": "code", "execution_count": 42, "id": "b7a79448", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "np.nan" ] }, { "cell_type": "markdown", "id": "5ee7fb5d", "metadata": {}, "source": [ "Watch out though because they are not the same:" ] }, { "cell_type": "code", "execution_count": 43, "id": "aaaf0b70", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan == pd.NA" ] }, { "cell_type": "markdown", "id": "708ac17c", "metadata": {}, "source": [ "and this cannot even assert" ] }, { "cell_type": "code", "execution_count": 44, "id": "794a3da9", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "boolean value of NA is ambiguous", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[44], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m pd\u001b[38;5;241m.\u001b[39mNA \u001b[38;5;241m==\u001b[39m np\u001b[38;5;241m.\u001b[39mnan\n", "File \u001b[0;32m/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/pandas/_libs/missing.pyx:382\u001b[0m, in \u001b[0;36mpandas._libs.missing.NAType.__bool__\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: boolean value of NA is ambiguous" ] } ], "source": [ "assert pd.NA == np.nan" ] }, { "cell_type": "markdown", "id": "1f9f7595", "metadata": {}, "source": [ "The `pandas` `pd.isna` method is robust, and it knows how `numpy` works (because it is built on top and imports)." 
] }, { "cell_type": "code", "execution_count": 45, "id": "4c17cb24", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.isna(np.nan)" ] }, { "cell_type": "code", "execution_count": 46, "id": "b91595a7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.isna(pd.NA)" ] }, { "cell_type": "markdown", "id": "166e3e20", "metadata": {}, "source": [ "However, `numpy` does not know `pandas`" ] }, { "cell_type": "code", "execution_count": 47, "id": "a0aa05bc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.isnan(pd.NA)" ] }, { "cell_type": "markdown", "id": "313e21c0", "metadata": {}, "source": [ "and returns a value that cannot be used for indexing\n", "\n", "### Is there an easier way to look at the changes with merge?\n", "\n", "Any of the tools we have to look at a DataFrame will work. I think using a small DataFrame is the best way to get a good idea of how they each work, so I included that. Overall, checking the shape gives a broad view and then comparing values gives the detials.\n", "\n", "### Do merges also delete any other overlapping columns other than the one they are told to consolidate on?\n", "\n", "No they copy them all\n", "\n", "### in pd.merge(p18,p19, on='PLAYER_ID',how='____').shape is p18 and p19 supposed to be df_18 and df_19?\n", "\n", "Yes, sorry, I read them in using different names than were in my notes\n", "\n", "### Learning more about accessing data from databases\n", "\n", "We will access databases on Wednesday\n", "\n", "### How to merge data that does not line up as well as these two datasets?\n", "\n", "We will see how to merge on columns that have the same data, but different columns this week. When there is no columns that match value to value, you have to transform and add new columns, combining what we learned about cleaning data to make them line up somehow. For example, maybe one dataset has dates like `YYYY-MM-DD` and the other data set has only months like \"January\". You could split the whole date into three columns and transform the strings to numbers or numbers to strings by treating the columns as [dates](https://pandas.pydata.org/docs/user_guide/timeseries.html)\n", "\n", "### Why did we merged on 'PLAYER_ID'?\n", "\n", "We were interested in linking based on the players.\n", "\n", "\n", "### What is the second best source after documentation? For ex: you found a parameter that might do what you want it to in pandas, but where do you go after to make sure? I'm not sure if I ran it, I'd specifically know if it \"worked\"\n", "\n", "Ideally, you should know what you are looking for to know if it worked or not. If you are not sure what you are looking for, that is a good time to try to think that through, or attend office hours. Next is to create an issue on your assignment repo if the question might reveal a solution.\n", "\n", "\n", "### I think I need to understand how GitHub works because I'm following the steps of the assignments and I feel like I'm doing something wrong every time I have to work on GitHub.\n", "\n", "If you are getting feedback, you are close enough. However, I am offering an extra session on GitHub on Friday." 
] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.14.1" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" }, "source_map": [ 12, 44, 46, 50, 52, 56, 59, 63, 65, 71, 73, 77, 79, 85, 87, 105, 107, 111, 113, 117, 120, 124, 128, 132, 137, 139, 143, 145, 154, 156, 160, 165, 172, 175, 179, 182, 186, 191, 195, 197, 201, 205, 224, 228, 232, 236, 240, 244, 246, 250, 271, 275, 279, 283, 287, 291, 295, 302, 306, 310, 314, 316, 328, 330, 333, 336, 341, 343, 348, 350, 355, 359, 361, 366, 368 ] }, "nbformat": 4, "nbformat_minor": 5 }