{
"cells": [
{
"cell_type": "markdown",
"id": "f14e95ab",
"metadata": {},
"source": [
"# Class 12: Constructing Datasets from Multiple Sources"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bbca4d7c",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "94f89de0",
"metadata": {},
"source": [
"## Introduction\n",
"Sometimes we use data from multiple sources. For example data may be provided across a bunch of tables and we want to put them together. Or we may want to build our own dataset to answer the questions that we want to ask.\n",
"For example we have a dataset of drivers who have been pulled over by the police, and need to compare distribution of them between day and night times. If we only have time of being pulled over without mentioning day/night, we can use another dataset including time of sunset for every day and build our own dataset out of these two dataset to find the answer.\n",
"Later we will also learn about getting data out of databses with some small database operations through SQLite and some python libararies.\n",
"\n",
"\n",
"To use relative paths as in class (`data/2018-games.csv`) instead of a full url `https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/2018-games.csv`, download data from [this GitHub repo](https://github.com/rhodyprog4ds/inclass-data) by clicking on the green code button and choosing .zip. Then unzip the data and save it in a data folder in the same folder as the notebook. For the class notes, the urls make it so that the notebook can run without having to store the data in another place."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d5646ba0",
"metadata": {},
"outputs": [],
"source": [
"games_df18 = pd.read_csv('https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/2018-games.csv')\n",
"games_df19 = pd.read_csv('https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/2019-games.csv')"
]
},
{
"cell_type": "markdown",
"id": "70effb06",
"metadata": {},
"source": [
"\n",
"## Stacking DataFrames\n",
"Let's look at the first couple of rows to see how the configuration of data is."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ac6fe4f4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" GAME_DATE_EST | \n",
" GAME_ID | \n",
" GAME_STATUS_TEXT | \n",
" HOME_TEAM_ID | \n",
" VISITOR_TEAM_ID | \n",
" SEASON | \n",
" TEAM_ID_home | \n",
" PTS_home | \n",
" FG_PCT_home | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 16196 | \n",
" 2019-06-13 | \n",
" 41800406 | \n",
" Final | \n",
" 1610612744 | \n",
" 1610612761 | \n",
" 2018 | \n",
" 1610612744 | \n",
" 110.0 | \n",
" 0.488 | \n",
" ... | \n",
" 28.0 | \n",
" 42.0 | \n",
" 1610612761 | \n",
" 114.0 | \n",
" 0.476 | \n",
" 0.793 | \n",
" 0.394 | \n",
" 25.0 | \n",
" 39.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 16197 | \n",
" 2019-06-10 | \n",
" 41800405 | \n",
" Final | \n",
" 1610612761 | \n",
" 1610612744 | \n",
" 2018 | \n",
" 1610612761 | \n",
" 105.0 | \n",
" 0.447 | \n",
" ... | \n",
" 19.0 | \n",
" 43.0 | \n",
" 1610612744 | \n",
" 106.0 | \n",
" 0.463 | \n",
" 0.714 | \n",
" 0.476 | \n",
" 27.0 | \n",
" 37.0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
2 rows × 22 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0 GAME_DATE_EST GAME_ID GAME_STATUS_TEXT HOME_TEAM_ID \\\n",
"0 16196 2019-06-13 41800406 Final 1610612744 \n",
"1 16197 2019-06-10 41800405 Final 1610612761 \n",
"\n",
" VISITOR_TEAM_ID SEASON TEAM_ID_home PTS_home FG_PCT_home ... \\\n",
"0 1610612761 2018 1610612744 110.0 0.488 ... \n",
"1 1610612744 2018 1610612761 105.0 0.447 ... \n",
"\n",
" AST_home REB_home TEAM_ID_away PTS_away FG_PCT_away FT_PCT_away \\\n",
"0 28.0 42.0 1610612761 114.0 0.476 0.793 \n",
"1 19.0 43.0 1610612744 106.0 0.463 0.714 \n",
"\n",
" FG3_PCT_away AST_away REB_away HOME_TEAM_WINS \n",
"0 0.394 25.0 39.0 0 \n",
"1 0.476 27.0 37.0 0 \n",
"\n",
"[2 rows x 22 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df18.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "47ccc58e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" GAME_DATE_EST | \n",
" GAME_ID | \n",
" GAME_STATUS_TEXT | \n",
" HOME_TEAM_ID | \n",
" VISITOR_TEAM_ID | \n",
" SEASON | \n",
" TEAM_ID_home | \n",
" PTS_home | \n",
" FG_PCT_home | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 2020-03-01 | \n",
" 21900895 | \n",
" Final | \n",
" 1610612766 | \n",
" 1610612749 | \n",
" 2019 | \n",
" 1610612766 | \n",
" 85.0 | \n",
" 0.354 | \n",
" ... | \n",
" 22.0 | \n",
" 47.0 | \n",
" 1610612749 | \n",
" 93.0 | \n",
" 0.402 | \n",
" 0.762 | \n",
" 0.226 | \n",
" 20.0 | \n",
" 61.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 2020-03-01 | \n",
" 21900896 | \n",
" Final | \n",
" 1610612750 | \n",
" 1610612742 | \n",
" 2019 | \n",
" 1610612750 | \n",
" 91.0 | \n",
" 0.364 | \n",
" ... | \n",
" 19.0 | \n",
" 57.0 | \n",
" 1610612742 | \n",
" 111.0 | \n",
" 0.468 | \n",
" 0.632 | \n",
" 0.275 | \n",
" 28.0 | \n",
" 56.0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
2 rows × 22 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0 GAME_DATE_EST GAME_ID GAME_STATUS_TEXT HOME_TEAM_ID \\\n",
"0 0 2020-03-01 21900895 Final 1610612766 \n",
"1 1 2020-03-01 21900896 Final 1610612750 \n",
"\n",
" VISITOR_TEAM_ID SEASON TEAM_ID_home PTS_home FG_PCT_home ... \\\n",
"0 1610612749 2019 1610612766 85.0 0.354 ... \n",
"1 1610612742 2019 1610612750 91.0 0.364 ... \n",
"\n",
" AST_home REB_home TEAM_ID_away PTS_away FG_PCT_away FT_PCT_away \\\n",
"0 22.0 47.0 1610612749 93.0 0.402 0.762 \n",
"1 19.0 57.0 1610612742 111.0 0.468 0.632 \n",
"\n",
" FG3_PCT_away AST_away REB_away HOME_TEAM_WINS \n",
"0 0.226 20.0 61.0 0 \n",
"1 0.275 28.0 56.0 0 \n",
"\n",
"[2 rows x 22 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df19.head(2)"
]
},
{
"cell_type": "markdown",
"id": "f0d22345",
"metadata": {},
"source": [
"As we can see, both datasets have the same columns, and just for two different years."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f4801a18",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1378, 22)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df18.shape"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a9a126b0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(965, 22)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df19.shape"
]
},
{
"cell_type": "markdown",
"id": "5b1dbc92",
"metadata": {},
"source": [
"Let's concatenate two dataframes to make one dataframe out of them."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a758bdb7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2343, 22)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df = pd.concat([games_df18,games_df19])\n",
"games_df.shape"
]
},
{
"cell_type": "markdown",
"id": "0bfc9139",
"metadata": {},
"source": [
"As we can see rows are added up but we have the same numbe of rcolums."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c1e2463d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Unnamed: 0', 'GAME_DATE_EST', 'GAME_ID', 'GAME_STATUS_TEXT',\n",
" 'HOME_TEAM_ID', 'VISITOR_TEAM_ID', 'SEASON', 'TEAM_ID_home', 'PTS_home',\n",
" 'FG_PCT_home', 'FT_PCT_home', 'FG3_PCT_home', 'AST_home', 'REB_home',\n",
" 'TEAM_ID_away', 'PTS_away', 'FG_PCT_away', 'FT_PCT_away',\n",
" 'FG3_PCT_away', 'AST_away', 'REB_away', 'HOME_TEAM_WINS'],\n",
" dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df.columns"
]
},
{
"cell_type": "markdown",
"id": "8df40c57",
"metadata": {},
"source": [
"Let's drop the column we do not need."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "872952bb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" GAME_DATE_EST | \n",
" GAME_ID | \n",
" GAME_STATUS_TEXT | \n",
" HOME_TEAM_ID | \n",
" VISITOR_TEAM_ID | \n",
" SEASON | \n",
" TEAM_ID_home | \n",
" PTS_home | \n",
" FG_PCT_home | \n",
" FT_PCT_home | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2019-06-13 | \n",
" 41800406 | \n",
" Final | \n",
" 1610612744 | \n",
" 1610612761 | \n",
" 2018 | \n",
" 1610612744 | \n",
" 110.0 | \n",
" 0.488 | \n",
" 0.700 | \n",
" ... | \n",
" 28.0 | \n",
" 42.0 | \n",
" 1610612761 | \n",
" 114.0 | \n",
" 0.476 | \n",
" 0.793 | \n",
" 0.394 | \n",
" 25.0 | \n",
" 39.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2019-06-10 | \n",
" 41800405 | \n",
" Final | \n",
" 1610612761 | \n",
" 1610612744 | \n",
" 2018 | \n",
" 1610612761 | \n",
" 105.0 | \n",
" 0.447 | \n",
" 0.778 | \n",
" ... | \n",
" 19.0 | \n",
" 43.0 | \n",
" 1610612744 | \n",
" 106.0 | \n",
" 0.463 | \n",
" 0.714 | \n",
" 0.476 | \n",
" 27.0 | \n",
" 37.0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
2 rows × 21 columns
\n",
"
"
],
"text/plain": [
" GAME_DATE_EST GAME_ID GAME_STATUS_TEXT HOME_TEAM_ID VISITOR_TEAM_ID \\\n",
"0 2019-06-13 41800406 Final 1610612744 1610612761 \n",
"1 2019-06-10 41800405 Final 1610612761 1610612744 \n",
"\n",
" SEASON TEAM_ID_home PTS_home FG_PCT_home FT_PCT_home ... AST_home \\\n",
"0 2018 1610612744 110.0 0.488 0.700 ... 28.0 \n",
"1 2018 1610612761 105.0 0.447 0.778 ... 19.0 \n",
"\n",
" REB_home TEAM_ID_away PTS_away FG_PCT_away FT_PCT_away FG3_PCT_away \\\n",
"0 42.0 1610612761 114.0 0.476 0.793 0.394 \n",
"1 43.0 1610612744 106.0 0.463 0.714 0.476 \n",
"\n",
" AST_away REB_away HOME_TEAM_WINS \n",
"0 25.0 39.0 0 \n",
"1 27.0 37.0 0 \n",
"\n",
"[2 rows x 21 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games_df.drop(columns= 'Unnamed: 0',inplace=True,)\n",
"games_df.head(2)"
]
},
{
"cell_type": "markdown",
"id": "8e243ed3",
"metadata": {},
"source": [
"\n",
"## Merging Data Frames\n",
"Now we read another dataset which some of its columns are the same as dataframe \"games_df\" and some are different."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2df04016",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" LEAGUE_ID | \n",
" TEAM_ID | \n",
" MIN_YEAR | \n",
" MAX_YEAR | \n",
" ABBREVIATION | \n",
" NICKNAME | \n",
" YEARFOUNDED | \n",
" CITY | \n",
" ARENA | \n",
" ARENACAPACITY | \n",
" OWNER | \n",
" GENERALMANAGER | \n",
" HEADCOACH | \n",
" DLEAGUEAFFILIATION | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 1610612737 | \n",
" 1949 | \n",
" 2019 | \n",
" ATL | \n",
" Hawks | \n",
" 1949 | \n",
" Atlanta | \n",
" State Farm Arena | \n",
" 18729.0 | \n",
" Tony Ressler | \n",
" Travis Schlenk | \n",
" Lloyd Pierce | \n",
" Erie Bayhawks | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1610612738 | \n",
" 1946 | \n",
" 2019 | \n",
" BOS | \n",
" Celtics | \n",
" 1946 | \n",
" Boston | \n",
" TD Garden | \n",
" 18624.0 | \n",
" Wyc Grousbeck | \n",
" Danny Ainge | \n",
" Brad Stevens | \n",
" Maine Red Claws | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" LEAGUE_ID TEAM_ID MIN_YEAR MAX_YEAR ABBREVIATION NICKNAME \\\n",
"0 0 1610612737 1949 2019 ATL Hawks \n",
"1 0 1610612738 1946 2019 BOS Celtics \n",
"\n",
" YEARFOUNDED CITY ARENA ARENACAPACITY OWNER \\\n",
"0 1949 Atlanta State Farm Arena 18729.0 Tony Ressler \n",
"1 1946 Boston TD Garden 18624.0 Wyc Grousbeck \n",
"\n",
" GENERALMANAGER HEADCOACH DLEAGUEAFFILIATION \n",
"0 Travis Schlenk Lloyd Pierce Erie Bayhawks \n",
"1 Danny Ainge Brad Stevens Maine Red Claws "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"teams_df = pd.read_csv('https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/teams.csv')\n",
"teams_df.head(2)"
]
},
{
"cell_type": "markdown",
"id": "bdaefef6",
"metadata": {},
"source": [
"We use left_on='TEAM_ID' and right_on = 'HOME_TEAM_ID' to match two dataframes."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f0d13ec3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" LEAGUE_ID | \n",
" TEAM_ID | \n",
" MIN_YEAR | \n",
" MAX_YEAR | \n",
" ABBREVIATION | \n",
" NICKNAME | \n",
" YEARFOUNDED | \n",
" CITY | \n",
" ARENA | \n",
" ARENACAPACITY | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 1610612737 | \n",
" 1949 | \n",
" 2019 | \n",
" ATL | \n",
" Hawks | \n",
" 1949 | \n",
" Atlanta | \n",
" State Farm Arena | \n",
" 18729.0 | \n",
" ... | \n",
" 29.0 | \n",
" 61.0 | \n",
" 1610612754 | \n",
" 135.0 | \n",
" 0.459 | \n",
" 0.846 | \n",
" 0.400 | \n",
" 22.0 | \n",
" 43.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1610612737 | \n",
" 1949 | \n",
" 2019 | \n",
" ATL | \n",
" Hawks | \n",
" 1949 | \n",
" Atlanta | \n",
" State Farm Arena | \n",
" 18729.0 | \n",
" ... | \n",
" 29.0 | \n",
" 44.0 | \n",
" 1610612755 | \n",
" 122.0 | \n",
" 0.459 | \n",
" 0.579 | \n",
" 0.323 | \n",
" 27.0 | \n",
" 57.0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
2 rows × 35 columns
\n",
"
"
],
"text/plain": [
" LEAGUE_ID TEAM_ID MIN_YEAR MAX_YEAR ABBREVIATION NICKNAME \\\n",
"0 0 1610612737 1949 2019 ATL Hawks \n",
"1 0 1610612737 1949 2019 ATL Hawks \n",
"\n",
" YEARFOUNDED CITY ARENA ARENACAPACITY ... AST_home \\\n",
"0 1949 Atlanta State Farm Arena 18729.0 ... 29.0 \n",
"1 1949 Atlanta State Farm Arena 18729.0 ... 29.0 \n",
"\n",
" REB_home TEAM_ID_away PTS_away FG_PCT_away FT_PCT_away FG3_PCT_away \\\n",
"0 61.0 1610612754 135.0 0.459 0.846 0.400 \n",
"1 44.0 1610612755 122.0 0.459 0.579 0.323 \n",
"\n",
" AST_away REB_away HOME_TEAM_WINS \n",
"0 22.0 43.0 0 \n",
"1 27.0 57.0 1 \n",
"\n",
"[2 rows x 35 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge1_df = pd.merge(teams_df,games_df,left_on='TEAM_ID', right_on = 'HOME_TEAM_ID')\n",
"merge1_df.head(2)"
]
},
{
"cell_type": "markdown",
"id": "793c5964",
"metadata": {},
"source": [
"We want information for each game and append the team info onto that. So, lets try another settings in our mergging."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b53b7b7f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2343, 35)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge1_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3bf49b69",
"metadata": {},
"outputs": [],
"source": [
"merge2_df = pd.merge(teams_df, games_df,left_on='TEAM_ID', right_on = 'HOME_TEAM_ID', how='outer')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "2986c10c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" LEAGUE_ID | \n",
" TEAM_ID | \n",
" MIN_YEAR | \n",
" MAX_YEAR | \n",
" ABBREVIATION | \n",
" NICKNAME | \n",
" YEARFOUNDED | \n",
" CITY | \n",
" ARENA | \n",
" ARENACAPACITY | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 1610612737 | \n",
" 1949 | \n",
" 2019 | \n",
" ATL | \n",
" Hawks | \n",
" 1949 | \n",
" Atlanta | \n",
" State Farm Arena | \n",
" 18729.0 | \n",
" ... | \n",
" 29.0 | \n",
" 61.0 | \n",
" 1610612754 | \n",
" 135.0 | \n",
" 0.459 | \n",
" 0.846 | \n",
" 0.400 | \n",
" 22.0 | \n",
" 43.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1610612737 | \n",
" 1949 | \n",
" 2019 | \n",
" ATL | \n",
" Hawks | \n",
" 1949 | \n",
" Atlanta | \n",
" State Farm Arena | \n",
" 18729.0 | \n",
" ... | \n",
" 29.0 | \n",
" 44.0 | \n",
" 1610612755 | \n",
" 122.0 | \n",
" 0.459 | \n",
" 0.579 | \n",
" 0.323 | \n",
" 27.0 | \n",
" 57.0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
2 rows × 35 columns
\n",
"
"
],
"text/plain": [
" LEAGUE_ID TEAM_ID MIN_YEAR MAX_YEAR ABBREVIATION NICKNAME \\\n",
"0 0 1610612737 1949 2019 ATL Hawks \n",
"1 0 1610612737 1949 2019 ATL Hawks \n",
"\n",
" YEARFOUNDED CITY ARENA ARENACAPACITY ... AST_home \\\n",
"0 1949 Atlanta State Farm Arena 18729.0 ... 29.0 \n",
"1 1949 Atlanta State Farm Arena 18729.0 ... 29.0 \n",
"\n",
" REB_home TEAM_ID_away PTS_away FG_PCT_away FT_PCT_away FG3_PCT_away \\\n",
"0 61.0 1610612754 135.0 0.459 0.846 0.400 \n",
"1 44.0 1610612755 122.0 0.459 0.579 0.323 \n",
"\n",
" AST_away REB_away HOME_TEAM_WINS \n",
"0 22.0 43.0 0 \n",
"1 27.0 57.0 1 \n",
"\n",
"[2 rows x 35 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge2_df.head(2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8217507f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2343, 35)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge2_df.shape"
]
},
{
"cell_type": "markdown",
"id": "fc6b7830",
"metadata": {},
"source": [
"We can group by \"ARENA\" and then look at the \"mean\" statistics."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "2cf706c7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" LEAGUE_ID | \n",
" TEAM_ID | \n",
" MIN_YEAR | \n",
" MAX_YEAR | \n",
" YEARFOUNDED | \n",
" ARENACAPACITY | \n",
" GAME_ID | \n",
" HOME_TEAM_ID | \n",
" VISITOR_TEAM_ID | \n",
" SEASON | \n",
" ... | \n",
" AST_home | \n",
" REB_home | \n",
" TEAM_ID_away | \n",
" PTS_away | \n",
" FG_PCT_away | \n",
" FT_PCT_away | \n",
" FG3_PCT_away | \n",
" AST_away | \n",
" REB_away | \n",
" HOME_TEAM_WINS | \n",
"
\n",
" \n",
" ARENA | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" AT&T Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1976.000000 | \n",
" 2019.0 | \n",
" 1976.000000 | \n",
" 18694.0 | \n",
" 2.184021e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.397436 | \n",
" ... | \n",
" 25.141026 | \n",
" 45.782051 | \n",
" 1.610613e+09 | \n",
" 108.448718 | \n",
" 0.449333 | \n",
" 0.770808 | \n",
" 0.348231 | \n",
" 23.987179 | \n",
" 43.833333 | \n",
" 0.666667 | \n",
"
\n",
" \n",
" American Airlines Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1980.000000 | \n",
" 2019.0 | \n",
" 1980.000000 | \n",
" 19200.0 | \n",
" 2.130985e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.426667 | \n",
" ... | \n",
" 23.920000 | \n",
" 46.093333 | \n",
" 1.610613e+09 | \n",
" 108.960000 | \n",
" 0.456093 | \n",
" 0.777320 | \n",
" 0.347707 | \n",
" 22.960000 | \n",
" 43.720000 | \n",
" 0.546667 | \n",
"
\n",
" \n",
" AmericanAirlines Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1988.000000 | \n",
" 2019.0 | \n",
" 1988.000000 | \n",
" 19600.0 | \n",
" 2.105314e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.421053 | \n",
" ... | \n",
" 25.039474 | \n",
" 46.276316 | \n",
" 1.610613e+09 | \n",
" 106.618421 | \n",
" 0.442684 | \n",
" 0.771987 | \n",
" 0.344250 | \n",
" 23.157895 | \n",
" 42.131579 | \n",
" 0.644737 | \n",
"
\n",
" \n",
" Amway Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1989.000000 | \n",
" 2019.0 | \n",
" 1989.000000 | \n",
" 0.0 | \n",
" 2.171458e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.423077 | \n",
" ... | \n",
" 24.525641 | \n",
" 45.935897 | \n",
" 1.610613e+09 | \n",
" 105.538462 | \n",
" 0.454372 | \n",
" 0.772487 | \n",
" 0.359962 | \n",
" 22.987179 | \n",
" 44.769231 | \n",
" 0.538462 | \n",
"
\n",
" \n",
" Bankers Life Fieldhouse | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1976.000000 | \n",
" 2019.0 | \n",
" 1976.000000 | \n",
" 18345.0 | \n",
" 2.197454e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.441558 | \n",
" ... | \n",
" 26.363636 | \n",
" 44.597403 | \n",
" 1.610613e+09 | \n",
" 103.051948 | \n",
" 0.435494 | \n",
" 0.761805 | \n",
" 0.329442 | \n",
" 23.272727 | \n",
" 44.662338 | \n",
" 0.675325 | \n",
"
\n",
" \n",
" Barclays Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1976.000000 | \n",
" 2019.0 | \n",
" 1976.000000 | \n",
" NaN | \n",
" 2.197516e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.413333 | \n",
" ... | \n",
" 23.880000 | \n",
" 48.000000 | \n",
" 1.610613e+09 | \n",
" 110.293333 | \n",
" 0.448133 | \n",
" 0.773533 | \n",
" 0.338547 | \n",
" 22.640000 | \n",
" 45.000000 | \n",
" 0.533333 | \n",
"
\n",
" \n",
" Capital One Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1961.000000 | \n",
" 2019.0 | \n",
" 1961.000000 | \n",
" 20647.0 | \n",
" 2.130190e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.418919 | \n",
" ... | \n",
" 25.891892 | \n",
" 43.932432 | \n",
" 1.610613e+09 | \n",
" 115.310811 | \n",
" 0.472041 | \n",
" 0.765959 | \n",
" 0.353324 | \n",
" 24.554054 | \n",
" 46.121622 | \n",
" 0.500000 | \n",
"
\n",
" \n",
" Chase Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1946.000000 | \n",
" 2019.0 | \n",
" 1946.000000 | \n",
" 19596.0 | \n",
" 2.350492e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.377778 | \n",
" ... | \n",
" 28.444444 | \n",
" 44.722222 | \n",
" 1.610613e+09 | \n",
" 112.833333 | \n",
" 0.461144 | \n",
" 0.782122 | \n",
" 0.374233 | \n",
" 25.377778 | \n",
" 43.900000 | \n",
" 0.511111 | \n",
"
\n",
" \n",
" Chesapeake Energy Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1967.000000 | \n",
" 2019.0 | \n",
" 1967.000000 | \n",
" 19163.0 | \n",
" 2.171802e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.425000 | \n",
" ... | \n",
" 23.450000 | \n",
" 45.387500 | \n",
" 1.610613e+09 | \n",
" 109.250000 | \n",
" 0.455088 | \n",
" 0.772437 | \n",
" 0.351600 | \n",
" 23.175000 | \n",
" 44.700000 | \n",
" 0.637500 | \n",
"
\n",
" \n",
" FedExForum | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1995.000000 | \n",
" 2019.0 | \n",
" 1995.000000 | \n",
" 18119.0 | \n",
" 2.131630e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.421053 | \n",
" ... | \n",
" 25.394737 | \n",
" 44.578947 | \n",
" 1.610613e+09 | \n",
" 108.236842 | \n",
" 0.445092 | \n",
" 0.786355 | \n",
" 0.362816 | \n",
" 23.473684 | \n",
" 44.986842 | \n",
" 0.526316 | \n",
"
\n",
" \n",
" Fiserv Forum | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1968.000000 | \n",
" 2019.0 | \n",
" 1968.000000 | \n",
" 17500.0 | \n",
" 2.328482e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.385542 | \n",
" ... | \n",
" 26.819277 | \n",
" 51.590361 | \n",
" 1.610613e+09 | \n",
" 106.421687 | \n",
" 0.417289 | \n",
" 0.758819 | \n",
" 0.347325 | \n",
" 23.253012 | \n",
" 44.385542 | \n",
" 0.843373 | \n",
"
\n",
" \n",
" Golden 1 Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1948.000000 | \n",
" 2019.0 | \n",
" 1948.000000 | \n",
" 17500.0 | \n",
" 2.142553e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.416667 | \n",
" ... | \n",
" 24.125000 | \n",
" 45.208333 | \n",
" 1.610613e+09 | \n",
" 112.361111 | \n",
" 0.459417 | \n",
" 0.772444 | \n",
" 0.349514 | \n",
" 23.805556 | \n",
" 46.708333 | \n",
" 0.527778 | \n",
"
\n",
" \n",
" Little Caesars Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1948.000000 | \n",
" 2019.0 | \n",
" 1948.000000 | \n",
" 21000.0 | \n",
" 2.171695e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.430380 | \n",
" ... | \n",
" 23.974684 | \n",
" 44.670886 | \n",
" 1.610613e+09 | \n",
" 108.797468 | \n",
" 0.469127 | \n",
" 0.767203 | \n",
" 0.348127 | \n",
" 24.151899 | \n",
" 43.417722 | \n",
" 0.493671 | \n",
"
\n",
" \n",
" Madison Square Garden | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1946.000000 | \n",
" 2019.0 | \n",
" 1946.000000 | \n",
" 19763.0 | \n",
" 2.105316e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.421053 | \n",
" ... | \n",
" 21.500000 | \n",
" 45.447368 | \n",
" 1.610613e+09 | \n",
" 112.105263 | \n",
" 0.463842 | \n",
" 0.756461 | \n",
" 0.365237 | \n",
" 24.263158 | \n",
" 45.263158 | \n",
" 0.250000 | \n",
"
\n",
" \n",
" Moda Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1970.000000 | \n",
" 2019.0 | \n",
" 1970.000000 | \n",
" 19980.0 | \n",
" 2.316313e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.373494 | \n",
" ... | \n",
" 22.361446 | \n",
" 47.831325 | \n",
" 1.610613e+09 | \n",
" 112.060241 | \n",
" 0.448145 | \n",
" 0.793880 | \n",
" 0.371578 | \n",
" 22.939759 | \n",
" 44.048193 | \n",
" 0.662651 | \n",
"
\n",
" \n",
" Pepsi Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1976.000000 | \n",
" 2019.0 | \n",
" 1976.000000 | \n",
" 19099.0 | \n",
" 2.369186e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.395062 | \n",
" ... | \n",
" 27.419753 | \n",
" 46.419753 | \n",
" 1.610613e+09 | \n",
" 103.777778 | \n",
" 0.448531 | \n",
" 0.742173 | \n",
" 0.323988 | \n",
" 23.320988 | \n",
" 42.172840 | \n",
" 0.790123 | \n",
"
\n",
" \n",
" Quicken Loans Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1970.000000 | \n",
" 2019.0 | \n",
" 1970.000000 | \n",
" 20562.0 | \n",
" 2.132390e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.428571 | \n",
" ... | \n",
" 22.038961 | \n",
" 44.025974 | \n",
" 1.610613e+09 | \n",
" 113.584416 | \n",
" 0.489766 | \n",
" 0.776896 | \n",
" 0.384247 | \n",
" 25.584416 | \n",
" 42.597403 | \n",
" 0.298701 | \n",
"
\n",
" \n",
" Scotiabank Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1995.000000 | \n",
" 2019.0 | \n",
" 1995.000000 | \n",
" 19800.0 | \n",
" 2.428270e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.377778 | \n",
" ... | \n",
" 25.433333 | \n",
" 45.100000 | \n",
" 1.610613e+09 | \n",
" 106.144444 | \n",
" 0.432111 | \n",
" 0.770211 | \n",
" 0.347544 | \n",
" 24.155556 | \n",
" 45.300000 | \n",
" 0.744444 | \n",
"
\n",
" \n",
" Smoothie King Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 2002.000000 | \n",
" 2019.0 | \n",
" 2002.000000 | \n",
" NaN | \n",
" 2.156903e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.424658 | \n",
" ... | \n",
" 27.232877 | \n",
" 46.945205 | \n",
" 1.610613e+09 | \n",
" 116.808219 | \n",
" 0.474096 | \n",
" 0.771562 | \n",
" 0.362822 | \n",
" 25.109589 | \n",
" 44.561644 | \n",
" 0.438356 | \n",
"
\n",
" \n",
" Spectrum Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1988.000000 | \n",
" 2019.0 | \n",
" 1988.000000 | \n",
" 19026.0 | \n",
" 2.104182e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.413333 | \n",
" ... | \n",
" 23.746667 | \n",
" 44.293333 | \n",
" 1.610613e+09 | \n",
" 109.386667 | \n",
" 0.463920 | \n",
" 0.761573 | \n",
" 0.350853 | \n",
" 25.066667 | \n",
" 45.266667 | \n",
" 0.493333 | \n",
"
\n",
" \n",
" Staples Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1959.210191 | \n",
" 2019.0 | \n",
" 1959.210191 | \n",
" 19060.0 | \n",
" 2.139604e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.414013 | \n",
" ... | \n",
" 25.942675 | \n",
" 47.286624 | \n",
" 1.610613e+09 | \n",
" 110.477707 | \n",
" 0.443783 | \n",
" 0.775064 | \n",
" 0.338796 | \n",
" 24.095541 | \n",
" 45.133758 | \n",
" 0.636943 | \n",
"
\n",
" \n",
" State Farm Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1949.000000 | \n",
" 2019.0 | \n",
" 1949.000000 | \n",
" 18729.0 | \n",
" 2.131766e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.434211 | \n",
" ... | \n",
" 25.736842 | \n",
" 45.960526 | \n",
" 1.610613e+09 | \n",
" 117.894737 | \n",
" 0.463447 | \n",
" 0.767184 | \n",
" 0.343553 | \n",
" 25.500000 | \n",
" 46.276316 | \n",
" 0.421053 | \n",
"
\n",
" \n",
" TD Garden | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1946.000000 | \n",
" 2019.0 | \n",
" 1946.000000 | \n",
" 18624.0 | \n",
" 2.235308e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.397436 | \n",
" ... | \n",
" 25.717949 | \n",
" 45.974359 | \n",
" 1.610613e+09 | \n",
" 106.358974 | \n",
" 0.442769 | \n",
" 0.743064 | \n",
" 0.335808 | \n",
" 23.282051 | \n",
" 44.153846 | \n",
" 0.717949 | \n",
"
\n",
" \n",
" Talking Stick Resort Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1968.000000 | \n",
" 2019.0 | \n",
" 1968.000000 | \n",
" NaN | \n",
" 2.132518e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.441558 | \n",
" ... | \n",
" 25.298701 | \n",
" 41.519481 | \n",
" 1.610613e+09 | \n",
" 113.064935 | \n",
" 0.474922 | \n",
" 0.762377 | \n",
" 0.363143 | \n",
" 24.285714 | \n",
" 45.000000 | \n",
" 0.311688 | \n",
"
\n",
" \n",
" Target Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1989.000000 | \n",
" 2019.0 | \n",
" 1989.000000 | \n",
" 19356.0 | \n",
" 2.156304e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.402778 | \n",
" ... | \n",
" 24.458333 | \n",
" 46.583333 | \n",
" 1.610613e+09 | \n",
" 113.208333 | \n",
" 0.462306 | \n",
" 0.763181 | \n",
" 0.363625 | \n",
" 24.402778 | \n",
" 46.500000 | \n",
" 0.444444 | \n",
"
\n",
" \n",
" Toyota Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1967.000000 | \n",
" 2019.0 | \n",
" 1967.000000 | \n",
" 18104.0 | \n",
" 2.283925e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.387500 | \n",
" ... | \n",
" 21.712500 | \n",
" 44.850000 | \n",
" 1.610613e+09 | \n",
" 110.275000 | \n",
" 0.454438 | \n",
" 0.759188 | \n",
" 0.339250 | \n",
" 24.200000 | \n",
" 45.700000 | \n",
" 0.737500 | \n",
"
\n",
" \n",
" United Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1966.000000 | \n",
" 2019.0 | \n",
" 1966.000000 | \n",
" 21711.0 | \n",
" 2.107486e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.435897 | \n",
" ... | \n",
" 22.987179 | \n",
" 43.512821 | \n",
" 1.610613e+09 | \n",
" 110.705128 | \n",
" 0.475372 | \n",
" 0.750628 | \n",
" 0.355808 | \n",
" 25.602564 | \n",
" 46.192308 | \n",
" 0.307692 | \n",
"
\n",
" \n",
" Vivint Smart Home Arena | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1974.000000 | \n",
" 2019.0 | \n",
" 1974.000000 | \n",
" 20148.0 | \n",
" 2.197424e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.421053 | \n",
" ... | \n",
" 24.526316 | \n",
" 47.592105 | \n",
" 1.610613e+09 | \n",
" 105.723684 | \n",
" 0.443618 | \n",
" 0.767803 | \n",
" 0.351961 | \n",
" 20.539474 | \n",
" 42.276316 | \n",
" 0.684211 | \n",
"
\n",
" \n",
" Wells Fargo Center | \n",
" 0.0 | \n",
" 1.610613e+09 | \n",
" 1949.000000 | \n",
" 2019.0 | \n",
" 1949.000000 | \n",
" NaN | \n",
" 2.282764e+07 | \n",
" 1.610613e+09 | \n",
" 1.610613e+09 | \n",
" 2018.395062 | \n",
" ... | \n",
" 27.395062 | \n",
" 47.617284 | \n",
" 1.610613e+09 | \n",
" 106.753086 | \n",
" 0.445346 | \n",
" 0.757247 | \n",
" 0.343272 | \n",
" 22.086420 | \n",
" 41.160494 | \n",
" 0.802469 | \n",
"
\n",
" \n",
"
\n",
"
29 rows × 25 columns
\n",
"
"
],
"text/plain": [
" LEAGUE_ID TEAM_ID MIN_YEAR MAX_YEAR \\\n",
"ARENA \n",
"AT&T Center 0.0 1.610613e+09 1976.000000 2019.0 \n",
"American Airlines Center 0.0 1.610613e+09 1980.000000 2019.0 \n",
"AmericanAirlines Arena 0.0 1.610613e+09 1988.000000 2019.0 \n",
"Amway Center 0.0 1.610613e+09 1989.000000 2019.0 \n",
"Bankers Life Fieldhouse 0.0 1.610613e+09 1976.000000 2019.0 \n",
"Barclays Center 0.0 1.610613e+09 1976.000000 2019.0 \n",
"Capital One Arena 0.0 1.610613e+09 1961.000000 2019.0 \n",
"Chase Center 0.0 1.610613e+09 1946.000000 2019.0 \n",
"Chesapeake Energy Arena 0.0 1.610613e+09 1967.000000 2019.0 \n",
"FedExForum 0.0 1.610613e+09 1995.000000 2019.0 \n",
"Fiserv Forum 0.0 1.610613e+09 1968.000000 2019.0 \n",
"Golden 1 Center 0.0 1.610613e+09 1948.000000 2019.0 \n",
"Little Caesars Arena 0.0 1.610613e+09 1948.000000 2019.0 \n",
"Madison Square Garden 0.0 1.610613e+09 1946.000000 2019.0 \n",
"Moda Center 0.0 1.610613e+09 1970.000000 2019.0 \n",
"Pepsi Center 0.0 1.610613e+09 1976.000000 2019.0 \n",
"Quicken Loans Arena 0.0 1.610613e+09 1970.000000 2019.0 \n",
"Scotiabank Arena 0.0 1.610613e+09 1995.000000 2019.0 \n",
"Smoothie King Center 0.0 1.610613e+09 2002.000000 2019.0 \n",
"Spectrum Center 0.0 1.610613e+09 1988.000000 2019.0 \n",
"Staples Center 0.0 1.610613e+09 1959.210191 2019.0 \n",
"State Farm Arena 0.0 1.610613e+09 1949.000000 2019.0 \n",
"TD Garden 0.0 1.610613e+09 1946.000000 2019.0 \n",
"Talking Stick Resort Arena 0.0 1.610613e+09 1968.000000 2019.0 \n",
"Target Center 0.0 1.610613e+09 1989.000000 2019.0 \n",
"Toyota Center 0.0 1.610613e+09 1967.000000 2019.0 \n",
"United Center 0.0 1.610613e+09 1966.000000 2019.0 \n",
"Vivint Smart Home Arena 0.0 1.610613e+09 1974.000000 2019.0 \n",
"Wells Fargo Center 0.0 1.610613e+09 1949.000000 2019.0 \n",
"\n",
" YEARFOUNDED ARENACAPACITY GAME_ID \\\n",
"ARENA \n",
"AT&T Center 1976.000000 18694.0 2.184021e+07 \n",
"American Airlines Center 1980.000000 19200.0 2.130985e+07 \n",
"AmericanAirlines Arena 1988.000000 19600.0 2.105314e+07 \n",
"Amway Center 1989.000000 0.0 2.171458e+07 \n",
"Bankers Life Fieldhouse 1976.000000 18345.0 2.197454e+07 \n",
"Barclays Center 1976.000000 NaN 2.197516e+07 \n",
"Capital One Arena 1961.000000 20647.0 2.130190e+07 \n",
"Chase Center 1946.000000 19596.0 2.350492e+07 \n",
"Chesapeake Energy Arena 1967.000000 19163.0 2.171802e+07 \n",
"FedExForum 1995.000000 18119.0 2.131630e+07 \n",
"Fiserv Forum 1968.000000 17500.0 2.328482e+07 \n",
"Golden 1 Center 1948.000000 17500.0 2.142553e+07 \n",
"Little Caesars Arena 1948.000000 21000.0 2.171695e+07 \n",
"Madison Square Garden 1946.000000 19763.0 2.105316e+07 \n",
"Moda Center 1970.000000 19980.0 2.316313e+07 \n",
"Pepsi Center 1976.000000 19099.0 2.369186e+07 \n",
"Quicken Loans Arena 1970.000000 20562.0 2.132390e+07 \n",
"Scotiabank Arena 1995.000000 19800.0 2.428270e+07 \n",
"Smoothie King Center 2002.000000 NaN 2.156903e+07 \n",
"Spectrum Center 1988.000000 19026.0 2.104182e+07 \n",
"Staples Center 1959.210191 19060.0 2.139604e+07 \n",
"State Farm Arena 1949.000000 18729.0 2.131766e+07 \n",
"TD Garden 1946.000000 18624.0 2.235308e+07 \n",
"Talking Stick Resort Arena 1968.000000 NaN 2.132518e+07 \n",
"Target Center 1989.000000 19356.0 2.156304e+07 \n",
"Toyota Center 1967.000000 18104.0 2.283925e+07 \n",
"United Center 1966.000000 21711.0 2.107486e+07 \n",
"Vivint Smart Home Arena 1974.000000 20148.0 2.197424e+07 \n",
"Wells Fargo Center 1949.000000 NaN 2.282764e+07 \n",
"\n",
" HOME_TEAM_ID VISITOR_TEAM_ID SEASON ... \\\n",
"ARENA ... \n",
"AT&T Center 1.610613e+09 1.610613e+09 2018.397436 ... \n",
"American Airlines Center 1.610613e+09 1.610613e+09 2018.426667 ... \n",
"AmericanAirlines Arena 1.610613e+09 1.610613e+09 2018.421053 ... \n",
"Amway Center 1.610613e+09 1.610613e+09 2018.423077 ... \n",
"Bankers Life Fieldhouse 1.610613e+09 1.610613e+09 2018.441558 ... \n",
"Barclays Center 1.610613e+09 1.610613e+09 2018.413333 ... \n",
"Capital One Arena 1.610613e+09 1.610613e+09 2018.418919 ... \n",
"Chase Center 1.610613e+09 1.610613e+09 2018.377778 ... \n",
"Chesapeake Energy Arena 1.610613e+09 1.610613e+09 2018.425000 ... \n",
"FedExForum 1.610613e+09 1.610613e+09 2018.421053 ... \n",
"Fiserv Forum 1.610613e+09 1.610613e+09 2018.385542 ... \n",
"Golden 1 Center 1.610613e+09 1.610613e+09 2018.416667 ... \n",
"Little Caesars Arena 1.610613e+09 1.610613e+09 2018.430380 ... \n",
"Madison Square Garden 1.610613e+09 1.610613e+09 2018.421053 ... \n",
"Moda Center 1.610613e+09 1.610613e+09 2018.373494 ... \n",
"Pepsi Center 1.610613e+09 1.610613e+09 2018.395062 ... \n",
"Quicken Loans Arena 1.610613e+09 1.610613e+09 2018.428571 ... \n",
"Scotiabank Arena 1.610613e+09 1.610613e+09 2018.377778 ... \n",
"Smoothie King Center 1.610613e+09 1.610613e+09 2018.424658 ... \n",
"Spectrum Center 1.610613e+09 1.610613e+09 2018.413333 ... \n",
"Staples Center 1.610613e+09 1.610613e+09 2018.414013 ... \n",
"State Farm Arena 1.610613e+09 1.610613e+09 2018.434211 ... \n",
"TD Garden 1.610613e+09 1.610613e+09 2018.397436 ... \n",
"Talking Stick Resort Arena 1.610613e+09 1.610613e+09 2018.441558 ... \n",
"Target Center 1.610613e+09 1.610613e+09 2018.402778 ... \n",
"Toyota Center 1.610613e+09 1.610613e+09 2018.387500 ... \n",
"United Center 1.610613e+09 1.610613e+09 2018.435897 ... \n",
"Vivint Smart Home Arena 1.610613e+09 1.610613e+09 2018.421053 ... \n",
"Wells Fargo Center 1.610613e+09 1.610613e+09 2018.395062 ... \n",
"\n",
" AST_home REB_home TEAM_ID_away PTS_away \\\n",
"ARENA \n",
"AT&T Center 25.141026 45.782051 1.610613e+09 108.448718 \n",
"American Airlines Center 23.920000 46.093333 1.610613e+09 108.960000 \n",
"AmericanAirlines Arena 25.039474 46.276316 1.610613e+09 106.618421 \n",
"Amway Center 24.525641 45.935897 1.610613e+09 105.538462 \n",
"Bankers Life Fieldhouse 26.363636 44.597403 1.610613e+09 103.051948 \n",
"Barclays Center 23.880000 48.000000 1.610613e+09 110.293333 \n",
"Capital One Arena 25.891892 43.932432 1.610613e+09 115.310811 \n",
"Chase Center 28.444444 44.722222 1.610613e+09 112.833333 \n",
"Chesapeake Energy Arena 23.450000 45.387500 1.610613e+09 109.250000 \n",
"FedExForum 25.394737 44.578947 1.610613e+09 108.236842 \n",
"Fiserv Forum 26.819277 51.590361 1.610613e+09 106.421687 \n",
"Golden 1 Center 24.125000 45.208333 1.610613e+09 112.361111 \n",
"Little Caesars Arena 23.974684 44.670886 1.610613e+09 108.797468 \n",
"Madison Square Garden 21.500000 45.447368 1.610613e+09 112.105263 \n",
"Moda Center 22.361446 47.831325 1.610613e+09 112.060241 \n",
"Pepsi Center 27.419753 46.419753 1.610613e+09 103.777778 \n",
"Quicken Loans Arena 22.038961 44.025974 1.610613e+09 113.584416 \n",
"Scotiabank Arena 25.433333 45.100000 1.610613e+09 106.144444 \n",
"Smoothie King Center 27.232877 46.945205 1.610613e+09 116.808219 \n",
"Spectrum Center 23.746667 44.293333 1.610613e+09 109.386667 \n",
"Staples Center 25.942675 47.286624 1.610613e+09 110.477707 \n",
"State Farm Arena 25.736842 45.960526 1.610613e+09 117.894737 \n",
"TD Garden 25.717949 45.974359 1.610613e+09 106.358974 \n",
"Talking Stick Resort Arena 25.298701 41.519481 1.610613e+09 113.064935 \n",
"Target Center 24.458333 46.583333 1.610613e+09 113.208333 \n",
"Toyota Center 21.712500 44.850000 1.610613e+09 110.275000 \n",
"United Center 22.987179 43.512821 1.610613e+09 110.705128 \n",
"Vivint Smart Home Arena 24.526316 47.592105 1.610613e+09 105.723684 \n",
"Wells Fargo Center 27.395062 47.617284 1.610613e+09 106.753086 \n",
"\n",
" FG_PCT_away FT_PCT_away FG3_PCT_away AST_away \\\n",
"ARENA \n",
"AT&T Center 0.449333 0.770808 0.348231 23.987179 \n",
"American Airlines Center 0.456093 0.777320 0.347707 22.960000 \n",
"AmericanAirlines Arena 0.442684 0.771987 0.344250 23.157895 \n",
"Amway Center 0.454372 0.772487 0.359962 22.987179 \n",
"Bankers Life Fieldhouse 0.435494 0.761805 0.329442 23.272727 \n",
"Barclays Center 0.448133 0.773533 0.338547 22.640000 \n",
"Capital One Arena 0.472041 0.765959 0.353324 24.554054 \n",
"Chase Center 0.461144 0.782122 0.374233 25.377778 \n",
"Chesapeake Energy Arena 0.455088 0.772437 0.351600 23.175000 \n",
"FedExForum 0.445092 0.786355 0.362816 23.473684 \n",
"Fiserv Forum 0.417289 0.758819 0.347325 23.253012 \n",
"Golden 1 Center 0.459417 0.772444 0.349514 23.805556 \n",
"Little Caesars Arena 0.469127 0.767203 0.348127 24.151899 \n",
"Madison Square Garden 0.463842 0.756461 0.365237 24.263158 \n",
"Moda Center 0.448145 0.793880 0.371578 22.939759 \n",
"Pepsi Center 0.448531 0.742173 0.323988 23.320988 \n",
"Quicken Loans Arena 0.489766 0.776896 0.384247 25.584416 \n",
"Scotiabank Arena 0.432111 0.770211 0.347544 24.155556 \n",
"Smoothie King Center 0.474096 0.771562 0.362822 25.109589 \n",
"Spectrum Center 0.463920 0.761573 0.350853 25.066667 \n",
"Staples Center 0.443783 0.775064 0.338796 24.095541 \n",
"State Farm Arena 0.463447 0.767184 0.343553 25.500000 \n",
"TD Garden 0.442769 0.743064 0.335808 23.282051 \n",
"Talking Stick Resort Arena 0.474922 0.762377 0.363143 24.285714 \n",
"Target Center 0.462306 0.763181 0.363625 24.402778 \n",
"Toyota Center 0.454438 0.759188 0.339250 24.200000 \n",
"United Center 0.475372 0.750628 0.355808 25.602564 \n",
"Vivint Smart Home Arena 0.443618 0.767803 0.351961 20.539474 \n",
"Wells Fargo Center 0.445346 0.757247 0.343272 22.086420 \n",
"\n",
" REB_away HOME_TEAM_WINS \n",
"ARENA \n",
"AT&T Center 43.833333 0.666667 \n",
"American Airlines Center 43.720000 0.546667 \n",
"AmericanAirlines Arena 42.131579 0.644737 \n",
"Amway Center 44.769231 0.538462 \n",
"Bankers Life Fieldhouse 44.662338 0.675325 \n",
"Barclays Center 45.000000 0.533333 \n",
"Capital One Arena 46.121622 0.500000 \n",
"Chase Center 43.900000 0.511111 \n",
"Chesapeake Energy Arena 44.700000 0.637500 \n",
"FedExForum 44.986842 0.526316 \n",
"Fiserv Forum 44.385542 0.843373 \n",
"Golden 1 Center 46.708333 0.527778 \n",
"Little Caesars Arena 43.417722 0.493671 \n",
"Madison Square Garden 45.263158 0.250000 \n",
"Moda Center 44.048193 0.662651 \n",
"Pepsi Center 42.172840 0.790123 \n",
"Quicken Loans Arena 42.597403 0.298701 \n",
"Scotiabank Arena 45.300000 0.744444 \n",
"Smoothie King Center 44.561644 0.438356 \n",
"Spectrum Center 45.266667 0.493333 \n",
"Staples Center 45.133758 0.636943 \n",
"State Farm Arena 46.276316 0.421053 \n",
"TD Garden 44.153846 0.717949 \n",
"Talking Stick Resort Arena 45.000000 0.311688 \n",
"Target Center 46.500000 0.444444 \n",
"Toyota Center 45.700000 0.737500 \n",
"United Center 46.192308 0.307692 \n",
"Vivint Smart Home Arena 42.276316 0.684211 \n",
"Wells Fargo Center 41.160494 0.802469 \n",
"\n",
"[29 rows x 25 columns]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merge1_df.groupby('ARENA').mean()"
]
},
{
"cell_type": "markdown",
"id": "742c3e08",
"metadata": {},
"source": [
"## How to combine the same data for different outcomes\n",
"\n",
"We can combine datasets in different ways to learn different things about the data.\n",
"Let's now read info about the players."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "20436050",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" PLAYER_NAME | \n",
" TEAM_ID | \n",
" PLAYER_ID | \n",
" SEASON | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 626 | \n",
" Kawhi Leonard | \n",
" 1610612761 | \n",
" 202695 | \n",
" 2018 | \n",
"
\n",
" \n",
" 1 | \n",
" 627 | \n",
" Pascal Siakam | \n",
" 1610612761 | \n",
" 1627783 | \n",
" 2018 | \n",
"
\n",
" \n",
" 2 | \n",
" 628 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 201188 | \n",
" 2018 | \n",
"
\n",
" \n",
" 3 | \n",
" 629 | \n",
" Danny Green | \n",
" 1610612761 | \n",
" 201980 | \n",
" 2018 | \n",
"
\n",
" \n",
" 4 | \n",
" 630 | \n",
" Kyle Lowry | \n",
" 1610612761 | \n",
" 200768 | \n",
" 2018 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Unnamed: 0 PLAYER_NAME TEAM_ID PLAYER_ID SEASON\n",
"0 626 Kawhi Leonard 1610612761 202695 2018\n",
"1 627 Pascal Siakam 1610612761 1627783 2018\n",
"2 628 Marc Gasol 1610612761 201188 2018\n",
"3 629 Danny Green 1610612761 201980 2018\n",
"4 630 Kyle Lowry 1610612761 200768 2018"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"players18 = pd.read_csv('https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/2018-players.csv')\n",
"players19 = pd.read_csv('https://raw.githubusercontent.com/rhodyprog4ds/inclass-data/main/2019-players.csv')\n",
"players18.head()"
]
},
{
"cell_type": "markdown",
"id": "f1cd8f53",
"metadata": {},
"source": [
"First let's look at the shape of each of them to have a reference for what happens when we try different merges."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "aef18b85",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((748, 5), (626, 5))"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"players18.shape, players19.shape"
]
},
{
"cell_type": "markdown",
"id": "9cbed9bc",
"metadata": {},
"source": [
"One thing we might want to do is to put all of the information into one long DataFrame. We can do this with `concat`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "01e95e00",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" PLAYER_NAME | \n",
" TEAM_ID | \n",
" PLAYER_ID | \n",
" SEASON | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 626 | \n",
" Kawhi Leonard | \n",
" 1610612761 | \n",
" 202695 | \n",
" 2018 | \n",
"
\n",
" \n",
" 1 | \n",
" 627 | \n",
" Pascal Siakam | \n",
" 1610612761 | \n",
" 1627783 | \n",
" 2018 | \n",
"
\n",
" \n",
" 2 | \n",
" 628 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 201188 | \n",
" 2018 | \n",
"
\n",
" \n",
" 3 | \n",
" 629 | \n",
" Danny Green | \n",
" 1610612761 | \n",
" 201980 | \n",
" 2018 | \n",
"
\n",
" \n",
" 4 | \n",
" 630 | \n",
" Kyle Lowry | \n",
" 1610612761 | \n",
" 200768 | \n",
" 2018 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 621 | \n",
" 621 | \n",
" Anthony Bennett | \n",
" 1610612745 | \n",
" 203461 | \n",
" 2019 | \n",
"
\n",
" \n",
" 622 | \n",
" 622 | \n",
" Ray Spalding | \n",
" 1610612737 | \n",
" 1629034 | \n",
" 2019 | \n",
"
\n",
" \n",
" 623 | \n",
" 623 | \n",
" Devyn Marble | \n",
" 1610612744 | \n",
" 203906 | \n",
" 2019 | \n",
"
\n",
" \n",
" 624 | \n",
" 624 | \n",
" Hassani Gravett | \n",
" 1610612753 | \n",
" 1629755 | \n",
" 2019 | \n",
"
\n",
" \n",
" 625 | \n",
" 625 | \n",
" JaKeenan Gant | \n",
" 1610612754 | \n",
" 1629721 | \n",
" 2019 | \n",
"
\n",
" \n",
"
\n",
"
1374 rows × 5 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0 PLAYER_NAME TEAM_ID PLAYER_ID SEASON\n",
"0 626 Kawhi Leonard 1610612761 202695 2018\n",
"1 627 Pascal Siakam 1610612761 1627783 2018\n",
"2 628 Marc Gasol 1610612761 201188 2018\n",
"3 629 Danny Green 1610612761 201980 2018\n",
"4 630 Kyle Lowry 1610612761 200768 2018\n",
".. ... ... ... ... ...\n",
"621 621 Anthony Bennett 1610612745 203461 2019\n",
"622 622 Ray Spalding 1610612737 1629034 2019\n",
"623 623 Devyn Marble 1610612744 203906 2019\n",
"624 624 Hassani Gravett 1610612753 1629755 2019\n",
"625 625 JaKeenan Gant 1610612754 1629721 2019\n",
"\n",
"[1374 rows x 5 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([players18,players19])"
]
},
{
"cell_type": "markdown",
"id": "7357d24d",
"metadata": {},
"source": [
"This allows us to see all the players for each year, we could do groupby and count to see how many players played each year for example.\n",
"\n",
"We can check that this is the size we expected."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "9c658050",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1374, 5)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([players18,players19]).shape"
]
},
{
"cell_type": "markdown",
"id": "bf57de1e",
"metadata": {},
"source": [
"If we use the default merge settings we get an empty result because the two DataFrames have the same columns so pandas tries to merge `on` all of the columns, but there are no rows that have the same value in all of the columns, so there's nothing left."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "05ea2718",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0 | \n",
" PLAYER_NAME | \n",
" TEAM_ID | \n",
" PLAYER_ID | \n",
" SEASON | \n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [Unnamed: 0, PLAYER_NAME, TEAM_ID, PLAYER_ID, SEASON]\n",
"Index: []"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(players18,players19,)"
]
},
{
"cell_type": "markdown",
"id": "3b591650",
"metadata": {},
"source": [
"If we merge with on='PLAYER_ID', it only requires that one column to be the same to match rows from the two DataFrames together. With the default value for `how` or explicitly setting `how='inner'` we get the info of players who played both seasons."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "264de540",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0_x | \n",
" PLAYER_NAME_x | \n",
" TEAM_ID_x | \n",
" PLAYER_ID | \n",
" SEASON_x | \n",
" Unnamed: 0_y | \n",
" PLAYER_NAME_y | \n",
" TEAM_ID_y | \n",
" SEASON_y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 626 | \n",
" Kawhi Leonard | \n",
" 1610612761 | \n",
" 202695 | \n",
" 2018 | \n",
" 299 | \n",
" Kawhi Leonard | \n",
" 1610612746 | \n",
" 2019 | \n",
"
\n",
" \n",
" 1 | \n",
" 627 | \n",
" Pascal Siakam | \n",
" 1610612761 | \n",
" 1627783 | \n",
" 2018 | \n",
" 275 | \n",
" Pascal Siakam | \n",
" 1610612761 | \n",
" 2019 | \n",
"
\n",
" \n",
" 2 | \n",
" 628 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 201188 | \n",
" 2018 | \n",
" 276 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 2019 | \n",
"
\n",
" \n",
" 3 | \n",
" 1157 | \n",
" Marc Gasol | \n",
" 1610612763 | \n",
" 201188 | \n",
" 2018 | \n",
" 276 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 2019 | \n",
"
\n",
" \n",
" 4 | \n",
" 629 | \n",
" Danny Green | \n",
" 1610612761 | \n",
" 201980 | \n",
" 2018 | \n",
" 202 | \n",
" Danny Green | \n",
" 1610612747 | \n",
" 2019 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 533 | \n",
" 1339 | \n",
" Abdul Gaddy | \n",
" 1610612760 | \n",
" 203583 | \n",
" 2018 | \n",
" 561 | \n",
" Abdul Gaddy | \n",
" 1610612760 | \n",
" 2019 | \n",
"
\n",
" \n",
" 534 | \n",
" 1342 | \n",
" Andre Roberson | \n",
" 1610612760 | \n",
" 203460 | \n",
" 2018 | \n",
" 566 | \n",
" Andre Roberson | \n",
" 1610612760 | \n",
" 2019 | \n",
"
\n",
" \n",
" 535 | \n",
" 1348 | \n",
" Norvel Pelle | \n",
" 1610612755 | \n",
" 203658 | \n",
" 2018 | \n",
" 24 | \n",
" Norvel Pelle | \n",
" 1610612755 | \n",
" 2019 | \n",
"
\n",
" \n",
" 536 | \n",
" 1350 | \n",
" Denzel Valentine | \n",
" 1610612741 | \n",
" 1627756 | \n",
" 2018 | \n",
" 58 | \n",
" Denzel Valentine | \n",
" 1610612741 | \n",
" 2019 | \n",
"
\n",
" \n",
" 537 | \n",
" 1353 | \n",
" C.J. Wilcox | \n",
" 1610612754 | \n",
" 203912 | \n",
" 2018 | \n",
" 573 | \n",
" C.J. Wilcox | \n",
" 1610612754 | \n",
" 2019 | \n",
"
\n",
" \n",
"
\n",
"
538 rows × 9 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0_x PLAYER_NAME_x TEAM_ID_x PLAYER_ID SEASON_x \\\n",
"0 626 Kawhi Leonard 1610612761 202695 2018 \n",
"1 627 Pascal Siakam 1610612761 1627783 2018 \n",
"2 628 Marc Gasol 1610612761 201188 2018 \n",
"3 1157 Marc Gasol 1610612763 201188 2018 \n",
"4 629 Danny Green 1610612761 201980 2018 \n",
".. ... ... ... ... ... \n",
"533 1339 Abdul Gaddy 1610612760 203583 2018 \n",
"534 1342 Andre Roberson 1610612760 203460 2018 \n",
"535 1348 Norvel Pelle 1610612755 203658 2018 \n",
"536 1350 Denzel Valentine 1610612741 1627756 2018 \n",
"537 1353 C.J. Wilcox 1610612754 203912 2018 \n",
"\n",
" Unnamed: 0_y PLAYER_NAME_y TEAM_ID_y SEASON_y \n",
"0 299 Kawhi Leonard 1610612746 2019 \n",
"1 275 Pascal Siakam 1610612761 2019 \n",
"2 276 Marc Gasol 1610612761 2019 \n",
"3 276 Marc Gasol 1610612761 2019 \n",
"4 202 Danny Green 1610612747 2019 \n",
".. ... ... ... ... \n",
"533 561 Abdul Gaddy 1610612760 2019 \n",
"534 566 Andre Roberson 1610612760 2019 \n",
"535 24 Norvel Pelle 1610612755 2019 \n",
"536 58 Denzel Valentine 1610612741 2019 \n",
"537 573 C.J. Wilcox 1610612754 2019 \n",
"\n",
"[538 rows x 9 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(players18,players19,on='PLAYER_ID', how='inner')"
]
},
{
"cell_type": "markdown",
"id": "13c0b233",
"metadata": {},
"source": [
"When we use `outer` we get one row for each player who played in either season or both seasons. From this we can for example see who changed teams, who are the rookies in 2019 and who retired or was unsigned in 2019."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "8b3bd2b6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0_x | \n",
" PLAYER_NAME_x | \n",
" TEAM_ID_x | \n",
" PLAYER_ID | \n",
" SEASON_x | \n",
" Unnamed: 0_y | \n",
" PLAYER_NAME_y | \n",
" TEAM_ID_y | \n",
" SEASON_y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 626.0 | \n",
" Kawhi Leonard | \n",
" 1.610613e+09 | \n",
" 202695 | \n",
" 2018.0 | \n",
" 299.0 | \n",
" Kawhi Leonard | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 627.0 | \n",
" Pascal Siakam | \n",
" 1.610613e+09 | \n",
" 1627783 | \n",
" 2018.0 | \n",
" 275.0 | \n",
" Pascal Siakam | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 628.0 | \n",
" Marc Gasol | \n",
" 1.610613e+09 | \n",
" 201188 | \n",
" 2018.0 | \n",
" 276.0 | \n",
" Marc Gasol | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 1157.0 | \n",
" Marc Gasol | \n",
" 1.610613e+09 | \n",
" 201188 | \n",
" 2018.0 | \n",
" 276.0 | \n",
" Marc Gasol | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 629.0 | \n",
" Danny Green | \n",
" 1.610613e+09 | \n",
" 201980 | \n",
" 2018.0 | \n",
" 202.0 | \n",
" Danny Green | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 922 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1629097 | \n",
" NaN | \n",
" 619.0 | \n",
" Terry Larrier | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 923 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 203461 | \n",
" NaN | \n",
" 621.0 | \n",
" Anthony Bennett | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 924 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 203906 | \n",
" NaN | \n",
" 623.0 | \n",
" Devyn Marble | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 925 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1629755 | \n",
" NaN | \n",
" 624.0 | \n",
" Hassani Gravett | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 926 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1629721 | \n",
" NaN | \n",
" 625.0 | \n",
" JaKeenan Gant | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
"
\n",
"
927 rows × 9 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0_x PLAYER_NAME_x TEAM_ID_x PLAYER_ID SEASON_x \\\n",
"0 626.0 Kawhi Leonard 1.610613e+09 202695 2018.0 \n",
"1 627.0 Pascal Siakam 1.610613e+09 1627783 2018.0 \n",
"2 628.0 Marc Gasol 1.610613e+09 201188 2018.0 \n",
"3 1157.0 Marc Gasol 1.610613e+09 201188 2018.0 \n",
"4 629.0 Danny Green 1.610613e+09 201980 2018.0 \n",
".. ... ... ... ... ... \n",
"922 NaN NaN NaN 1629097 NaN \n",
"923 NaN NaN NaN 203461 NaN \n",
"924 NaN NaN NaN 203906 NaN \n",
"925 NaN NaN NaN 1629755 NaN \n",
"926 NaN NaN NaN 1629721 NaN \n",
"\n",
" Unnamed: 0_y PLAYER_NAME_y TEAM_ID_y SEASON_y \n",
"0 299.0 Kawhi Leonard 1.610613e+09 2019.0 \n",
"1 275.0 Pascal Siakam 1.610613e+09 2019.0 \n",
"2 276.0 Marc Gasol 1.610613e+09 2019.0 \n",
"3 276.0 Marc Gasol 1.610613e+09 2019.0 \n",
"4 202.0 Danny Green 1.610613e+09 2019.0 \n",
".. ... ... ... ... \n",
"922 619.0 Terry Larrier 1.610613e+09 2019.0 \n",
"923 621.0 Anthony Bennett 1.610613e+09 2019.0 \n",
"924 623.0 Devyn Marble 1.610613e+09 2019.0 \n",
"925 624.0 Hassani Gravett 1.610613e+09 2019.0 \n",
"926 625.0 JaKeenan Gant 1.610613e+09 2019.0 \n",
"\n",
"[927 rows x 9 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(players18,players19,on='PLAYER_ID', how='outer')"
]
},
{
"cell_type": "markdown",
"id": "4ef712ed",
"metadata": {},
"source": [
"Using `left` gives us the `'PLAYER_ID'` that are in the left(`players18`) DataFrame, including those that are in both DataFrame\n",
"`right` would give players in the `players19` DataFrame or both DataFrames. With this result, we can see who retired, but not the 2019 rookies."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "9e47539c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Unnamed: 0_x | \n",
" PLAYER_NAME_x | \n",
" TEAM_ID_x | \n",
" PLAYER_ID | \n",
" SEASON_x | \n",
" Unnamed: 0_y | \n",
" PLAYER_NAME_y | \n",
" TEAM_ID_y | \n",
" SEASON_y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 626 | \n",
" Kawhi Leonard | \n",
" 1610612761 | \n",
" 202695 | \n",
" 2018 | \n",
" 299.0 | \n",
" Kawhi Leonard | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 627 | \n",
" Pascal Siakam | \n",
" 1610612761 | \n",
" 1627783 | \n",
" 2018 | \n",
" 275.0 | \n",
" Pascal Siakam | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 628 | \n",
" Marc Gasol | \n",
" 1610612761 | \n",
" 201188 | \n",
" 2018 | \n",
" 276.0 | \n",
" Marc Gasol | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 629 | \n",
" Danny Green | \n",
" 1610612761 | \n",
" 201980 | \n",
" 2018 | \n",
" 202.0 | \n",
" Danny Green | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 630 | \n",
" Kyle Lowry | \n",
" 1610612761 | \n",
" 200768 | \n",
" 2018 | \n",
" 451.0 | \n",
" Kyle Lowry | \n",
" 1.610613e+09 | \n",
" 2019.0 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 749 | \n",
" 1369 | \n",
" Tyrius Walker | \n",
" 1610612752 | \n",
" 1629246 | \n",
" 2018 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 750 | \n",
" 1370 | \n",
" Marcus Lee | \n",
" 1610612748 | \n",
" 1629159 | \n",
" 2018 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 751 | \n",
" 1371 | \n",
" Trey Lewis | \n",
" 1610612762 | \n",
" 1629163 | \n",
" 2018 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 752 | \n",
" 1372 | \n",
" Emanuel Terry | \n",
" 1610612743 | \n",
" 1629150 | \n",
" 2018 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 753 | \n",
" 1373 | \n",
" Justin Bibbs | \n",
" 1610612738 | \n",
" 1629167 | \n",
" 2018 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
754 rows × 9 columns
\n",
"
"
],
"text/plain": [
" Unnamed: 0_x PLAYER_NAME_x TEAM_ID_x PLAYER_ID SEASON_x \\\n",
"0 626 Kawhi Leonard 1610612761 202695 2018 \n",
"1 627 Pascal Siakam 1610612761 1627783 2018 \n",
"2 628 Marc Gasol 1610612761 201188 2018 \n",
"3 629 Danny Green 1610612761 201980 2018 \n",
"4 630 Kyle Lowry 1610612761 200768 2018 \n",
".. ... ... ... ... ... \n",
"749 1369 Tyrius Walker 1610612752 1629246 2018 \n",
"750 1370 Marcus Lee 1610612748 1629159 2018 \n",
"751 1371 Trey Lewis 1610612762 1629163 2018 \n",
"752 1372 Emanuel Terry 1610612743 1629150 2018 \n",
"753 1373 Justin Bibbs 1610612738 1629167 2018 \n",
"\n",
" Unnamed: 0_y PLAYER_NAME_y TEAM_ID_y SEASON_y \n",
"0 299.0 Kawhi Leonard 1.610613e+09 2019.0 \n",
"1 275.0 Pascal Siakam 1.610613e+09 2019.0 \n",
"2 276.0 Marc Gasol 1.610613e+09 2019.0 \n",
"3 202.0 Danny Green 1.610613e+09 2019.0 \n",
"4 451.0 Kyle Lowry 1.610613e+09 2019.0 \n",
".. ... ... ... ... \n",
"749 NaN NaN NaN NaN \n",
"750 NaN NaN NaN NaN \n",
"751 NaN NaN NaN NaN \n",
"752 NaN NaN NaN NaN \n",
"753 NaN NaN NaN NaN \n",
"\n",
"[754 rows x 9 columns]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.merge(players18,players19,on='PLAYER_ID', how='left')"
]
},
{
"cell_type": "markdown",
"id": "0fa8fb6e",
"metadata": {},
"source": [
"## Try it yourself\n",
"\n",
"Try different merges and inspect them:\n",
"- how many rows & columns?\n",
"- Where are NaN values inserted?\n",
"- What rows from the original datasets are not included?\n",
"- describe each type of merge in your own words\n",
"\n",
"\n",
"Split a DataFrame into separate data frames by subsetting the columns and indexing the rows with `loc`, then use concat to put it back together. Programmatically check that it's back together correctly."
]
}
],
"metadata": {
"jupytext": {
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.12,
"jupytext_version": "1.6.0"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"source_map": [
12,
16,
18,
28,
31,
37,
41,
43,
45,
49,
51,
53,
56,
59,
61,
63,
66,
71,
74,
76,
79,
82,
86,
90,
94,
96,
98,
100,
107,
111,
115,
117,
121,
123,
128,
130,
133,
135,
140,
142,
148,
150,
156,
158
]
},
"nbformat": 4,
"nbformat_minor": 5
}