{
"cells": [
{
"cell_type": "markdown",
"id": "3fc4b3f4",
"metadata": {},
"source": [
"# Class 4: Pandas\n",
"\n",
"Today we will:"
]
},
{
"cell_type": "markdown",
"id": "702de130",
"metadata": {},
"source": [
"## Remember, Programming is a Practice\n",
"\n",
"- if you're curious about something try it\n",
"- you don't need me to give you answers about how code works, the interpreter will tell you\n",
"- if you don't remember details, remember you can get help from Jupyter\n",
"\n",
"with a `?` after the function name withouth `()`"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "88dc5169",
"metadata": {},
"outputs": [],
"source": [
"print?"
]
},
{
"cell_type": "markdown",
"id": "4ac15c5a",
"metadata": {},
"source": [
"or using the `tab` key inside the `()` for a function"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7f89ee8f",
"metadata": {
"tags": [
"raises-exception",
"remove-output"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"print()"
]
},
{
"cell_type": "markdown",
"id": "4592c964",
"metadata": {},
"source": [
"or from the core python, with the `help` fucntion"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a60bcfb3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on built-in function print in module builtins:\n",
"\n",
"print(...)\n",
" print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n",
" \n",
" Prints the values to a stream, or to sys.stdout by default.\n",
" Optional keyword arguments:\n",
" file: a file-like object (stream); defaults to the current sys.stdout.\n",
" sep: string inserted between values, default a space.\n",
" end: string appended after the last value, default a newline.\n",
" flush: whether to forcibly flush the stream.\n",
"\n"
]
}
],
"source": [
"help(print)"
]
},
{
"cell_type": "markdown",
"id": "5563bc1a",
"metadata": {},
"source": [
"## Data in Pandas\n",
"\n",
"We can import `pandas` again as before"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fe4708b7",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "39851ae7",
"metadata": {},
"source": [
"and we can read in data."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "adf298bf",
"metadata": {
"tags": [
"output_scroll"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" key_ID | \n",
" village | \n",
" interview_date | \n",
" no_membrs | \n",
" years_liv | \n",
" respondent_wall_type | \n",
" rooms | \n",
" memb_assoc | \n",
" affect_conflicts | \n",
" liv_count | \n",
" items_owned | \n",
" no_meals | \n",
" months_lack_food | \n",
" instanceID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 3 | \n",
" 4 | \n",
" muddaub | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" bicycle;television;solar_panel;table | \n",
" 2 | \n",
" Jan | \n",
" uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 9 | \n",
" muddaub | \n",
" 1 | \n",
" yes | \n",
" once | \n",
" 3 | \n",
" cow_cart;bicycle;radio;cow_plough;solar_panel;... | \n",
" 2 | \n",
" Jan;Sept;Oct;Nov;Dec | \n",
" uuid:099de9c9-3e5e-427b-8452-26250e840d6e | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 10 | \n",
" 15 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" solar_torch | \n",
" 2 | \n",
" Jan;Feb;Mar;Oct;Nov;Dec | \n",
" uuid:193d7daf-9582-409b-bf09-027dd36f9007 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 6 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 2 | \n",
" bicycle;radio;cow_plough;solar_panel;mobile_phone | \n",
" 2 | \n",
" Sept;Oct;Nov;Dec | \n",
" uuid:148d1105-778a-4755-aa71-281eadd4a973 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 40 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 4 | \n",
" motorcyle;radio;cow_plough;mobile_phone | \n",
" 2 | \n",
" Aug;Sept;Oct;Nov | \n",
" uuid:2c867811-9696-4966-9866-f35c3e97d02d | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 126 | \n",
" 126 | \n",
" Ruaca | \n",
" 2017-05-18T00:00:00Z | \n",
" 3 | \n",
" 7 | \n",
" burntbricks | \n",
" 1 | \n",
" no | \n",
" more_once | \n",
" 3 | \n",
" motorcyle;radio;solar_panel | \n",
" 3 | \n",
" Oct;Nov;Dec | \n",
" uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 | \n",
"
\n",
" \n",
" 127 | \n",
" 193 | \n",
" Ruaca | \n",
" 2017-06-04T00:00:00Z | \n",
" 7 | \n",
" 10 | \n",
" cement | \n",
" 3 | \n",
" no | \n",
" more_once | \n",
" 3 | \n",
" car;lorry;television;radio;sterio;cow_plough;s... | \n",
" 3 | \n",
" none | \n",
" uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df | \n",
"
\n",
" \n",
" 128 | \n",
" 194 | \n",
" Ruaca | \n",
" 2017-06-04T00:00:00Z | \n",
" 4 | \n",
" 5 | \n",
" muddaub | \n",
" 1 | \n",
" no | \n",
" more_once | \n",
" 1 | \n",
" radio;solar_panel;solar_torch;mobile_phone | \n",
" 3 | \n",
" Sept;Oct;Nov | \n",
" uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf | \n",
"
\n",
" \n",
" 129 | \n",
" 199 | \n",
" Chirodzo | \n",
" 2017-06-04T00:00:00Z | \n",
" 7 | \n",
" 17 | \n",
" burntbricks | \n",
" 2 | \n",
" yes | \n",
" more_once | \n",
" 2 | \n",
" cow_cart;lorry;motorcyle;computer;television;r... | \n",
" 3 | \n",
" Nov;Dec | \n",
" uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b | \n",
"
\n",
" \n",
" 130 | \n",
" 200 | \n",
" Chirodzo | \n",
" 2017-06-04T00:00:00Z | \n",
" 8 | \n",
" 20 | \n",
" burntbricks | \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" 3 | \n",
" radio;cow_plough;solar_panel;solar_torch;table... | \n",
" 3 | \n",
" Oct;Nov | \n",
" uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 | \n",
"
\n",
" \n",
"
\n",
"
131 rows × 14 columns
\n",
"
"
],
"text/plain": [
" key_ID village interview_date no_membrs years_liv \\\n",
"0 1 God 2016-11-17T00:00:00Z 3 4 \n",
"1 1 God 2016-11-17T00:00:00Z 7 9 \n",
"2 3 God 2016-11-17T00:00:00Z 10 15 \n",
"3 4 God 2016-11-17T00:00:00Z 7 6 \n",
"4 5 God 2016-11-17T00:00:00Z 7 40 \n",
".. ... ... ... ... ... \n",
"126 126 Ruaca 2017-05-18T00:00:00Z 3 7 \n",
"127 193 Ruaca 2017-06-04T00:00:00Z 7 10 \n",
"128 194 Ruaca 2017-06-04T00:00:00Z 4 5 \n",
"129 199 Chirodzo 2017-06-04T00:00:00Z 7 17 \n",
"130 200 Chirodzo 2017-06-04T00:00:00Z 8 20 \n",
"\n",
" respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n",
"0 muddaub 1 NaN NaN 1 \n",
"1 muddaub 1 yes once 3 \n",
"2 burntbricks 1 NaN NaN 1 \n",
"3 burntbricks 1 NaN NaN 2 \n",
"4 burntbricks 1 NaN NaN 4 \n",
".. ... ... ... ... ... \n",
"126 burntbricks 1 no more_once 3 \n",
"127 cement 3 no more_once 3 \n",
"128 muddaub 1 no more_once 1 \n",
"129 burntbricks 2 yes more_once 2 \n",
"130 burntbricks 2 NaN NaN 3 \n",
"\n",
" items_owned no_meals \\\n",
"0 bicycle;television;solar_panel;table 2 \n",
"1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n",
"2 solar_torch 2 \n",
"3 bicycle;radio;cow_plough;solar_panel;mobile_phone 2 \n",
"4 motorcyle;radio;cow_plough;mobile_phone 2 \n",
".. ... ... \n",
"126 motorcyle;radio;solar_panel 3 \n",
"127 car;lorry;television;radio;sterio;cow_plough;s... 3 \n",
"128 radio;solar_panel;solar_torch;mobile_phone 3 \n",
"129 cow_cart;lorry;motorcyle;computer;television;r... 3 \n",
"130 radio;cow_plough;solar_panel;solar_torch;table... 3 \n",
"\n",
" months_lack_food instanceID \n",
"0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n",
"1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n",
"2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 \n",
"3 Sept;Oct;Nov;Dec uuid:148d1105-778a-4755-aa71-281eadd4a973 \n",
"4 Aug;Sept;Oct;Nov uuid:2c867811-9696-4966-9866-f35c3e97d02d \n",
".. ... ... \n",
"126 Oct;Nov;Dec uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 \n",
"127 none uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df \n",
"128 Sept;Oct;Nov uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf \n",
"129 Nov;Dec uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b \n",
"130 Oct;Nov uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 \n",
"\n",
"[131 rows x 14 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.read_csv('https://raw.githubusercontent.com/brownsarahm/python-socialsci-files/master/data/SAFI_clean.csv')"
]
},
{
"cell_type": "markdown",
"id": "0e07ecb9",
"metadata": {},
"source": [
"to be able to use this, we need to save it to a variable."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "24782612",
"metadata": {},
"outputs": [],
"source": [
"safi_df = pd.read_csv('https://raw.githubusercontent.com/brownsarahm/python-socialsci-files/master/data/SAFI_clean.csv')"
]
},
{
"cell_type": "markdown",
"id": "fbcd8d62",
"metadata": {},
"source": [
"This is an excerpt from the [SAFI dataset](https://figshare.com/articles/SAFI_Survey_Results/6262019).\n",
"\n",
"Another important thing to do is to check datatypes, this is how we know what things we can do with a variable."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "51d6757a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(safi_df)"
]
},
{
"cell_type": "markdown",
"id": "40496bdf",
"metadata": {},
"source": [
"````{margin}\n",
"```{tip}\n",
"we can also see the size when we print the whole dataset as in the first time we read the data in.\n",
"```\n",
"````\n",
"An important thing to check is the size of the dataset."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "13216461",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(131, 14)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.shape"
]
},
{
"cell_type": "markdown",
"id": "b47ce8bb",
"metadata": {},
"source": [
"Recall that you can also tab complete"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "949fdaee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(131, 14)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.shape"
]
},
{
"cell_type": "markdown",
"id": "8be26ec7",
"metadata": {},
"source": [
"To see the first 5 rows of the dataset, use the `head()` function"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "bf6f2d64",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" key_ID | \n",
" village | \n",
" interview_date | \n",
" no_membrs | \n",
" years_liv | \n",
" respondent_wall_type | \n",
" rooms | \n",
" memb_assoc | \n",
" affect_conflicts | \n",
" liv_count | \n",
" items_owned | \n",
" no_meals | \n",
" months_lack_food | \n",
" instanceID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 3 | \n",
" 4 | \n",
" muddaub | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" bicycle;television;solar_panel;table | \n",
" 2 | \n",
" Jan | \n",
" uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 9 | \n",
" muddaub | \n",
" 1 | \n",
" yes | \n",
" once | \n",
" 3 | \n",
" cow_cart;bicycle;radio;cow_plough;solar_panel;... | \n",
" 2 | \n",
" Jan;Sept;Oct;Nov;Dec | \n",
" uuid:099de9c9-3e5e-427b-8452-26250e840d6e | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 10 | \n",
" 15 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" solar_torch | \n",
" 2 | \n",
" Jan;Feb;Mar;Oct;Nov;Dec | \n",
" uuid:193d7daf-9582-409b-bf09-027dd36f9007 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 6 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 2 | \n",
" bicycle;radio;cow_plough;solar_panel;mobile_phone | \n",
" 2 | \n",
" Sept;Oct;Nov;Dec | \n",
" uuid:148d1105-778a-4755-aa71-281eadd4a973 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 40 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 4 | \n",
" motorcyle;radio;cow_plough;mobile_phone | \n",
" 2 | \n",
" Aug;Sept;Oct;Nov | \n",
" uuid:2c867811-9696-4966-9866-f35c3e97d02d | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" key_ID village interview_date no_membrs years_liv \\\n",
"0 1 God 2016-11-17T00:00:00Z 3 4 \n",
"1 1 God 2016-11-17T00:00:00Z 7 9 \n",
"2 3 God 2016-11-17T00:00:00Z 10 15 \n",
"3 4 God 2016-11-17T00:00:00Z 7 6 \n",
"4 5 God 2016-11-17T00:00:00Z 7 40 \n",
"\n",
" respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n",
"0 muddaub 1 NaN NaN 1 \n",
"1 muddaub 1 yes once 3 \n",
"2 burntbricks 1 NaN NaN 1 \n",
"3 burntbricks 1 NaN NaN 2 \n",
"4 burntbricks 1 NaN NaN 4 \n",
"\n",
" items_owned no_meals \\\n",
"0 bicycle;television;solar_panel;table 2 \n",
"1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n",
"2 solar_torch 2 \n",
"3 bicycle;radio;cow_plough;solar_panel;mobile_phone 2 \n",
"4 motorcyle;radio;cow_plough;mobile_phone 2 \n",
"\n",
" months_lack_food instanceID \n",
"0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n",
"1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n",
"2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 \n",
"3 Sept;Oct;Nov;Dec uuid:148d1105-778a-4755-aa71-281eadd4a973 \n",
"4 Aug;Sept;Oct;Nov uuid:2c867811-9696-4966-9866-f35c3e97d02d "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.head()"
]
},
{
"cell_type": "markdown",
"id": "b899d1b0",
"metadata": {},
"source": [
"We can call this function with a value to change how many rows are returned"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ee79ae60",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" key_ID | \n",
" village | \n",
" interview_date | \n",
" no_membrs | \n",
" years_liv | \n",
" respondent_wall_type | \n",
" rooms | \n",
" memb_assoc | \n",
" affect_conflicts | \n",
" liv_count | \n",
" items_owned | \n",
" no_meals | \n",
" months_lack_food | \n",
" instanceID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 3 | \n",
" 4 | \n",
" muddaub | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" bicycle;television;solar_panel;table | \n",
" 2 | \n",
" Jan | \n",
" uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 9 | \n",
" muddaub | \n",
" 1 | \n",
" yes | \n",
" once | \n",
" 3 | \n",
" cow_cart;bicycle;radio;cow_plough;solar_panel;... | \n",
" 2 | \n",
" Jan;Sept;Oct;Nov;Dec | \n",
" uuid:099de9c9-3e5e-427b-8452-26250e840d6e | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 10 | \n",
" 15 | \n",
" burntbricks | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" solar_torch | \n",
" 2 | \n",
" Jan;Feb;Mar;Oct;Nov;Dec | \n",
" uuid:193d7daf-9582-409b-bf09-027dd36f9007 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" key_ID village interview_date no_membrs years_liv \\\n",
"0 1 God 2016-11-17T00:00:00Z 3 4 \n",
"1 1 God 2016-11-17T00:00:00Z 7 9 \n",
"2 3 God 2016-11-17T00:00:00Z 10 15 \n",
"\n",
" respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n",
"0 muddaub 1 NaN NaN 1 \n",
"1 muddaub 1 yes once 3 \n",
"2 burntbricks 1 NaN NaN 1 \n",
"\n",
" items_owned no_meals \\\n",
"0 bicycle;television;solar_panel;table 2 \n",
"1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n",
"2 solar_torch 2 \n",
"\n",
" months_lack_food instanceID \n",
"0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n",
"1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n",
"2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.head(3)"
]
},
{
"cell_type": "markdown",
"id": "5abd6749",
"metadata": {},
"source": [
"To know how this works, we can view the documentation for the function\n",
"````{margin}\n",
"```{warning}\n",
"this was changed from using `?` for help in class so that the help is desplayed in the rendered website, but the pop up was fine in real time\n",
"```\n",
"````"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "36209ec9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on method head in module pandas.core.generic:\n",
"\n",
"head(n: 'int' = 5) -> 'FrameOrSeries' method of pandas.core.frame.DataFrame instance\n",
" Return the first `n` rows.\n",
" \n",
" This function returns the first `n` rows for the object based\n",
" on position. It is useful for quickly testing if your object\n",
" has the right type of data in it.\n",
" \n",
" For negative values of `n`, this function returns all rows except\n",
" the last `n` rows, equivalent to ``df[:-n]``.\n",
" \n",
" Parameters\n",
" ----------\n",
" n : int, default 5\n",
" Number of rows to select.\n",
" \n",
" Returns\n",
" -------\n",
" same type as caller\n",
" The first `n` rows of the caller object.\n",
" \n",
" See Also\n",
" --------\n",
" DataFrame.tail: Returns the last `n` rows.\n",
" \n",
" Examples\n",
" --------\n",
" >>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',\n",
" ... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})\n",
" >>> df\n",
" animal\n",
" 0 alligator\n",
" 1 bee\n",
" 2 falcon\n",
" 3 lion\n",
" 4 monkey\n",
" 5 parrot\n",
" 6 shark\n",
" 7 whale\n",
" 8 zebra\n",
" \n",
" Viewing the first 5 lines\n",
" \n",
" >>> df.head()\n",
" animal\n",
" 0 alligator\n",
" 1 bee\n",
" 2 falcon\n",
" 3 lion\n",
" 4 monkey\n",
" \n",
" Viewing the first `n` lines (three in this case)\n",
" \n",
" >>> df.head(3)\n",
" animal\n",
" 0 alligator\n",
" 1 bee\n",
" 2 falcon\n",
" \n",
" For negative values of `n`\n",
" \n",
" >>> df.head(-3)\n",
" animal\n",
" 0 alligator\n",
" 1 bee\n",
" 2 falcon\n",
" 3 lion\n",
" 4 monkey\n",
" 5 parrot\n",
"\n"
]
}
],
"source": [
"help(safi_df.head)"
]
},
{
"cell_type": "markdown",
"id": "fb67a0fb",
"metadata": {},
"source": [
"Since it says `n =5` we know that the default value of the parameter `n` is 5. When a function has a default value, we can call the function without a value.\n",
"\n",
"To view the last few lines, we use `tail`"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d26cfd68",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" key_ID | \n",
" village | \n",
" interview_date | \n",
" no_membrs | \n",
" years_liv | \n",
" respondent_wall_type | \n",
" rooms | \n",
" memb_assoc | \n",
" affect_conflicts | \n",
" liv_count | \n",
" items_owned | \n",
" no_meals | \n",
" months_lack_food | \n",
" instanceID | \n",
"
\n",
" \n",
" \n",
" \n",
" 126 | \n",
" 126 | \n",
" Ruaca | \n",
" 2017-05-18T00:00:00Z | \n",
" 3 | \n",
" 7 | \n",
" burntbricks | \n",
" 1 | \n",
" no | \n",
" more_once | \n",
" 3 | \n",
" motorcyle;radio;solar_panel | \n",
" 3 | \n",
" Oct;Nov;Dec | \n",
" uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 | \n",
"
\n",
" \n",
" 127 | \n",
" 193 | \n",
" Ruaca | \n",
" 2017-06-04T00:00:00Z | \n",
" 7 | \n",
" 10 | \n",
" cement | \n",
" 3 | \n",
" no | \n",
" more_once | \n",
" 3 | \n",
" car;lorry;television;radio;sterio;cow_plough;s... | \n",
" 3 | \n",
" none | \n",
" uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df | \n",
"
\n",
" \n",
" 128 | \n",
" 194 | \n",
" Ruaca | \n",
" 2017-06-04T00:00:00Z | \n",
" 4 | \n",
" 5 | \n",
" muddaub | \n",
" 1 | \n",
" no | \n",
" more_once | \n",
" 1 | \n",
" radio;solar_panel;solar_torch;mobile_phone | \n",
" 3 | \n",
" Sept;Oct;Nov | \n",
" uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf | \n",
"
\n",
" \n",
" 129 | \n",
" 199 | \n",
" Chirodzo | \n",
" 2017-06-04T00:00:00Z | \n",
" 7 | \n",
" 17 | \n",
" burntbricks | \n",
" 2 | \n",
" yes | \n",
" more_once | \n",
" 2 | \n",
" cow_cart;lorry;motorcyle;computer;television;r... | \n",
" 3 | \n",
" Nov;Dec | \n",
" uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b | \n",
"
\n",
" \n",
" 130 | \n",
" 200 | \n",
" Chirodzo | \n",
" 2017-06-04T00:00:00Z | \n",
" 8 | \n",
" 20 | \n",
" burntbricks | \n",
" 2 | \n",
" NaN | \n",
" NaN | \n",
" 3 | \n",
" radio;cow_plough;solar_panel;solar_torch;table... | \n",
" 3 | \n",
" Oct;Nov | \n",
" uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" key_ID village interview_date no_membrs years_liv \\\n",
"126 126 Ruaca 2017-05-18T00:00:00Z 3 7 \n",
"127 193 Ruaca 2017-06-04T00:00:00Z 7 10 \n",
"128 194 Ruaca 2017-06-04T00:00:00Z 4 5 \n",
"129 199 Chirodzo 2017-06-04T00:00:00Z 7 17 \n",
"130 200 Chirodzo 2017-06-04T00:00:00Z 8 20 \n",
"\n",
" respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n",
"126 burntbricks 1 no more_once 3 \n",
"127 cement 3 no more_once 3 \n",
"128 muddaub 1 no more_once 1 \n",
"129 burntbricks 2 yes more_once 2 \n",
"130 burntbricks 2 NaN NaN 3 \n",
"\n",
" items_owned no_meals \\\n",
"126 motorcyle;radio;solar_panel 3 \n",
"127 car;lorry;television;radio;sterio;cow_plough;s... 3 \n",
"128 radio;solar_panel;solar_torch;mobile_phone 3 \n",
"129 cow_cart;lorry;motorcyle;computer;television;r... 3 \n",
"130 radio;cow_plough;solar_panel;solar_torch;table... 3 \n",
"\n",
" months_lack_food instanceID \n",
"126 Oct;Nov;Dec uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 \n",
"127 none uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df \n",
"128 Sept;Oct;Nov uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf \n",
"129 Nov;Dec uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b \n",
"130 Oct;Nov uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.tail()"
]
},
{
"cell_type": "markdown",
"id": "8535a4f9",
"metadata": {},
"source": [
"We can also get an `Index` for the columns of the DataFrame.\n",
"```{margin}\n",
"[`Index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html) is another data type that is defined by `pandas`.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "570dc232",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['key_ID', 'village', 'interview_date', 'no_membrs', 'years_liv',\n",
" 'respondent_wall_type', 'rooms', 'memb_assoc', 'affect_conflicts',\n",
" 'liv_count', 'items_owned', 'no_meals', 'months_lack_food',\n",
" 'instanceID'],\n",
" dtype='object')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.columns"
]
},
{
"cell_type": "markdown",
"id": "976568cc",
"metadata": {},
"source": [
"an `Index` variable is iterable so we can index into it\n",
"```{admonition} Try it Yourself\n",
"How would you view the name of the 3rd column?\n",
"```\n",
"\n",
"First the correct answer:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d01da3dd",
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/plain": [
"'interview_date'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.columns[2]"
]
},
{
"cell_type": "markdown",
"id": "634f981e",
"metadata": {},
"source": [
"Now some misconceptions:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "3c79ab83",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2016-11-17T00:00:00Z\n",
"1 2016-11-17T00:00:00Z\n",
"2 2016-11-17T00:00:00Z\n",
"3 2016-11-17T00:00:00Z\n",
"4 2016-11-17T00:00:00Z\n",
" ... \n",
"126 2017-05-18T00:00:00Z\n",
"127 2017-06-04T00:00:00Z\n",
"128 2017-06-04T00:00:00Z\n",
"129 2017-06-04T00:00:00Z\n",
"130 2017-06-04T00:00:00Z\n",
"Name: interview_date, Length: 131, dtype: object"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df['interview_date']"
]
},
{
"cell_type": "markdown",
"id": "a9f75732",
"metadata": {},
"source": [
"Indexing with the column name) will return the *values* in the column"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "207e250f",
"metadata": {
"tags": [
"raises-exception"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'Index' object is not callable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msafi_df\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'Index' object is not callable"
]
}
],
"source": [
"safi_df.columns(2)"
]
},
{
"cell_type": "markdown",
"id": "130b4c7a",
"metadata": {},
"source": [
"Using `()` returns an error, because `columns` is an *attribute* which is referenced as is with no `()`. We get a type error because functions in python are objects of type `callable` and properties are values not functions."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "80f76b19",
"metadata": {
"tags": [
"raises-exception"
]
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'pandas._libs.properties.AxisProperty' object is not subscriptable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'pandas._libs.properties.AxisProperty' object is not subscriptable"
]
}
],
"source": [
"pd.DataFrame.columns[2]"
]
},
{
"cell_type": "markdown",
"id": "47175367",
"metadata": {},
"source": [
"This doesn't work because `columns` is an attribute of an object of type `pandas.DataFrame` and `pd.DataFrame.columns` is not an object. \n",
"\n",
"We can see what the type of pd.DataFrame is with the `type` function."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "f0e97096",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"type"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(pd.DataFrame)"
]
},
{
"cell_type": "markdown",
"id": "20273850",
"metadata": {},
"source": [
"Knowing about types is helpful for the individual columns of a dataset as well."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "df0d2b6b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"key_ID int64\n",
"village object\n",
"interview_date object\n",
"no_membrs int64\n",
"years_liv int64\n",
"respondent_wall_type object\n",
"rooms int64\n",
"memb_assoc object\n",
"affect_conflicts object\n",
"liv_count int64\n",
"items_owned object\n",
"no_meals int64\n",
"months_lack_food object\n",
"instanceID object\n",
"dtype: object"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.dtypes"
]
},
{
"cell_type": "markdown",
"id": "9ff68795",
"metadata": {},
"source": [
"Note that it uses `int64` and `object` as the types."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "028a4dc7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" key_ID | \n",
" village | \n",
" interview_date | \n",
" no_membrs | \n",
" years_liv | \n",
" respondent_wall_type | \n",
" rooms | \n",
" memb_assoc | \n",
" affect_conflicts | \n",
" liv_count | \n",
" items_owned | \n",
" no_meals | \n",
" months_lack_food | \n",
" instanceID | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 3 | \n",
" 4 | \n",
" muddaub | \n",
" 1 | \n",
" NaN | \n",
" NaN | \n",
" 1 | \n",
" bicycle;television;solar_panel;table | \n",
" 2 | \n",
" Jan | \n",
" uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" God | \n",
" 2016-11-17T00:00:00Z | \n",
" 7 | \n",
" 9 | \n",
" muddaub | \n",
" 1 | \n",
" yes | \n",
" once | \n",
" 3 | \n",
" cow_cart;bicycle;radio;cow_plough;solar_panel;... | \n",
" 2 | \n",
" Jan;Sept;Oct;Nov;Dec | \n",
" uuid:099de9c9-3e5e-427b-8452-26250e840d6e | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" key_ID village interview_date no_membrs years_liv \\\n",
"0 1 God 2016-11-17T00:00:00Z 3 4 \n",
"1 1 God 2016-11-17T00:00:00Z 7 9 \n",
"\n",
" respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n",
"0 muddaub 1 NaN NaN 1 \n",
"1 muddaub 1 yes once 3 \n",
"\n",
" items_owned no_meals \\\n",
"0 bicycle;television;solar_panel;table 2 \n",
"1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n",
"\n",
" months_lack_food instanceID \n",
"0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n",
"1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df.head(2)"
]
},
{
"cell_type": "markdown",
"id": "d611fc9d",
"metadata": {},
"source": [
"We might want to look at what villages were included in the data."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "03be5cd5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['God', 'Chirodzo', 'Ruaca'], dtype=object)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.unique(safi_df['village'])"
]
},
{
"cell_type": "markdown",
"id": "20333fab",
"metadata": {},
"source": [
"We can also get count of the number of of each value"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "d8aee397",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Ruaca 49\n",
"God 43\n",
"Chirodzo 39\n",
"Name: village, dtype: int64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df['village'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "eee49241",
"metadata": {},
"source": [
"````{admonition} Try it Yourself!\n",
"how many surveyed farms have all type `mauddaub`?\n",
"\n",
"\n",
"```{toggle}\n",
"46 or 45 count as good answers.\n",
"```\n",
"\n",
"````"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "24d15c86",
"metadata": {
"tags": [
"hide-input",
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"burntbricks 65\n",
"muddaub 45\n",
"sunbricks 17\n",
" burntbricks 2\n",
" muddaub 1\n",
"cement 1\n",
"Name: respondent_wall_type, dtype: int64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"safi_df['respondent_wall_type'].value_counts()"
]
},
{
"cell_type": "markdown",
"id": "b9835d1b",
"metadata": {},
"source": [
"## Review and Further reading\n",
"\n",
"- [reading data with pandas](https://brownsarahm.github.io/python-socialsci/08-Pandas/index.html)\n",
"- [Python built in functions](https://docs.python.org/3/library/functions.html) and in particular the [`type` function](https://docs.python.org/3/library/functions.html#type)\n",
"- [Pandas DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)\n",
"- [`value_counts`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html#pandas.Series.value_counts)\n",
"\n",
"If you've made it this far, [let me know](https://forms.gle/PDATAmuRS5tAHiiZ8) how you found these notes."
]
}
],
"metadata": {
"jupytext": {
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.12,
"jupytext_version": "1.6.0"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"source_map": [
12,
18,
26,
28,
31,
34,
37,
39,
44,
46,
49,
52,
55,
57,
63,
65,
74,
76,
79,
81,
84,
86,
90,
92,
101,
103,
108,
110,
116,
118,
126,
130,
135,
138,
141,
145,
148,
152,
158,
160,
165,
167,
171,
173,
177,
179,
184,
186,
198,
201
]
},
"nbformat": 4,
"nbformat_minor": 5
}