{ "cells": [ { "cell_type": "markdown", "id": "3fc4b3f4", "metadata": {}, "source": [ "# Class 4: Pandas\n", "\n", "Today we will:" ] }, { "cell_type": "markdown", "id": "702de130", "metadata": {}, "source": [ "## Remember, Programming is a Practice\n", "\n", "- if you're curious about something try it\n", "- you don't need me to give you answers about how code works, the interpreter will tell you\n", "- if you don't remember details, remember you can get help from Jupyter\n", "\n", "with a `?` after the function name withouth `()`" ] }, { "cell_type": "code", "execution_count": 1, "id": "88dc5169", "metadata": {}, "outputs": [], "source": [ "print?" ] }, { "cell_type": "markdown", "id": "4ac15c5a", "metadata": {}, "source": [ "or using the `tab` key inside the `()` for a function" ] }, { "cell_type": "code", "execution_count": 2, "id": "7f89ee8f", "metadata": { "tags": [ "raises-exception", "remove-output" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "print()" ] }, { "cell_type": "markdown", "id": "4592c964", "metadata": {}, "source": [ "or from the core python, with the `help` fucntion" ] }, { "cell_type": "code", "execution_count": 3, "id": "a60bcfb3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function print in module builtins:\n", "\n", "print(...)\n", " print(value, ..., sep=' ', end='\\n', file=sys.stdout, flush=False)\n", " \n", " Prints the values to a stream, or to sys.stdout by default.\n", " Optional keyword arguments:\n", " file: a file-like object (stream); defaults to the current sys.stdout.\n", " sep: string inserted between values, default a space.\n", " end: string appended after the last value, default a newline.\n", " flush: whether to forcibly flush the stream.\n", "\n" ] } ], "source": [ "help(print)" ] }, { "cell_type": "markdown", "id": "5563bc1a", "metadata": {}, "source": [ "## Data in Pandas\n", "\n", "We can import `pandas` again as before" ] }, { "cell_type": "code", "execution_count": 4, "id": "fe4708b7", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "id": "39851ae7", "metadata": {}, "source": [ "and we can read in data." ] }, { "cell_type": "code", "execution_count": 5, "id": "adf298bf", "metadata": { "tags": [ "output_scroll" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key_IDvillageinterview_dateno_membrsyears_livrespondent_wall_typeroomsmemb_assocaffect_conflictsliv_countitems_ownedno_mealsmonths_lack_foodinstanceID
01God2016-11-17T00:00:00Z34muddaub1NaNNaN1bicycle;television;solar_panel;table2Januuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef
11God2016-11-17T00:00:00Z79muddaub1yesonce3cow_cart;bicycle;radio;cow_plough;solar_panel;...2Jan;Sept;Oct;Nov;Decuuid:099de9c9-3e5e-427b-8452-26250e840d6e
23God2016-11-17T00:00:00Z1015burntbricks1NaNNaN1solar_torch2Jan;Feb;Mar;Oct;Nov;Decuuid:193d7daf-9582-409b-bf09-027dd36f9007
34God2016-11-17T00:00:00Z76burntbricks1NaNNaN2bicycle;radio;cow_plough;solar_panel;mobile_phone2Sept;Oct;Nov;Decuuid:148d1105-778a-4755-aa71-281eadd4a973
45God2016-11-17T00:00:00Z740burntbricks1NaNNaN4motorcyle;radio;cow_plough;mobile_phone2Aug;Sept;Oct;Novuuid:2c867811-9696-4966-9866-f35c3e97d02d
.............................................
126126Ruaca2017-05-18T00:00:00Z37burntbricks1nomore_once3motorcyle;radio;solar_panel3Oct;Nov;Decuuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965
127193Ruaca2017-06-04T00:00:00Z710cement3nomore_once3car;lorry;television;radio;sterio;cow_plough;s...3noneuuid:5ccc2e5a-ea90-48b5-8542-69400d5334df
128194Ruaca2017-06-04T00:00:00Z45muddaub1nomore_once1radio;solar_panel;solar_torch;mobile_phone3Sept;Oct;Novuuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf
129199Chirodzo2017-06-04T00:00:00Z717burntbricks2yesmore_once2cow_cart;lorry;motorcyle;computer;television;r...3Nov;Decuuid:ffc83162-ff24-4a87-8709-eff17abc0b3b
130200Chirodzo2017-06-04T00:00:00Z820burntbricks2NaNNaN3radio;cow_plough;solar_panel;solar_torch;table...3Oct;Novuuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7
\n", "

131 rows × 14 columns

\n", "
" ], "text/plain": [ " key_ID village interview_date no_membrs years_liv \\\n", "0 1 God 2016-11-17T00:00:00Z 3 4 \n", "1 1 God 2016-11-17T00:00:00Z 7 9 \n", "2 3 God 2016-11-17T00:00:00Z 10 15 \n", "3 4 God 2016-11-17T00:00:00Z 7 6 \n", "4 5 God 2016-11-17T00:00:00Z 7 40 \n", ".. ... ... ... ... ... \n", "126 126 Ruaca 2017-05-18T00:00:00Z 3 7 \n", "127 193 Ruaca 2017-06-04T00:00:00Z 7 10 \n", "128 194 Ruaca 2017-06-04T00:00:00Z 4 5 \n", "129 199 Chirodzo 2017-06-04T00:00:00Z 7 17 \n", "130 200 Chirodzo 2017-06-04T00:00:00Z 8 20 \n", "\n", " respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n", "0 muddaub 1 NaN NaN 1 \n", "1 muddaub 1 yes once 3 \n", "2 burntbricks 1 NaN NaN 1 \n", "3 burntbricks 1 NaN NaN 2 \n", "4 burntbricks 1 NaN NaN 4 \n", ".. ... ... ... ... ... \n", "126 burntbricks 1 no more_once 3 \n", "127 cement 3 no more_once 3 \n", "128 muddaub 1 no more_once 1 \n", "129 burntbricks 2 yes more_once 2 \n", "130 burntbricks 2 NaN NaN 3 \n", "\n", " items_owned no_meals \\\n", "0 bicycle;television;solar_panel;table 2 \n", "1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n", "2 solar_torch 2 \n", "3 bicycle;radio;cow_plough;solar_panel;mobile_phone 2 \n", "4 motorcyle;radio;cow_plough;mobile_phone 2 \n", ".. ... ... \n", "126 motorcyle;radio;solar_panel 3 \n", "127 car;lorry;television;radio;sterio;cow_plough;s... 3 \n", "128 radio;solar_panel;solar_torch;mobile_phone 3 \n", "129 cow_cart;lorry;motorcyle;computer;television;r... 3 \n", "130 radio;cow_plough;solar_panel;solar_torch;table... 3 \n", "\n", " months_lack_food instanceID \n", "0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n", "1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n", "2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 \n", "3 Sept;Oct;Nov;Dec uuid:148d1105-778a-4755-aa71-281eadd4a973 \n", "4 Aug;Sept;Oct;Nov uuid:2c867811-9696-4966-9866-f35c3e97d02d \n", ".. ... ... \n", "126 Oct;Nov;Dec uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 \n", "127 none uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df \n", "128 Sept;Oct;Nov uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf \n", "129 Nov;Dec uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b \n", "130 Oct;Nov uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 \n", "\n", "[131 rows x 14 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv('https://raw.githubusercontent.com/brownsarahm/python-socialsci-files/master/data/SAFI_clean.csv')" ] }, { "cell_type": "markdown", "id": "0e07ecb9", "metadata": {}, "source": [ "to be able to use this, we need to save it to a variable." ] }, { "cell_type": "code", "execution_count": 6, "id": "24782612", "metadata": {}, "outputs": [], "source": [ "safi_df = pd.read_csv('https://raw.githubusercontent.com/brownsarahm/python-socialsci-files/master/data/SAFI_clean.csv')" ] }, { "cell_type": "markdown", "id": "fbcd8d62", "metadata": {}, "source": [ "This is an excerpt from the [SAFI dataset](https://figshare.com/articles/SAFI_Survey_Results/6262019).\n", "\n", "Another important thing to do is to check datatypes, this is how we know what things we can do with a variable." ] }, { "cell_type": "code", "execution_count": 7, "id": "51d6757a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(safi_df)" ] }, { "cell_type": "markdown", "id": "40496bdf", "metadata": {}, "source": [ "````{margin}\n", "```{tip}\n", "we can also see the size when we print the whole dataset as in the first time we read the data in.\n", "```\n", "````\n", "An important thing to check is the size of the dataset." ] }, { "cell_type": "code", "execution_count": 8, "id": "13216461", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(131, 14)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.shape" ] }, { "cell_type": "markdown", "id": "b47ce8bb", "metadata": {}, "source": [ "Recall that you can also tab complete" ] }, { "cell_type": "code", "execution_count": 9, "id": "949fdaee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(131, 14)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.shape" ] }, { "cell_type": "markdown", "id": "8be26ec7", "metadata": {}, "source": [ "To see the first 5 rows of the dataset, use the `head()` function" ] }, { "cell_type": "code", "execution_count": 10, "id": "bf6f2d64", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key_IDvillageinterview_dateno_membrsyears_livrespondent_wall_typeroomsmemb_assocaffect_conflictsliv_countitems_ownedno_mealsmonths_lack_foodinstanceID
01God2016-11-17T00:00:00Z34muddaub1NaNNaN1bicycle;television;solar_panel;table2Januuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef
11God2016-11-17T00:00:00Z79muddaub1yesonce3cow_cart;bicycle;radio;cow_plough;solar_panel;...2Jan;Sept;Oct;Nov;Decuuid:099de9c9-3e5e-427b-8452-26250e840d6e
23God2016-11-17T00:00:00Z1015burntbricks1NaNNaN1solar_torch2Jan;Feb;Mar;Oct;Nov;Decuuid:193d7daf-9582-409b-bf09-027dd36f9007
34God2016-11-17T00:00:00Z76burntbricks1NaNNaN2bicycle;radio;cow_plough;solar_panel;mobile_phone2Sept;Oct;Nov;Decuuid:148d1105-778a-4755-aa71-281eadd4a973
45God2016-11-17T00:00:00Z740burntbricks1NaNNaN4motorcyle;radio;cow_plough;mobile_phone2Aug;Sept;Oct;Novuuid:2c867811-9696-4966-9866-f35c3e97d02d
\n", "
" ], "text/plain": [ " key_ID village interview_date no_membrs years_liv \\\n", "0 1 God 2016-11-17T00:00:00Z 3 4 \n", "1 1 God 2016-11-17T00:00:00Z 7 9 \n", "2 3 God 2016-11-17T00:00:00Z 10 15 \n", "3 4 God 2016-11-17T00:00:00Z 7 6 \n", "4 5 God 2016-11-17T00:00:00Z 7 40 \n", "\n", " respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n", "0 muddaub 1 NaN NaN 1 \n", "1 muddaub 1 yes once 3 \n", "2 burntbricks 1 NaN NaN 1 \n", "3 burntbricks 1 NaN NaN 2 \n", "4 burntbricks 1 NaN NaN 4 \n", "\n", " items_owned no_meals \\\n", "0 bicycle;television;solar_panel;table 2 \n", "1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n", "2 solar_torch 2 \n", "3 bicycle;radio;cow_plough;solar_panel;mobile_phone 2 \n", "4 motorcyle;radio;cow_plough;mobile_phone 2 \n", "\n", " months_lack_food instanceID \n", "0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n", "1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n", "2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 \n", "3 Sept;Oct;Nov;Dec uuid:148d1105-778a-4755-aa71-281eadd4a973 \n", "4 Aug;Sept;Oct;Nov uuid:2c867811-9696-4966-9866-f35c3e97d02d " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.head()" ] }, { "cell_type": "markdown", "id": "b899d1b0", "metadata": {}, "source": [ "We can call this function with a value to change how many rows are returned" ] }, { "cell_type": "code", "execution_count": 11, "id": "ee79ae60", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key_IDvillageinterview_dateno_membrsyears_livrespondent_wall_typeroomsmemb_assocaffect_conflictsliv_countitems_ownedno_mealsmonths_lack_foodinstanceID
01God2016-11-17T00:00:00Z34muddaub1NaNNaN1bicycle;television;solar_panel;table2Januuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef
11God2016-11-17T00:00:00Z79muddaub1yesonce3cow_cart;bicycle;radio;cow_plough;solar_panel;...2Jan;Sept;Oct;Nov;Decuuid:099de9c9-3e5e-427b-8452-26250e840d6e
23God2016-11-17T00:00:00Z1015burntbricks1NaNNaN1solar_torch2Jan;Feb;Mar;Oct;Nov;Decuuid:193d7daf-9582-409b-bf09-027dd36f9007
\n", "
" ], "text/plain": [ " key_ID village interview_date no_membrs years_liv \\\n", "0 1 God 2016-11-17T00:00:00Z 3 4 \n", "1 1 God 2016-11-17T00:00:00Z 7 9 \n", "2 3 God 2016-11-17T00:00:00Z 10 15 \n", "\n", " respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n", "0 muddaub 1 NaN NaN 1 \n", "1 muddaub 1 yes once 3 \n", "2 burntbricks 1 NaN NaN 1 \n", "\n", " items_owned no_meals \\\n", "0 bicycle;television;solar_panel;table 2 \n", "1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n", "2 solar_torch 2 \n", "\n", " months_lack_food instanceID \n", "0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n", "1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e \n", "2 Jan;Feb;Mar;Oct;Nov;Dec uuid:193d7daf-9582-409b-bf09-027dd36f9007 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.head(3)" ] }, { "cell_type": "markdown", "id": "5abd6749", "metadata": {}, "source": [ "To know how this works, we can view the documentation for the function\n", "````{margin}\n", "```{warning}\n", "this was changed from using `?` for help in class so that the help is desplayed in the rendered website, but the pop up was fine in real time\n", "```\n", "````" ] }, { "cell_type": "code", "execution_count": 12, "id": "36209ec9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method head in module pandas.core.generic:\n", "\n", "head(n: 'int' = 5) -> 'FrameOrSeries' method of pandas.core.frame.DataFrame instance\n", " Return the first `n` rows.\n", " \n", " This function returns the first `n` rows for the object based\n", " on position. It is useful for quickly testing if your object\n", " has the right type of data in it.\n", " \n", " For negative values of `n`, this function returns all rows except\n", " the last `n` rows, equivalent to ``df[:-n]``.\n", " \n", " Parameters\n", " ----------\n", " n : int, default 5\n", " Number of rows to select.\n", " \n", " Returns\n", " -------\n", " same type as caller\n", " The first `n` rows of the caller object.\n", " \n", " See Also\n", " --------\n", " DataFrame.tail: Returns the last `n` rows.\n", " \n", " Examples\n", " --------\n", " >>> df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',\n", " ... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})\n", " >>> df\n", " animal\n", " 0 alligator\n", " 1 bee\n", " 2 falcon\n", " 3 lion\n", " 4 monkey\n", " 5 parrot\n", " 6 shark\n", " 7 whale\n", " 8 zebra\n", " \n", " Viewing the first 5 lines\n", " \n", " >>> df.head()\n", " animal\n", " 0 alligator\n", " 1 bee\n", " 2 falcon\n", " 3 lion\n", " 4 monkey\n", " \n", " Viewing the first `n` lines (three in this case)\n", " \n", " >>> df.head(3)\n", " animal\n", " 0 alligator\n", " 1 bee\n", " 2 falcon\n", " \n", " For negative values of `n`\n", " \n", " >>> df.head(-3)\n", " animal\n", " 0 alligator\n", " 1 bee\n", " 2 falcon\n", " 3 lion\n", " 4 monkey\n", " 5 parrot\n", "\n" ] } ], "source": [ "help(safi_df.head)" ] }, { "cell_type": "markdown", "id": "fb67a0fb", "metadata": {}, "source": [ "Since it says `n =5` we know that the default value of the parameter `n` is 5. When a function has a default value, we can call the function without a value.\n", "\n", "To view the last few lines, we use `tail`" ] }, { "cell_type": "code", "execution_count": 13, "id": "d26cfd68", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key_IDvillageinterview_dateno_membrsyears_livrespondent_wall_typeroomsmemb_assocaffect_conflictsliv_countitems_ownedno_mealsmonths_lack_foodinstanceID
126126Ruaca2017-05-18T00:00:00Z37burntbricks1nomore_once3motorcyle;radio;solar_panel3Oct;Nov;Decuuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965
127193Ruaca2017-06-04T00:00:00Z710cement3nomore_once3car;lorry;television;radio;sterio;cow_plough;s...3noneuuid:5ccc2e5a-ea90-48b5-8542-69400d5334df
128194Ruaca2017-06-04T00:00:00Z45muddaub1nomore_once1radio;solar_panel;solar_torch;mobile_phone3Sept;Oct;Novuuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf
129199Chirodzo2017-06-04T00:00:00Z717burntbricks2yesmore_once2cow_cart;lorry;motorcyle;computer;television;r...3Nov;Decuuid:ffc83162-ff24-4a87-8709-eff17abc0b3b
130200Chirodzo2017-06-04T00:00:00Z820burntbricks2NaNNaN3radio;cow_plough;solar_panel;solar_torch;table...3Oct;Novuuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7
\n", "
" ], "text/plain": [ " key_ID village interview_date no_membrs years_liv \\\n", "126 126 Ruaca 2017-05-18T00:00:00Z 3 7 \n", "127 193 Ruaca 2017-06-04T00:00:00Z 7 10 \n", "128 194 Ruaca 2017-06-04T00:00:00Z 4 5 \n", "129 199 Chirodzo 2017-06-04T00:00:00Z 7 17 \n", "130 200 Chirodzo 2017-06-04T00:00:00Z 8 20 \n", "\n", " respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n", "126 burntbricks 1 no more_once 3 \n", "127 cement 3 no more_once 3 \n", "128 muddaub 1 no more_once 1 \n", "129 burntbricks 2 yes more_once 2 \n", "130 burntbricks 2 NaN NaN 3 \n", "\n", " items_owned no_meals \\\n", "126 motorcyle;radio;solar_panel 3 \n", "127 car;lorry;television;radio;sterio;cow_plough;s... 3 \n", "128 radio;solar_panel;solar_torch;mobile_phone 3 \n", "129 cow_cart;lorry;motorcyle;computer;television;r... 3 \n", "130 radio;cow_plough;solar_panel;solar_torch;table... 3 \n", "\n", " months_lack_food instanceID \n", "126 Oct;Nov;Dec uuid:69caea81-a4e5-4e8d-83cd-9c18d8e8d965 \n", "127 none uuid:5ccc2e5a-ea90-48b5-8542-69400d5334df \n", "128 Sept;Oct;Nov uuid:95c11a30-d44f-40c4-8ea8-ec34fca6bbbf \n", "129 Nov;Dec uuid:ffc83162-ff24-4a87-8709-eff17abc0b3b \n", "130 Oct;Nov uuid:aa77a0d7-7142-41c8-b494-483a5b68d8a7 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.tail()" ] }, { "cell_type": "markdown", "id": "8535a4f9", "metadata": {}, "source": [ "We can also get an `Index` for the columns of the DataFrame.\n", "```{margin}\n", "[`Index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html) is another data type that is defined by `pandas`.\n", "```" ] }, { "cell_type": "code", "execution_count": 14, "id": "570dc232", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['key_ID', 'village', 'interview_date', 'no_membrs', 'years_liv',\n", " 'respondent_wall_type', 'rooms', 'memb_assoc', 'affect_conflicts',\n", " 'liv_count', 'items_owned', 'no_meals', 'months_lack_food',\n", " 'instanceID'],\n", " dtype='object')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.columns" ] }, { "cell_type": "markdown", "id": "976568cc", "metadata": {}, "source": [ "an `Index` variable is iterable so we can index into it\n", "```{admonition} Try it Yourself\n", "How would you view the name of the 3rd column?\n", "```\n", "\n", "First the correct answer:" ] }, { "cell_type": "code", "execution_count": 15, "id": "d01da3dd", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "'interview_date'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.columns[2]" ] }, { "cell_type": "markdown", "id": "634f981e", "metadata": {}, "source": [ "Now some misconceptions:" ] }, { "cell_type": "code", "execution_count": 16, "id": "3c79ab83", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2016-11-17T00:00:00Z\n", "1 2016-11-17T00:00:00Z\n", "2 2016-11-17T00:00:00Z\n", "3 2016-11-17T00:00:00Z\n", "4 2016-11-17T00:00:00Z\n", " ... \n", "126 2017-05-18T00:00:00Z\n", "127 2017-06-04T00:00:00Z\n", "128 2017-06-04T00:00:00Z\n", "129 2017-06-04T00:00:00Z\n", "130 2017-06-04T00:00:00Z\n", "Name: interview_date, Length: 131, dtype: object" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df['interview_date']" ] }, { "cell_type": "markdown", "id": "a9f75732", "metadata": {}, "source": [ "Indexing with the column name) will return the *values* in the column" ] }, { "cell_type": "code", "execution_count": 17, "id": "207e250f", "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "TypeError", "evalue": "'Index' object is not callable", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msafi_df\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'Index' object is not callable" ] } ], "source": [ "safi_df.columns(2)" ] }, { "cell_type": "markdown", "id": "130b4c7a", "metadata": {}, "source": [ "Using `()` returns an error, because `columns` is an *attribute* which is referenced as is with no `()`. We get a type error because functions in python are objects of type `callable` and properties are values not functions." ] }, { "cell_type": "code", "execution_count": 18, "id": "80f76b19", "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "TypeError", "evalue": "'pandas._libs.properties.AxisProperty' object is not subscriptable", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'pandas._libs.properties.AxisProperty' object is not subscriptable" ] } ], "source": [ "pd.DataFrame.columns[2]" ] }, { "cell_type": "markdown", "id": "47175367", "metadata": {}, "source": [ "This doesn't work because `columns` is an attribute of an object of type `pandas.DataFrame` and `pd.DataFrame.columns` is not an object. \n", "\n", "We can see what the type of pd.DataFrame is with the `type` function." ] }, { "cell_type": "code", "execution_count": 19, "id": "f0e97096", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "type" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(pd.DataFrame)" ] }, { "cell_type": "markdown", "id": "20273850", "metadata": {}, "source": [ "Knowing about types is helpful for the individual columns of a dataset as well." ] }, { "cell_type": "code", "execution_count": 20, "id": "df0d2b6b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "key_ID int64\n", "village object\n", "interview_date object\n", "no_membrs int64\n", "years_liv int64\n", "respondent_wall_type object\n", "rooms int64\n", "memb_assoc object\n", "affect_conflicts object\n", "liv_count int64\n", "items_owned object\n", "no_meals int64\n", "months_lack_food object\n", "instanceID object\n", "dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.dtypes" ] }, { "cell_type": "markdown", "id": "9ff68795", "metadata": {}, "source": [ "Note that it uses `int64` and `object` as the types." ] }, { "cell_type": "code", "execution_count": 21, "id": "028a4dc7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
key_IDvillageinterview_dateno_membrsyears_livrespondent_wall_typeroomsmemb_assocaffect_conflictsliv_countitems_ownedno_mealsmonths_lack_foodinstanceID
01God2016-11-17T00:00:00Z34muddaub1NaNNaN1bicycle;television;solar_panel;table2Januuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef
11God2016-11-17T00:00:00Z79muddaub1yesonce3cow_cart;bicycle;radio;cow_plough;solar_panel;...2Jan;Sept;Oct;Nov;Decuuid:099de9c9-3e5e-427b-8452-26250e840d6e
\n", "
" ], "text/plain": [ " key_ID village interview_date no_membrs years_liv \\\n", "0 1 God 2016-11-17T00:00:00Z 3 4 \n", "1 1 God 2016-11-17T00:00:00Z 7 9 \n", "\n", " respondent_wall_type rooms memb_assoc affect_conflicts liv_count \\\n", "0 muddaub 1 NaN NaN 1 \n", "1 muddaub 1 yes once 3 \n", "\n", " items_owned no_meals \\\n", "0 bicycle;television;solar_panel;table 2 \n", "1 cow_cart;bicycle;radio;cow_plough;solar_panel;... 2 \n", "\n", " months_lack_food instanceID \n", "0 Jan uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef \n", "1 Jan;Sept;Oct;Nov;Dec uuid:099de9c9-3e5e-427b-8452-26250e840d6e " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df.head(2)" ] }, { "cell_type": "markdown", "id": "d611fc9d", "metadata": {}, "source": [ "We might want to look at what villages were included in the data." ] }, { "cell_type": "code", "execution_count": 22, "id": "03be5cd5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['God', 'Chirodzo', 'Ruaca'], dtype=object)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.unique(safi_df['village'])" ] }, { "cell_type": "markdown", "id": "20333fab", "metadata": {}, "source": [ "We can also get count of the number of of each value" ] }, { "cell_type": "code", "execution_count": 23, "id": "d8aee397", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Ruaca 49\n", "God 43\n", "Chirodzo 39\n", "Name: village, dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df['village'].value_counts()" ] }, { "cell_type": "markdown", "id": "eee49241", "metadata": {}, "source": [ "````{admonition} Try it Yourself!\n", "how many surveyed farms have all type `mauddaub`?\n", "\n", "\n", "```{toggle}\n", "46 or 45 count as good answers.\n", "```\n", "\n", "````" ] }, { "cell_type": "code", "execution_count": 24, "id": "24d15c86", "metadata": { "tags": [ "hide-input", "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "burntbricks 65\n", "muddaub 45\n", "sunbricks 17\n", " burntbricks 2\n", " muddaub 1\n", "cement 1\n", "Name: respondent_wall_type, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "safi_df['respondent_wall_type'].value_counts()" ] }, { "cell_type": "markdown", "id": "b9835d1b", "metadata": {}, "source": [ "## Review and Further reading\n", "\n", "- [reading data with pandas](https://brownsarahm.github.io/python-socialsci/08-Pandas/index.html)\n", "- [Python built in functions](https://docs.python.org/3/library/functions.html) and in particular the [`type` function](https://docs.python.org/3/library/functions.html#type)\n", "- [Pandas DataFrames](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html)\n", "- [`value_counts`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html#pandas.Series.value_counts)\n", "\n", "If you've made it this far, [let me know](https://forms.gle/PDATAmuRS5tAHiiZ8) how you found these notes." ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.12, "jupytext_version": "1.6.0" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "source_map": [ 12, 18, 26, 28, 31, 34, 37, 39, 44, 46, 49, 52, 55, 57, 63, 65, 74, 76, 79, 81, 84, 86, 90, 92, 101, 103, 108, 110, 116, 118, 126, 130, 135, 138, 141, 145, 148, 152, 158, 160, 165, 167, 171, 173, 177, 179, 184, 186, 198, 201 ] }, "nbformat": 4, "nbformat_minor": 5 }