{ "cells": [ { "cell_type": "markdown", "id": "b4d18893", "metadata": {}, "source": [ "# Class 7: Visualization for EDA\n", "\n", "\n", "## Announcements\n", "\n", "Syllabus updated\n", "1. The [rubric](https://github.com/rhodyprog4ds/BrownFall20/commit/315ce164bc4bd4b7d5ee321afd9143e08f05c07b#diff-1e595cb12e4db779fb0c857562c1e0dd) for summarize and visualize is slightly changed\n", "1. [Please accept assignments](https://github.com/rhodyprog4ds/BrownFall20/commit/f459e8b04dc0b09d2dadba10464f49b25de57190) if you plan to not complete for any reason\n", "\n", "\n", "Assignment updated to [clarify continuous and categorical variables](https://github.com/rhodyprog4ds/BrownFall20/commit/f719a92c4d5dd00cbe8bf646d9d63280fb4b1e50)\n", "\n", "\n", "## Loading Data\n", "\n", "Importing the libraries for today. We'll continue plotting with pandas and we'll use [`seaborn`](https://seaborn.pydata.org/introduction.html) as well. Seaborn provides higher-level plotting functions and [better formatting](https://seaborn.pydata.org/examples/index.html).\n", "\n", "````{margin}\n", "The alias for `seaborn` is `sns`, the result of an [inside joke](https://github.com/mwaskom/seaborn/issues/229) among the developers in reference to [Samuel Norman Seaborn](https://en.wikipedia.org/wiki/Sam_Seaborn) on The West Wing, per [Stack Overflow](https://stackoverflow.com/questions/41499857/seaborn-why-import-as-sns)\n", "````" ] }, { "cell_type": "code", "execution_count": 1, "id": "26e07b13", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import seaborn as sns" ] }, { "cell_type": "markdown", "id": "9e1d993b", "metadata": {}, "source": [ "Loading the data as usual." 
] }, { "cell_type": "code", "execution_count": 2, "id": "dd771246", "metadata": {}, "outputs": [], "source": [ "data_url = 'https://raw.githubusercontent.com/brownsarahm/python-socialsci-files/master/data/SAFI_full_shortname.csv'" ] }, { "cell_type": "markdown", "id": "aa8de121", "metadata": {}, "source": [ "We know that the `key_id` column should be used as an index, not as data, so we'll use the `index_col` parameter to do that from the beginning." ] }, { "cell_type": "code", "execution_count": 3, "id": "9f6137ea", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| key_id | interview_date | quest_no | start | end | province | district | ward | village | years_farm | agr_assoc | ... | items_owned | items_owned_other | no_meals | months_lack_food | no_food_mitigation | gps_Latitude | gps_Longitude | gps_Altitude | gps_Accuracy | instanceID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 17 November 2016 | 1 | 2017-03-23T09:49:57.000Z | 2017-04-02T17:29:08.000Z | Manica | Manica | Bandula | God | 11 | no | ... | ['bicycle' ; 'television' ; 'solar_panel' ; ... | NaN | 2 | ['Jan'] | ['na' ; 'rely_less_food' ; 'reduce_meals' ; ... | -19.112259 | 33.483456 | 698 | 14.0 | uuid:ec241f2c-0609-46ed-b5e8-fe575f6cefef |
| 2 | 17 November 2016 | 1 | 2017-04-02T09:48:16.000Z | 2017-04-02T17:26:19.000Z | Manica | Manica | Bandula | God | 2 | yes | ... | ['cow_cart' ; 'bicycle' ; 'radio' ; 'cow_pl... | NaN | 2 | ['Jan' ; 'Sept' ; 'Oct' ; 'Nov' ; 'Dec'] | ['na' ; 'reduce_meals' ; 'restrict_adults' ;... | -19.112477 | 33.483416 | 690 | 19.0 | uuid:099de9c9-3e5e-427b-8452-26250e840d6e |
| 3 | 17 November 2016 | 3 | 2017-04-02T14:35:26.000Z | 2017-04-02T17:26:53.000Z | Manica | Manica | Bandula | God | 40 | no | ... | ['solar_torch'] | NaN | 2 | ['Jan' ; 'Feb' ; 'Mar' ; 'Oct' ; 'Nov' ; ... | ['na' ; 'restrict_adults' ; 'lab_ex_food'] | -19.112108 | 33.483450 | 674 | 13.0 | uuid:193d7daf-9582-409b-bf09-027dd36f9007 |
| 4 | 17 November 2016 | 4 | 2017-04-02T14:55:18.000Z | 2017-04-02T17:27:16.000Z | Manica | Manica | Bandula | God | 6 | no | ... | ['bicycle' ; 'radio' ; 'cow_plough' ; 'sola... | NaN | 2 | ['Sept' ; 'Oct' ; 'Nov' ; 'Dec'] | ['na' ; 'reduce_meals' ; 'restrict_adults' ;... | -19.112229 | 33.483424 | 679 | 5.0 | uuid:148d1105-778a-4755-aa71-281eadd4a973 |
| 5 | 17 November 2016 | 5 | 2017-04-02T15:10:35.000Z | 2017-04-02T17:27:35.000Z | Manica | Manica | Bandula | God | 18 | no | ... | ['motorcyle' ; 'radio' ; 'cow_plough' ; 'mo... | NaN | 2 | ['Aug' ; 'Sept' ; 'Oct' ; 'Nov'] | ['na' ; 'go_forest' ; 'migrate'] | -19.112217 | 33.483425 | 689 | 10.0 | uuid:2c867811-9696-4966-9866-f35c3e97d02d |

5 rows × 64 columns