Iterables and Indexing
4. Iterables and Indexing#
4.1. Logistics#
4.1.1. Assignment#
On Assignment 2, there’s a script to run to prepare it for grading.
Important
I missed a file in the template, so please create a file called requirements.txt that has the following contents.
jupyter-book
pyppeteer
pandas
jupytext
Correct script file (for .github/workflows/submit.yml)
name: Prepare and Submit
on:
  workflow_dispatch
jobs:
  generatereport:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Install dependencies
      - name: Set up Python 3.9
        uses: actions/setup-python@v1
        with:
          python-version: 3.9
      # install dependencies
      - name: dependencies
        run: pip install -r requirements.txt
      # generate the report
      - name: Convert notebooks to md for reading
        run: |
          jupytext --to myst *.ipynb
          jupytext --to myst */*.ipynb
      # compile the pdf
      - name: Build a basic html to pdf
        run: |
          jupyter-book build *.ipynb --builder pdfhtml
      # start a pull request for it
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v4
        with:
          commit-message: 'convert to md'
          draft: false
          branch: published
          base: main
          title: Submission
Add this to .gitignore
_build/
How to apply these updates:
To run the script
4.1.2. Grade Tracking#
Create a Project board
Then create your repo and run the manual workflow from that repo's Actions tab to create issues:
4.2. Lists of lists#
import pandas as pd
Recall how indexing works with negative numbers
topics = ['what is data science', 'jupyter', 'conditional','functions', 'lists', 'dictionaries','pandas' ]
topics[-1]
'pandas'
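As a quick refresher, here is a minimal sketch of how negative indices count back from the end of a list. It reuses the topics list from above; the extra variable names are only for illustration.

```python
topics = ['what is data science', 'jupyter', 'conditional', 'functions',
          'lists', 'dictionaries', 'pandas']

# -1 is the last item, -2 the one before it, and so on
last = topics[-1]
second_to_last = topics[-2]

# a negative index i refers to the same item as len(topics) + i
same_item = topics[len(topics) - 1]

# negative numbers also work in slices: the last three topics
last_three = topics[-3:]

print(last, second_to_last, last_three)
```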
A list we built together:
# You can remain anonymous (this page & the notes will be fully public)
# by attributing it to a celebrity or pseudonym, but include *some* sort of attribution
sentence_list = [
"The class is just starting to feel settled for me. - Dr. Brown",
"",
"Hello, I like sushi! - ",
"Why squared aka the mask - is a computer science student."
"Data science is fun",
"Hello my fellow gaymers - Sun Tzu",
"Soccer is a sport -Obama",
"Hello, I love pizza - Bear",
"This class is CSC/DSP 310. - Student",
"It is 2:21pm -",
"Pizza conquers all- Beetlejuice"
"",
"ayyy whaddup wit it - frankie",
"This is a sentence - George W Bush",
"Steam is the best place to play videogames change my mind. - Todd Howard",
"This is a hello -",
"Hello how are you -",
"The monkey likes bananas. - A banana",
"",
"Just type a random sentence - Rosa Parks",
"",
"I love CSC. - Everyone",
"",
"The quick brown fox jumps over the lazy dog - Brendan Chadwick",
"I like computers - David",
"",
"The fitness gram pacer test is a multi aerobic capacity test - Matt 3",
"Sally sells seashells by the seashore. - Narrator",
"I would like to take a nap. - Tom Cruise,"
]
We can confirm this is a list
type(sentence_list)
list
We can use a list comprehension to remove the empty ones.
sentences_clean = [s for s in sentence_list if len(s)>1]
sentences_clean
['The class is just starting to feel settled for me. - Dr. Brown',
'Hello, I like sushi! - ',
'Why squared aka the mask - is a computer science student.Data science is fun',
'Hello my fellow gaymers - Sun Tzu',
'Soccer is a sport -Obama',
'Hello, I love pizza - Bear',
'This class is CSC/DSP 310. - Student',
'It is 2:21pm -',
'Pizza conquers all- Beetlejuice',
'ayyy whaddup wit it - frankie',
'This is a sentence - George W Bush',
'Steam is the best place to play videogames change my mind. - Todd Howard',
'This is a hello -',
'Hello how are you -',
'The monkey likes bananas. - A banana',
'Just type a random sentence - Rosa Parks',
'I love CSC. - Everyone',
'The quick brown fox jumps over the lazy dog - Brendan Chadwick',
'I like computers - David',
'The fitness gram pacer test is a multi aerobic capacity test - Matt 3',
'Sally sells seashells by the seashore. - Narrator',
'I would like to take a nap. - Tom Cruise,']
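One entry in the output above runs two sentences together ("... computer science student.Data science is fun"). This appears to come from a missing comma between two adjacent strings in the list: Python joins adjacent string literals into a single string. A minimal sketch of that behavior, using a short hypothetical list:

```python
# hypothetical list; the comma after "second" is missing on purpose
with_missing_comma = [
    "first",
    "second"
    "third",
]

# the same items with every comma in place
with_commas = [
    "first",
    "second",
    "third",
]

print(len(with_missing_comma))  # 2
print(with_missing_comma[1])    # secondthird
print(len(with_commas))         # 3
```

Checking the length of a list after typing it by hand is one simple way to catch this.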
We can use the split method on each string to make a list of lists
sentence_data = [s.split('-') for s in sentences_clean]
sentence_data
[['The class is just starting to feel settled for me. ', ' Dr. Brown'],
['Hello, I like sushi! ', ' '],
['Why squared aka the mask ',
' is a computer science student.Data science is fun'],
['Hello my fellow gaymers ', ' Sun Tzu'],
['Soccer is a sport ', 'Obama'],
['Hello, I love pizza ', ' Bear'],
['This class is CSC/DSP 310. ', ' Student'],
['It is 2:21pm ', ''],
['Pizza conquers all', ' Beetlejuice'],
['ayyy whaddup wit it ', ' frankie'],
['This is a sentence ', ' George W Bush'],
['Steam is the best place to play videogames change my mind. ',
' Todd Howard'],
['This is a hello ', ''],
['Hello how are you ', ''],
['The monkey likes bananas. ', ' A banana'],
['Just type a random sentence ', ' Rosa Parks'],
['I love CSC. ', ' Everyone'],
['The quick brown fox jumps over the lazy dog ', ' Brendan Chadwick'],
['I like computers ', ' David'],
['The fitness gram pacer test is a multi aerobic capacity test ', ' Matt 3'],
['Sally sells seashells by the seashore. ', ' Narrator'],
['I would like to take a nap. ', ' Tom Cruise,']]
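The pieces above still carry leading and trailing spaces, and a sentence that itself contained a hyphen would split into more than two pieces. One possible variation, sketched here on a small hypothetical list rather than the class data, splits only at the last hyphen with rsplit and trims whitespace with strip.

```python
# hypothetical examples in the same "sentence - attribution" style
examples = [
    "Hello, I love pizza - Bear",
    "A well-known saying - Somebody",
]

# rsplit('-', 1) splits once, starting from the right,
# so hyphens inside the sentence are left alone
cleaned = [[part.strip() for part in s.rsplit('-', 1)] for s in examples]

print(cleaned)
```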
Then we can use the DataFrame constructor
sentence_df = pd.DataFrame(data= sentence_data, columns=['sentence','attribution'])
We can use head, which shows the first 5 rows by default
sentence_df.head()
 | sentence | attribution |
---|---|---|
0 | The class is just starting to feel settled for... | Dr. Brown |
1 | Hello, I like sushi! | |
2 | Why squared aka the mask | is a computer science student.Data science is... |
3 | Hello my fellow gaymers | Sun Tzu |
4 | Soccer is a sport | Obama |
or pass a number to get a different number of rows
sentence_df.head(3)
 | sentence | attribution |
---|---|---|
0 | The class is just starting to feel settled for... | Dr. Brown |
1 | Hello, I like sushi! | |
2 | Why squared aka the mask | is a computer science student.Data science is... |
The loc property can index on rows or columns, but indexes rows by default
sentence_df.loc[3]
sentence Hello my fellow gaymers
attribution Sun Tzu
Name: 3, dtype: object
We can select a range with a colon, much like in base Python, except that loc includes the end label
sentence_df.loc[3:5]
 | sentence | attribution |
---|---|---|
3 | Hello my fellow gaymers | Sun Tzu |
4 | Soccer is a sport | Obama |
5 | Hello, I love pizza | Bear |
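Slicing with loc works on index labels and returns the end label too (rows 3, 4 and 5 above), whereas list slicing and iloc stop before the end position. A small sketch of the comparison, using a hypothetical DataFrame rather than sentence_df:

```python
import pandas as pd

# hypothetical data with the default 0..5 integer index
letters = ['a', 'b', 'c', 'd', 'e', 'f']
letter_df = pd.DataFrame({'letter': letters})

# base Python slice: positions 3 and 4 (end is excluded)
list_slice = letters[3:5]

# loc slice: labels 3, 4 and 5 (end is included)
loc_slice = letter_df.loc[3:5]

# iloc slice: positions 3 and 4, like base Python
iloc_slice = letter_df.iloc[3:5]

print(len(list_slice), len(loc_slice), len(iloc_slice))  # 2 3 2
```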
4.3. Loading a json#
We can load a JSON file using the read_json function, the same way we used read_csv
rhodyprog4ds_gh_events_url = 'https://api.github.com/orgs/rhodyprog4ds/events'
pd.read_json(rhodyprog4ds_gh_events_url)
 | id | type | actor | repo | payload | public | created_at | org |
---|---|---|---|---|---|---|---|---|
0 | 25983496729 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 12047706003, 'size': 1, 'distinct_... | True | 2022-12-19 21:12:05+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
1 | 25943982635 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 12025213311, 'size': 1, 'distinct_... | True | 2022-12-16 21:42:14+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
2 | 25943825627 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 12025131284, 'size': 1, 'distinct_... | True | 2022-12-16 21:31:27+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
3 | 25872136902 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11989885243, 'size': 1, 'distinct_... | True | 2022-12-14 04:00:44+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
4 | 25872131897 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11989882657, 'size': 1, 'distinct_... | True | 2022-12-14 04:00:20+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
5 | 25872088863 | ReleaseEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'action': 'published', 'release': {'url': 'ht... | True | 2022-12-14 03:56:34+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
6 | 25872058273 | CreateEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'ref': 'c32', 'ref_type': 'tag', 'master_bran... | True | 2022-12-14 03:53:56+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
7 | 25872053767 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11989842670, 'size': 1, 'distinct_... | True | 2022-12-14 03:53:31+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
8 | 25872040714 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11989835972, 'size': 2, 'distinct_... | True | 2022-12-14 03:52:24+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
9 | 25832860080 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11970836092, 'size': 1, 'distinct_... | True | 2022-12-12 17:17:07+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
10 | 25832610823 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11970721871, 'size': 1, 'distinct_... | True | 2022-12-12 17:07:27+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
11 | 25786395843 | MemberEvent | {'id': 69595187, 'login': 'rhodyprog4ds', 'dis... | {'id': 576430088, 'name': 'rhodyprog4ds/hands-... | {'member': {'login': 'stubbsdiondra', 'id': 83... | True | 2022-12-09 20:58:22+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
12 | 25734197812 | ForkEvent | {'id': 17578666, 'login': 'andresavage', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'forkee': {'id': 575613543, 'node_id': 'R_kgD... | True | 2022-12-07 22:57:27+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
13 | 25707552350 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11907999512, 'size': 1, 'distinct_... | True | 2022-12-07 02:31:25+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
14 | 25707409089 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11907927389, 'size': 1, 'distinct_... | True | 2022-12-07 02:21:11+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
15 | 25707350606 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11907897314, 'size': 1, 'distinct_... | True | 2022-12-07 02:17:06+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
16 | 25707319331 | ReleaseEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'action': 'published', 'release': {'url': 'ht... | True | 2022-12-07 02:14:59+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
17 | 25707286223 | CreateEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'ref': 'c31', 'ref_type': 'tag', 'master_bran... | True | 2022-12-07 02:12:45+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
18 | 25707236522 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11907839113, 'size': 1, 'distinct_... | True | 2022-12-07 02:09:26+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
19 | 25559123842 | ForkEvent | {'id': 119482217, 'login': 'thuthaont', 'displ... | {'id': 287067905, 'name': 'rhodyprog4ds/portfo... | {'forkee': {'id': 572439723, 'node_id': 'R_kgD... | True | 2022-11-30 09:26:11+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
20 | 25454470489 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11775411829, 'size': 1, 'distinct_... | True | 2022-11-24 13:07:35+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
21 | 25454293906 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11775323125, 'size': 1, 'distinct_... | True | 2022-11-24 13:00:08+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
22 | 25443448383 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11769796411, 'size': 1, 'distinct_... | True | 2022-11-24 02:14:06+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
23 | 25443372137 | ReleaseEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'action': 'published', 'release': {'url': 'ht... | True | 2022-11-24 02:07:04+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
24 | 25443352568 | CreateEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'ref': 'c30', 'ref_type': 'tag', 'master_bran... | True | 2022-11-24 02:05:18+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
25 | 25443343932 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11769742784, 'size': 1, 'distinct_... | True | 2022-11-24 02:04:31+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
26 | 25390211963 | PushEvent | {'id': 41898282, 'login': 'github-actions[bot]... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11742950045, 'size': 1, 'distinct_... | True | 2022-11-22 02:55:20+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
27 | 25390113561 | ReleaseEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'action': 'published', 'release': {'url': 'ht... | True | 2022-11-22 02:47:58+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
28 | 25390087086 | CreateEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'ref': 'c29', 'ref_type': 'tag', 'master_bran... | True | 2022-11-22 02:45:58+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
29 | 25390070156 | PushEvent | {'id': 10656079, 'login': 'brownsarahm', 'disp... | {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... | {'push_id': 11742876923, 'size': 1, 'distinct_... | True | 2022-11-22 02:44:44+00:00 | {'id': 69595187, 'login': 'rhodyprog4ds', 'gra... |
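To experiment with read_json without calling the GitHub API, one option is to pass JSON text wrapped in a StringIO object. The records below are hypothetical and only loosely mimic the shape of the events above; as in the real output, a nested object such as actor comes back as a dictionary stored inside a single column.

```python
import io
import pandas as pd

# hypothetical records shaped loosely like the GitHub events above
events_json = '''
[
  {"id": 1, "type": "PushEvent", "actor": {"login": "user_a"}},
  {"id": 2, "type": "ForkEvent", "actor": {"login": "user_b"}}
]
'''

# read_json accepts a file-like object as well as a path or URL
events_df = pd.read_json(io.StringIO(events_json))

print(events_df.shape)  # (2, 3)

# the nested actor object is kept as a dictionary in one cell
print(events_df.loc[0, 'actor'])
```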
4.4. Working with your repo offline#
Warning
This was not done in class and is optional
4.4.1. Authenticate on Mac#
On macOS, install the GitHub CLI, then run
gh auth login
Use the defaults and choose to log in via browser.
4.4.2. Authenticate on Windows#
On Windows, install Git Bash
Then try to do the clone step and Git Bash will help you authenticate
4.4.3. Work offline#
Get your repo URL:
cd to where you want to save
cd prog4ds
git clone https://github.com/rhodyprog4ds/02-loading-data-brownsarahm.git
Work in the new folder that the clone creates
When you want to save
git add .
git commit -m 'describe the work you did'
git push
4.5. Questions After Class#
4.5.1. Logistics#
4.5.1.1. where do we find the grading page?#
4.5.2. Assignment#
4.5.2.1. what are the keys needed on the dictionaries for the assignment?#
See the datasets.py file in the template repo
4.5.2.2. do we have to accept assignment 2 anywhere and if so how#
Yes, on the assignment page. The link says “accept the assignment”
4.5.2.3. For the purposes of the assignment should we download it locally to work with our notebooks?#
Read the instructions carefully on the assignment. It tells you exactly what to do.
4.5.3. Content#
4.5.3.1. How do you locate a specific row and column from a dataframe?#
.loc accepts both, with a comma separating the row from the column. The docs for loc have lots of examples.
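A minimal sketch of the row-then-column form, using a hypothetical DataFrame with the same two columns as sentence_df:

```python
import pandas as pd

# hypothetical data in the same shape as sentence_df
example_df = pd.DataFrame(
    data=[['Soccer is a sport', 'Obama'],
          ['Hello, I love pizza', 'Bear'],
          ['I like computers', 'David']],
    columns=['sentence', 'attribution'])

# row label first, then column label, separated by a comma
one_cell = example_df.loc[1, 'attribution']

# a slice of rows with a single column gives a Series
some_rows = example_df.loc[0:1, 'sentence']

print(one_cell)
```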
4.5.3.2. with data sets if there is an error with formatting and we can modify the original how would we fix it#
Download to a copy where you can edit.
4.5.3.3. Can you use .loc to pull out multiple rows that aren’t next to each other. For example, if I wanted to view rows 3, 8 and 12#
Yes, to select multiple nonconsecutive rows, you pass a list of labels. The docs for loc have lots of examples.
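For rows 3, 8 and 12 specifically, this could look like the sketch below. The DataFrame is hypothetical; the point is the list inside the square brackets.

```python
import pandas as pd

# hypothetical DataFrame with 15 rows and the default integer index
squares_df = pd.DataFrame({'n': range(15),
                           'n_squared': [n**2 for n in range(15)]})

# a list of labels picks out rows that are not next to each other
picked = squares_df.loc[[3, 8, 12]]

print(picked)
```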
4.5.3.4. how can we iterate through dictionaries#
Dictionaries have an .items() method that yields tuples of the key and value, one pair at a time, without removing anything from the dictionary.
Warning
The assignment does not ask you to iterate through a dictionary object, but over a list of dictionaries.
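A short sketch of both kinds of iteration. The dictionaries and their keys are made up for illustration and are not the keys the assignment requires.

```python
# hypothetical dictionary; the keys here are made up for illustration
dataset = {'name': 'events', 'format': 'json'}

# .items() gives one (key, value) tuple per entry
pairs = []
for key, value in dataset.items():
    pairs.append(f'{key}: {value}')

# iterating over a list of dictionaries visits one dictionary at a time
datasets = [{'name': 'events', 'format': 'json'},
            {'name': 'sentences', 'format': 'csv'}]
names = [d['name'] for d in datasets]

print(pairs)
print(names)
```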
4.5.3.5. What is the main difference between JSON and csv files; does one allocate more memory / store larger sets?#
The main difference is the structure. JSON can hold nested data. For example, in the GitHub data that we read in class, several columns hold whole dictionaries rather than single values.
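To make the nesting concrete, here is a sketch using pandas' json_normalize, which was not covered in class. It flattens nested dictionaries into separate columns, which is the flat, table-like shape a csv file can hold. The records are hypothetical.

```python
import pandas as pd

# hypothetical nested records, similar in spirit to the GitHub events
records = [
    {'id': 1, 'actor': {'login': 'user_a', 'id': 10}},
    {'id': 2, 'actor': {'login': 'user_b', 'id': 20}},
]

# json_normalize turns each nested key into its own column,
# named with a dot, for example actor.login
flat_df = pd.json_normalize(records)

print(sorted(flat_df.columns))
```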
4.5.3.6. what exactly does json mean / do#
JSON is a data file format. It is an acronym for JavaScript Object Notation. It’s a popular format for internet content.
4.5.3.7. are nested lists the only way to create DataFrames in python#
Nested lists are not the only way to create pandas DataFrames; you can also create one from a dictionary. See the docs for the constructor.
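A minimal sketch of two dictionary-based options, with hypothetical data: a dictionary of lists (one key per column) and a list of dictionaries (one dictionary per row).

```python
import pandas as pd

# hypothetical data: each key becomes a column name,
# each list holds that column's values
sentence_dict = {
    'sentence': ['Soccer is a sport', 'I like computers'],
    'attribution': ['Obama', 'David'],
}
from_dict_df = pd.DataFrame(sentence_dict)

# a list of dictionaries also works: one dictionary per row
rows = [
    {'sentence': 'Soccer is a sport', 'attribution': 'Obama'},
    {'sentence': 'I like computers', 'attribution': 'David'},
]
from_rows_df = pd.DataFrame(rows)

# both routes build the same table
print(from_dict_df.equals(from_rows_df))
```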
4.5.3.8. How do nested loops work in the jupyter notebook#
Apart from how output is displayed, Python constructs work in a Jupyter notebook just as they do in any other interpreter. The list comprehension that we saw today also works in base Python. You can nest list comprehensions in different ways depending on your goal.
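As a sketch of what "different ways" can mean, the example below writes the same task three ways on a hypothetical list: nested for loops, a comprehension nested inside another, and a single comprehension with two for clauses that produces a flat list instead.

```python
# hypothetical sentences in the "text - attribution" style used in class
raw = ["Hello, I love pizza - Bear", "I like computers - David"]

# nested for loops: outer loop over sentences, inner loop over the pieces
looped = []
for s in raw:
    parts = []
    for part in s.split('-'):
        parts.append(part.strip())
    looped.append(parts)

# the same result with one list comprehension nested inside another
nested = [[part.strip() for part in s.split('-')] for s in raw]

# a single comprehension with two for clauses gives one flat list instead
flat = [part.strip() for s in raw for part in s.split('-')]

print(nested == looped)
print(flat)
```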