4. Iterables and Indexing#

4.1. Logistics#

4.1.1. Assignment#

On Assignment 2, there’s a script to run to prepare it for grading.

Important

I missed a file in the template, so please create a file called requirements.txt that has the following contents.

jupyter-book
pyppeteer
pandas
jupytext

Correct script file (for .github/workflows/submit.yml)

name: Prepare and Submit
on:
  workflow_dispatch

jobs:
  generatereport:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2

    # set up Python
    - name: Set up Python 3.9
      uses: actions/setup-python@v1
      with:
        python-version: 3.9

    # install dependencies
    - name: dependencies
      run: pip install -r requirements.txt

    # generate the report
    - name: Convert notebooks to md for reading
      run: |
        jupytext --to myst *.ipynb
        jupytext --to myst */*.ipynb

    # compile the pdf
    - name: Build a basic html to pdf
      run: |
        jupyter-book build *.ipynb --builder pdfhtml

    # start a pull request for it
    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v4
      with:
        commit-message: 'convert to md'
        draft: false
        branch: published
        base: main
        title: Submission

Add this to .gitignore

_build/

How to apply these updates:

full view

To run the script

see instructions in a full page

4.1.2. Grade Tracking#

Create a Project board

Then create your repo and run the manual workflow from the repo's Actions tab to create issues.

4.2. Lists of lists#

import pandas as pd

Recall how indexing works with negative numbers:

topics = ['what is data science', 'jupyter', 'conditional','functions', 'lists', 'dictionaries','pandas' ]
topics[-1]
'pandas'
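
As a quick refresher, here is a minimal sketch with a shortened copy of the list (`topics_short` is a name introduced here only for illustration). Negative indices count from the end of the list, and they also work inside slices.

```python
# shortened copy of the topics list, for illustration only
topics_short = ['jupyter', 'functions', 'lists', 'dictionaries', 'pandas']

last = topics_short[-1]             # the last item
second_to_last = topics_short[-2]   # one before the last
last_two = topics_short[-2:]        # from the second-to-last item to the end

print(last, second_to_last, last_two)
```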

A list we built together:

# You can remain anonymous (this page & the notes will be fully public)
# by attributing it to a celebrity or pseudonym, but include *some* sort of attribution
sentence_list = [
"The class is just starting to feel settled for me. - Dr. Brown",
"",
"Hello, I like sushi! - ",
"Why squared  aka the mask - is a computer science student."
"Data science is fun",
"Hello my fellow gaymers - Sun Tzu",
"Soccer is a sport -Obama",
"Hello, I love pizza - Bear",
"This class is CSC/DSP 310. - Student",
"It is 2:21pm -",
"Pizza conquers all- Beetlejuice"
"",
"ayyy whaddup wit it - frankie",
"This is a sentence - George W Bush",
"Steam is the best place to play videogames change my mind. - Todd Howard",
"This is a hello -",
"Hello how are you -",
"The monkey likes bananas. - A banana",
"",
"Just type a random sentence - Rosa Parks",
"",
"I love CSC. - Everyone",
"",
"The quick brown fox jumps over the lazy dog - Brendan Chadwick",
"I like computers - David",
"",
"The fitness gram pacer test is a multi aerobic capacity test - Matt 3",
"Sally sells seashells by the seashore. - Narrator",
"I would like to take a nap. - Tom Cruise,"
]

We can confirm this is a list

type(sentence_list)
list

We can use a list comprehension to remove the empty ones.

sentences_clean = [s for s in sentence_list if len(s)>1]
sentences_clean
['The class is just starting to feel settled for me. - Dr. Brown',
 'Hello, I like sushi! - ',
 'Why squared  aka the mask - is a computer science student.Data science is fun',
 'Hello my fellow gaymers - Sun Tzu',
 'Soccer is a sport -Obama',
 'Hello, I love pizza - Bear',
 'This class is CSC/DSP 310. - Student',
 'It is 2:21pm -',
 'Pizza conquers all- Beetlejuice',
 'ayyy whaddup wit it - frankie',
 'This is a sentence - George W Bush',
 'Steam is the best place to play videogames change my mind. - Todd Howard',
 'This is a hello -',
 'Hello how are you -',
 'The monkey likes bananas. - A banana',
 'Just type a random sentence - Rosa Parks',
 'I love CSC. - Everyone',
 'The quick brown fox jumps over the lazy dog - Brendan Chadwick',
 'I like computers - David',
 'The fitness gram pacer test is a multi aerobic capacity test - Matt 3',
 'Sally sells seashells by the seashore. - Narrator',
 'I would like to take a nap. - Tom Cruise,']
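
To see what the comprehension is doing, here is a small sketch on made-up data (the `sample` list is hypothetical, not the class list). It should give the same result as the longer for loop written next to it.

```python
# small made-up sample, not the class list
sample = ["I like tea - Sam", "", "Hello -", ""]

# list comprehension: keep only the strings longer than one character
kept_comprehension = [s for s in sample if len(s) > 1]

# the same filter written as a for loop
kept_loop = []
for s in sample:
    if len(s) > 1:
        kept_loop.append(s)

print(kept_comprehension)
print(kept_comprehension == kept_loop)
```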

We can use the split method on a string to make a list of lists

sentence_data = [s.split('-') for s in sentences_clean]
sentence_data
[['The class is just starting to feel settled for me. ', ' Dr. Brown'],
 ['Hello, I like sushi! ', ' '],
 ['Why squared  aka the mask ',
  ' is a computer science student.Data science is fun'],
 ['Hello my fellow gaymers ', ' Sun Tzu'],
 ['Soccer is a sport ', 'Obama'],
 ['Hello, I love pizza ', ' Bear'],
 ['This class is CSC/DSP 310. ', ' Student'],
 ['It is 2:21pm ', ''],
 ['Pizza conquers all', ' Beetlejuice'],
 ['ayyy whaddup wit it ', ' frankie'],
 ['This is a sentence ', ' George W Bush'],
 ['Steam is the best place to play videogames change my mind. ',
  ' Todd Howard'],
 ['This is a hello ', ''],
 ['Hello how are you ', ''],
 ['The monkey likes bananas. ', ' A banana'],
 ['Just type a random sentence ', ' Rosa Parks'],
 ['I love CSC. ', ' Everyone'],
 ['The quick brown fox jumps over the lazy dog ', ' Brendan Chadwick'],
 ['I like computers ', ' David'],
 ['The fitness gram pacer test is a multi aerobic capacity test ', ' Matt 3'],
 ['Sally sells seashells by the seashore. ', ' Narrator'],
 ['I would like to take a nap. ', ' Tom Cruise,']]
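
Notice that the pieces keep their leading and trailing spaces. One possible cleanup, sketched here on made-up strings (the names `sample_clean`, `sample_data`, and `tricky` are hypothetical), is to call `strip` on each piece. The sketch also shows that a sentence containing more than one hyphen splits into more than two pieces, and that `rsplit` with a maximum of one split is one way to cut only at the last hyphen.

```python
# made-up strings in the same "sentence - attribution" shape
sample_clean = ["I like tea - Sam", "Pizza conquers all- Beetlejuice", "It is 2:21pm -"]

# split on the hyphen, then strip whitespace from each piece
sample_data = [[part.strip() for part in s.split('-')] for s in sample_clean]
print(sample_data)

# a sentence with an extra hyphen produces three pieces with split
tricky = "A well-known fact - Ana"
all_pieces = tricky.split('-')

# rsplit with maxsplit=1 cuts only at the last hyphen
two_pieces = [part.strip() for part in tricky.rsplit('-', 1)]
print(len(all_pieces), two_pieces)
```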

Then we can use the DataFrame constructor:

sentence_df = pd.DataFrame(data= sentence_data, columns=['sentence','attribution'])

By default, head shows the first 5 rows:

sentence_df.head()
   sentence                                            attribution
0  The class is just starting to feel settled for...   Dr. Brown
1  Hello, I like sushi!
2  Why squared aka the mask                            is a computer science student.Data science is...
3  Hello my fellow gaymers                             Sun Tzu
4  Soccer is a sport                                   Obama

or pass a number to get a different number of rows:

sentence_df.head(3)
   sentence                                            attribution
0  The class is just starting to feel settled for...   Dr. Brown
1  Hello, I like sushi!
2  Why squared aka the mask                            is a computer science student.Data science is...

The loc property can index on rows or columns, but it indexes rows by default:

sentence_df.loc[3]
sentence       Hello my fellow gaymers 
attribution                     Sun Tzu
Name: 3, dtype: object

We can select a range with a colon, like in base Python, except that loc includes the end label:

sentence_df.loc[3:5]
   sentence                  attribution
3  Hello my fellow gaymers   Sun Tzu
4  Soccer is a sport         Obama
5  Hello, I love pizza       Bear
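
To make the comparison with base Python concrete, here is a minimal sketch on a tiny made-up DataFrame (`small_df` and its contents are hypothetical; only the column names match the class example). A single label returns one row as a Series, and a `loc` slice keeps the end label while a list slice does not.

```python
import pandas as pd

# made-up data with the same column names as the class example
small_df = pd.DataFrame(
    data=[['a', 'Ana'], ['b', 'Ben'], ['c', 'Cal'], ['d', 'Dee']],
    columns=['sentence', 'attribution'])

one_row = small_df.loc[1]       # a single row comes back as a Series
row_range = small_df.loc[1:2]   # loc slices include the end label: rows 1 and 2

# base Python slices exclude the end position: only the item at position 1
list_range = ['a', 'b', 'c', 'd'][1:2]

print(one_row['attribution'], len(row_range), list_range)
```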

4.3. Loading a json#

We can load a JSON file using the read_json function, the same way we used read_csv:

rhodyprog4ds_gh_events_url = 'https://api.github.com/orgs/rhodyprog4ds/events'
pd.read_json(rhodyprog4ds_gh_events_url)
id type actor repo payload public created_at org
0 25983496729 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 12047706003, 'size': 1, 'distinct_... True 2022-12-19 21:12:05+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
1 25943982635 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 12025213311, 'size': 1, 'distinct_... True 2022-12-16 21:42:14+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
2 25943825627 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 12025131284, 'size': 1, 'distinct_... True 2022-12-16 21:31:27+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
3 25872136902 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11989885243, 'size': 1, 'distinct_... True 2022-12-14 04:00:44+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
4 25872131897 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11989882657, 'size': 1, 'distinct_... True 2022-12-14 04:00:20+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
5 25872088863 ReleaseEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'action': 'published', 'release': {'url': 'ht... True 2022-12-14 03:56:34+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
6 25872058273 CreateEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'ref': 'c32', 'ref_type': 'tag', 'master_bran... True 2022-12-14 03:53:56+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
7 25872053767 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11989842670, 'size': 1, 'distinct_... True 2022-12-14 03:53:31+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
8 25872040714 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11989835972, 'size': 2, 'distinct_... True 2022-12-14 03:52:24+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
9 25832860080 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11970836092, 'size': 1, 'distinct_... True 2022-12-12 17:17:07+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
10 25832610823 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11970721871, 'size': 1, 'distinct_... True 2022-12-12 17:07:27+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
11 25786395843 MemberEvent {'id': 69595187, 'login': 'rhodyprog4ds', 'dis... {'id': 576430088, 'name': 'rhodyprog4ds/hands-... {'member': {'login': 'stubbsdiondra', 'id': 83... True 2022-12-09 20:58:22+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
12 25734197812 ForkEvent {'id': 17578666, 'login': 'andresavage', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'forkee': {'id': 575613543, 'node_id': 'R_kgD... True 2022-12-07 22:57:27+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
13 25707552350 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11907999512, 'size': 1, 'distinct_... True 2022-12-07 02:31:25+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
14 25707409089 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11907927389, 'size': 1, 'distinct_... True 2022-12-07 02:21:11+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
15 25707350606 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11907897314, 'size': 1, 'distinct_... True 2022-12-07 02:17:06+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
16 25707319331 ReleaseEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'action': 'published', 'release': {'url': 'ht... True 2022-12-07 02:14:59+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
17 25707286223 CreateEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'ref': 'c31', 'ref_type': 'tag', 'master_bran... True 2022-12-07 02:12:45+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
18 25707236522 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11907839113, 'size': 1, 'distinct_... True 2022-12-07 02:09:26+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
19 25559123842 ForkEvent {'id': 119482217, 'login': 'thuthaont', 'displ... {'id': 287067905, 'name': 'rhodyprog4ds/portfo... {'forkee': {'id': 572439723, 'node_id': 'R_kgD... True 2022-11-30 09:26:11+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
20 25454470489 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11775411829, 'size': 1, 'distinct_... True 2022-11-24 13:07:35+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
21 25454293906 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11775323125, 'size': 1, 'distinct_... True 2022-11-24 13:00:08+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
22 25443448383 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11769796411, 'size': 1, 'distinct_... True 2022-11-24 02:14:06+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
23 25443372137 ReleaseEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'action': 'published', 'release': {'url': 'ht... True 2022-11-24 02:07:04+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
24 25443352568 CreateEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'ref': 'c30', 'ref_type': 'tag', 'master_bran... True 2022-11-24 02:05:18+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
25 25443343932 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11769742784, 'size': 1, 'distinct_... True 2022-11-24 02:04:31+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
26 25390211963 PushEvent {'id': 41898282, 'login': 'github-actions[bot]... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11742950045, 'size': 1, 'distinct_... True 2022-11-22 02:55:20+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
27 25390113561 ReleaseEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'action': 'published', 'release': {'url': 'ht... True 2022-11-22 02:47:58+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
28 25390087086 CreateEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'ref': 'c29', 'ref_type': 'tag', 'master_bran... True 2022-11-22 02:45:58+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
29 25390070156 PushEvent {'id': 10656079, 'login': 'brownsarahm', 'disp... {'id': 532028859, 'name': 'rhodyprog4ds/BrownF... {'push_id': 11742876923, 'size': 1, 'distinct_... True 2022-11-22 02:44:44+00:00 {'id': 69595187, 'login': 'rhodyprog4ds', 'gra...
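
The live URL returns different events each time, so here is an offline sketch of the same idea. It assumes a small made-up JSON string (`events_json`, with invented ids and logins) that is only loosely shaped like the GitHub events, wrapped in `io.StringIO` so that `read_json` can read it like a file. The nested `actor` objects come back as dictionaries inside a single column, as in the output above.

```python
import io
import pandas as pd

# made-up records, loosely shaped like the GitHub events above
events_json = '''[
  {"id": 1, "type": "PushEvent", "actor": {"login": "user_a"}},
  {"id": 2, "type": "ForkEvent", "actor": {"login": "user_b"}}
]'''

# read_json accepts a file-like object as well as a URL or path
events_df = pd.read_json(io.StringIO(events_json))

# each cell in the 'actor' column holds a dictionary
first_login = events_df.loc[0, 'actor']['login']
print(events_df.shape, first_login)
```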

4.4. Working with your repo offline#

Warning

This was not done in class and is optional

4.4.1. Authenticate on Mac#

On macOS, install the GitHub CLI, then run:

gh auth login

Use the defaults and choose to log in via browser.

4.4.2. Authenticate on Windows#

On Windows, install GitBash.

Then try to do the clone step and GitBash will help you authenticate

4.4.3. Work offline#

Get your repo URL from GitHub, then cd to where you want to save the repo and clone it:

cd prog4ds
git clone https://github.com/rhodyprog4ds/02-loading-data-brownsarahm.git

Work in the new folder that the clone creates.

When you want to save your work:

git add .
git commit -m 'describe the work you did'
git push

4.5. Questions After Class#

4.5.1. Logistics#

4.5.1.1. where do we find the grading page?#

4.5.2. Assignment#

4.5.2.1. what are the keys needed on the dictionaries for the assignment?#

See the datasets.py file in the template repo

4.5.2.2. do we have to accept assignment 2 anywhere and if so how#

Yes, on the assignment page. The link says “accept the assignment”.

4.5.2.3. For the purposes of the assignment should we download it locally to work with our notebooks?#

Read the instructions carefully on the assignment. It tells you exactly what to do.

4.5.3. Content#

4.5.3.1. How do you locate a specific row and column from a dataframe?#

.loc accepts both a row selector and a column selector, separated by a comma, with the row first. The docs for loc have lots of examples.
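
A minimal sketch of that pattern, using a made-up DataFrame (`scores_df` and its values are hypothetical):

```python
import pandas as pd

# made-up data for illustration
scores_df = pd.DataFrame({'name': ['Ana', 'Ben', 'Cal'], 'score': [90, 85, 77]})

one_value = scores_df.loc[1, 'score']             # row label 1, column 'score'
one_column = scores_df.loc[:, 'name']             # all rows, one column
block = scores_df.loc[0:1, ['name', 'score']]     # rows 0 through 1, two columns

print(one_value, list(one_column), block.shape)
```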

4.5.3.2. with data sets if there is an error with formatting and we can modify the original how would we fix it#

Download to a copy where you can edit.

4.5.3.3. Can you use .loc to pull out multiple rows that aren’t next to each other. For example, if I wanted to view rows 3, 8 and 12#

Yes, to select multiple nonconsecutive, you pass a list. The docs for loc have lots of examples.
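
For the rows in the question, a sketch on a made-up DataFrame (`letters_df` is hypothetical and has row labels 0 through 13) might look like this. Note the double brackets: the outer pair belongs to `loc` and the inner pair is the list of labels.

```python
import pandas as pd

# made-up DataFrame with 14 rows, labeled 0 through 13
letters_df = pd.DataFrame({'letter': list('abcdefghijklmn')})

# pass a list of row labels to select rows that are not next to each other
picked = letters_df.loc[[3, 8, 12]]
print(picked)
```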

4.5.3.4. how can we iterate through dictionaries#

Dictionaries have a .items() method that yields tuples of the key and value, one pair at a time.

Warning

the assignment does not ask you to iterate through a dictionary object, but over a list of dictionaries
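
Both cases can be sketched as follows. The keys `'name'` and `'format'` are made up for illustration and are not the keys the assignment requires (see datasets.py for those).

```python
# a single dictionary with made-up keys
dataset = {'name': 'events', 'format': 'json'}

# iterate over one dictionary: items() gives (key, value) pairs
pairs = []
for key, value in dataset.items():
    pairs.append((key, value))

# iterate over a list of dictionaries: each item is one whole dictionary
dataset_list = [
    {'name': 'events', 'format': 'json'},
    {'name': 'sentences', 'format': 'csv'},
]
names = [d['name'] for d in dataset_list]

print(pairs, names)
```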

4.5.3.5. What is the main difference between JSON and csv files; does one allocate more memory / store larger sets?#

The main difference is the structure. JSON can hold nested data. For example look at the GitHub data that we read in in class.
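
As a rough illustration of the structural difference, here is a sketch using only the standard library on made-up text (the field names and values are invented). The JSON record holds a dictionary inside a dictionary, while the CSV version of the same information has to be flattened into one level of columns.

```python
import csv
import io
import json

# made-up JSON record with a nested object
json_text = '{"id": 1, "actor": {"login": "user_a", "id": 42}}'
record = json.loads(json_text)
nested_login = record['actor']['login']   # reach into the nested level

# the same information flattened into CSV columns
csv_text = "id,actor_login,actor_id\n1,user_a,42\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
flat_login = rows[0]['actor_login']       # one flat level only

print(nested_login, flat_login, type(record['actor']).__name__)
```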

4.5.3.6. what exactly does json mean / do#

JSON is a data file format. The name is an acronym for JavaScript Object Notation. It’s a popular format for internet content.

4.5.3.7. are nested lists the only way to create DataFrames in python#

Nested lists are not the only way to create pandas DataFrames; you can also create one from a dictionary. See the docs for the constructor.
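
A minimal sketch of the two approaches on made-up data (the sentences and names are invented): the dictionary keys become the column names and each value is the list of entries for that column.

```python
import pandas as pd

# from a dictionary: keys are column names, values are the columns
from_dict = pd.DataFrame({
    'sentence': ['I like tea', 'Hello'],
    'attribution': ['Sam', 'Ana'],
})

# from nested lists: each inner list is one row
from_lists = pd.DataFrame(
    data=[['I like tea', 'Sam'], ['Hello', 'Ana']],
    columns=['sentence', 'attribution'])

# both constructions should describe the same table
print(from_dict.equals(from_lists))
```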

4.5.3.8. How do nested loops work in the jupyter notebook#

Python constructs other than display items work in a Jupyter notebook just as they do in any other interpreter. The list comprehension that we saw today also works in base Python. You can nest list comprehensions in different ways depending on your goal.
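
Two common ways to nest them, sketched on a made-up list of lists (`nested` is hypothetical): one keeps the nested shape and the other flattens it into a single list.

```python
# made-up list of lists
nested = [[1, 2], [3, 4], [5]]

# a comprehension inside a comprehension keeps the nesting
doubled = [[2 * x for x in row] for row in nested]

# two for clauses in one comprehension flatten the nesting
flat = [x for row in nested for x in row]

print(doubled, flat)
```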