3. Getting help, object inspection, loading data#

3.1. First, Don’t Worry memes#

Class Response Summary:

blabla

calm down

funny CS memes

Will Smith don't worry meme

LeBron with the tweet "We just left a rest stop! Why didn't you go then?"

mandatory alt text

I leave my clothes on my bed and chair

3.2. Getting Help in Jupyter#

Python has a print function, and we can use the help in Jupyter to learn how to use it in different ways.

Given this code excerpt, how could you print out “Sarah_Brown”?

first = 'Sarah'
last = 'Brown'

We can use Jupyter's pop-up help with Shift + Tab or ?

print?

Or the base Python help function:

help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.

Notice that the function can take multiple values and has keyword arguments (which must be passed as argument=value). One of them is described as sep=' ', which means that by default print puts a space between the values.

print(first,last)
Sarah Brown

But we can change the separator.

print(first,last, sep='_')
# shift + tab for help
Sarah_Brown

Note that end defaults to \n, so each call to print finishes with a newline.

print(first,last)
print('hello')
Sarah Brown
hello
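To see end in action, here is a minimal sketch that changes both sep and end. It sends the output to an in-memory io.StringIO buffer through the file argument listed in the help text, so we can look at the exact characters produced. The names buffer and result are made up for this example.

```python
import io

first = 'Sarah'
last = 'Brown'

# buffer is a hypothetical in-memory stream that collects what print writes
buffer = io.StringIO()

# sep goes between the values, end goes after the last value
print(first, last, sep='_', end='!', file=buffer)
# this call keeps the default end, a newline
print('hello', file=buffer)

result = buffer.getvalue()
print(repr(result))
# 'Sarah_Brown!hello\n'
```

Because the first call replaced the newline with '!', the second call's text lands on the same line.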

Where does this help information come from?

def compute_grade(num_level1,num_level2,num_level3):
    '''
    Computes a grade for CSC/DSP310 from numbers of achievements at each level

    Parameters:
    ------------
    num_level1 : int
      number of level 1 achievements earned
    num_level2 : int
      number of level 2 achievements earned
    num_level3 : int
      number of level 3 achievements earned

    Returns:
    --------
    letter_grade : string
      letter grade with modifier (+/-)
    '''
    if num_level1 == 15:
        if num_level2 == 15:
            if num_level3 == 15:
                grade = 'A'
            elif num_level3 >= 10:
                grade = 'A-'
            elif num_level3 >=5:
                grade = 'B+'
            else:
                grade = 'B'
        elif num_level2 >=10:
            grade = 'B-'
        elif num_level2 >=5:
            grade = 'C+'
        else:
            grade = 'C'
    elif num_level1 >= 10:
        grade = 'C-'
    elif num_level1 >= 5:
        grade = 'D+'
    elif num_level1 >=3:
        grade = 'D'
    else:
        grade = 'F'


    return grade

We can call help on the function we wrote:

help(compute_grade)
Help on function compute_grade in module __main__:

compute_grade(num_level1, num_level2, num_level3)
    Computes a grade for CSC/DSP310 from numbers of achievements at each level
    
    Parameters:
    ------------
    num_level1 : int
      number of level 1 achievements earned
    num_level2 : int
      number of level 2 achievements earned
    num_level3 : int
      number of level 3 achievements earned
    
    Returns:
    --------
    letter_grade : string
      letter grade with modifier (+/-)

help displays the docstring, the string we wrote at the top of the function body.
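The docstring is also stored on the function itself, so we can read it directly. This is a small sketch with a made-up function, add_one, that is not part of the class code; it assumes only the standard __doc__ attribute.

```python
def add_one(x):
    """Return x plus one."""
    return x + 1

# the docstring is stored on the function object as the __doc__ attribute
doc_text = add_one.__doc__
print(doc_text)
# Return x plus one.
```

A function written without a docstring has __doc__ set to None, which is one reason to always write one.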

3.3. Everything is an Object in Python#

We can use the built-in function type to inspect objects and get their attributes with a dot (.).

type(compute_grade)
function
compute_grade.__name__
'compute_grade'
c = 4.5
type(c)
float
c= 'hello'
type(c)
str
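As a quick check that the same tools work on any object, this sketch collects the type names of a few different things. The function greet is a made-up example, not part of the class code.

```python
def greet(name):
    """Return a short greeting (hypothetical example function)."""
    return 'hello ' + name

# numbers, strings, functions we define, built-in functions and lists are all objects
things = [4.5, 'hello', greet, print, [1, 2, 3]]

# every object has a type, and every type has a __name__ attribute
type_names = [type(thing).__name__ for thing in things]
print(type_names)
# ['float', 'str', 'function', 'builtin_function_or_method', 'list']

# isinstance confirms that each one is an instance of object
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)
# True
```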

When do we use single vs double quotes?

  • You can use either. If you need one kind of quote inside the string, delimit the string with the other kind.

my_sentence = "The professor's name is Dr. Brown"
my_sentence = 'The professor's name is Dr. Brown'
  Input In [15]
    my_sentence = 'The professor's name is Dr. Brown'
                                 ^
SyntaxError: invalid syntax

We can also escape special characters with a backslash:

my_sentence = 'The professor\'s name is Dr. Brown'

but that is less readable and not recommended.
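To compare the options side by side, here is a minimal sketch. The variable names and the sentences other than the first are made up for illustration.

```python
# double quotes outside, so the apostrophe inside is fine
possessive = "The professor's name is Dr. Brown"

# single quotes outside, so the double quotes inside are fine
quoted = 'She said "hello" to the class'

# triple quotes let us use both kinds inside one string
both = """The professor's favorite word is "data"."""

# escaping works too, and produces the same string as the first version
escaped = 'The professor\'s name is Dr. Brown'

print(possessive == escaped)
# True
```

Since both spellings give an identical string, the choice is only about readability.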

3.4. Good Code is always relative#

In programming for data science, we are often trying to tell a story.

Try it yourself

How might this goal change your code for this class relative to other code you have written or could imagine writing?

Python is a fully open source project and as such is governed by community standards and conventions.

Try it yourself

Find PEP 8 (note that following it is part of earning Python achievements).

The documentation for the full language is online too.

Guido van Rossum was the first main developer and wrote essays about Python too.

Python is also pretty popular.

3.5. Coffee Data#

We’re going to use a dataset about coffee quality today.

How was this dataset collected?

  • reviewers added their reviews to a database

  • then the database was scraped

Where did it come from?

  • Coffee Quality Institute’s trained reviewers.

What format is it provided in?

  • csv (Comma Separated Values)

What other information is in this repository?

  • the code to scrape

To get the raw URL for the dataset, click the Raw button on the CSV page, then copy the URL.

Figure: a screenshot from GitHub of the data file page with the Raw button circled in pink.

We’ll save that url as a variable to work with it.

data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv'

We will use a library called pandas.

import pandas as pd
# import library and give it an alias (nickname) pd
pd.read_csv(data_url)
Unnamed: 0 Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
0 1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
1 2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
2 3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
3 4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
4 5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0
5 6 Robusta andrew hetzel India NaN NaN (self) NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
6 7 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 0 May 15th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
7 8 Robusta nishant gurjer India sethuraman estate kaapi royale 7 sethuraman estate 14/1148/2017/18 kaapi royale 3140 ... Bluish-Green 0 October 25th, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3140.0 3140.0 3140.0
8 9 Robusta nishant gurjer India sethuraman estate RKR sethuraman estate 14/1148/2016/17 kaapi royale 1000 ... Green 0 August 17th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
9 10 Robusta ugacof Uganda ishaka NaN nsubuga umar 0 ugacof ltd 900-1300 ... Green 6 August 5th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 900.0 1300.0 1100.0
10 11 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1095 ... Green 1 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1095.0 1095.0 1095.0
11 12 Robusta nishant gurjer India sethuraman estate kaapi royale RC AB sethuraman estate 14/1148/2016/12 kaapi royale 1000 ... Green 0 August 23rd, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
12 13 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
13 14 Robusta kasozi coffee farmers association Uganda kasozi coffee farmers NaN NaN 0 kasozi coffee farmers association 1367 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1367.0 1367.0 1367.0
14 15 Robusta ankole coffee producers coop Uganda kyangundu coop society NaN ankole coffee producers coop union ltd 0 ankole coffee producers coop 1488 ... Green 2 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
15 16 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
16 17 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Blue-Green 0 June 3rd, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
17 18 Robusta kawacom uganda ltd Uganda bushenyi NaN kawacom 0 kawacom uganda ltd 1600 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1600.0 1600.0 1600.0
18 19 Robusta nitubaasa ltd Uganda kigezi coffee farmers association NaN nitubaasa 0 nitubaasa ltd 1745 ... Green 2 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1745.0 1745.0 1745.0
19 20 Robusta mannya coffee project Uganda mannya coffee project NaN mannya coffee project 0 mannya coffee project 1200 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1200.0 1200.0
20 21 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Bluish-Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
21 22 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Green 0 June 20th, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
22 23 Robusta andrew hetzel United States sethuraman estates NaN sethuraman estates NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
23 24 Robusta luis robles Ecuador robustasa Lavado 1 our own lab NaN robustasa NaN ... Blue-Green 1 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
24 25 Robusta luis robles Ecuador robustasa Lavado 3 own laboratory NaN robustasa 40 ... Blue-Green 0 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 40.0 40.0 40.0
25 26 Robusta james moore United States fazenda cazengo NaN cafe cazengo NaN global opportunity fund 795 meters ... NaN 6 December 23rd, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 795.0 795.0 795.0
26 27 Robusta cafe politico India NaN NaN NaN 14-1118-2014-0087 cafe politico NaN ... Green 1 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
27 28 Robusta cafe politico Vietnam NaN NaN NaN NaN cafe politico NaN ... None 9 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN

28 rows × 44 columns

Try it yourself

Read the data in again, but with the index correct and save it to a variable.

coffee_df = pd.read_csv(data_url, index_col='Unnamed: 0')

Once we read it in, we can view the first 5 rows with the head method.

coffee_df.head()
Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude Region ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 sheema south western ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 chikmagalur karnataka indua ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m chikmagalur ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 central ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 luwero central region ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0

5 rows × 43 columns
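The effect of index_col can also be checked without a network connection. This is a minimal sketch on a tiny, made-up table: csv_text is hypothetical data standing in for the real file, with an empty first header cell like the one that produces the 'Unnamed: 0' column above.

```python
import io

import pandas as pd

# hypothetical CSV text; the first header cell is empty, so pandas names it 'Unnamed: 0'
csv_text = """,Species,Country.of.Origin
1,Robusta,Uganda
2,Robusta,India
3,Robusta,India
"""

# without index_col, the row labels are read as an ordinary column
default_df = pd.read_csv(io.StringIO(csv_text))

# with index_col, that column becomes the index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col='Unnamed: 0')

print(default_df.shape)
# (3, 3)
print(indexed_df.shape)
# (3, 2)
```

The second DataFrame has one fewer column, just as the coffee data went from 44 columns to 43.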

Important

Remember to comment & annotate your code

3.6. Follow Up questions#

3.6.1. General Questions#

How do you create code to scrape data from a website and compile it into a csv file?

We won’t do too much of this in class; we’ll rely mostly on data that’s already organized into a data file of some sort.

We will read directly from HTML tables though. There are also libraries for scraping data that’s unstructured or only partially structured.

Will we be using pandas a lot during the semester?

3.6.2. Clarifying#

How do you auto finish your directories?

In a terminal (and in Jupyter and most code environments) you can use the tab key to complete things. We’ll keep coming back to this.

How do you properly shut down Jupyter Notebook?

Shut down in the browser, close the tabs, then Ctrl+C in the terminal window.

Is pd some sort of variable we set or was it built in?

pd is an alias that we set when we ran the line import pandas as pd. It is not built in.

How should I be organized for this class? Keep it all in a single folder? Keep it on GitHub?

I recommend keeping a folder for class, e.g. CSC310, and inside it creating a notes folder where you keep the notebooks we generate in class each day. Clone the repository for each assignment into the class folder too. Every assignment will have a repo that will be on GitHub, but your notes do not need to be in there.

You’ll only add specific things to your portfolio. More information to follow.

I’m still not sure how to keep everything together in a portfolio for the semester?

Your portfolio will be carefully curated, not everything you do all semester.

I have questions about the parts of the first assignment.

How the grading works is posted in the grading page now. More information on the github stuff to follow.

I am still wondering if I am using anaconda or just normal terminal

On Windows: use Anaconda Prompt for Python things and Git Bash for Git things.

Can I push this code into my portfolio using the Anaconda terminal?

Git probably won’t work in the Anaconda terminal.

3.6.3. Grading Questions#

How do we keep track of which achievements we’ve earned?

They will get posted to Brightspace in the grade section. I’ll notify you when we post and make them visible and you should check that Brightspace matches what we say on your code and e-mail if it doesn’t.

I don’t really have many questions from today, but I was wondering if office hours were posted.

Not yet, still trying to find times, but I will post an announcement when they are posted.

Will we always submit homework through the portfolio folder in github?

It will always be on GitHub, but each assignment will have its own repository

I’m just confused as how to view my feedback from the assignment

On your

3.6.4. Questions we’ll answer later this week#

  • Does each column have a number assigned to it in data frames?

  • Can other data types be imported into a notebook and edited the same way as .csv files?

3.7. More Practice#

  • How could you check if pd is built in or if we defined it?

  • If we wanted to see more than 5 rows when printing the head of the dataset how would we do so?

Ram Token Opportunity

Contribute possible practice questions to the notes using the suggest an edit button behind the GitHub menu at the top of the page.