Getting help, object inspection, loading data
3. Getting help, object inspection, loading data
3.1. First, Don’t Worry members
Class Response Summary:
3.2. Getting Help in Jupyter
Python has a print function, and we can use the help in Jupyter to learn how to use it in different ways.
Given this code excerpt, how could you print out “Sarah_Brown”?
first = 'Sarah'
last = 'Brown'
We can use the Jupyter popup help with shift + tab or ?
print?
Or the base Python help function
help(print)
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Notice that the function can take multiple values and has keyword arguments (which must be passed like argument=value), one of which is described as sep=' '.
This means that by default it adds a space between the values.
print(first,last)
Sarah Brown
But we can change the separator.
print(first,last, sep='_')
# shift + tab for help
Sarah_Brown
Note that end also has a default, \n (a newline), so each call to print finishes its output with a new line.
print(first,last)
print('hello')
Sarah Brown
hello
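As a small sketch of how sep and end combine, the snippet below changes both at once. The io.StringIO buffer is an addition for illustration (it is not part of the class notes); it only captures what print writes, using the file argument described in the help text, so the result can be inspected as a string.

```python
import io

first = 'Sarah'
last = 'Brown'

# end=' ' replaces the default newline, so the next print continues on the same line
print(first, last, sep='_', end=' ')
print('hello')

# print can write to any file-like object; here a buffer captures the text
buffer = io.StringIO()
print(first, last, sep='_', end='!\n', file=buffer)
captured = buffer.getvalue()
```

Here the first two calls share one output line because the first one no longer ends with a newline.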
Where does this help information come from?
def compute_grade(num_level1,num_level2,num_level3):
    '''
    Computes a grade for CSC/DSP310 from numbers of achievements at each level
    Parameters:
    ------------
    num_level1 : int
        number of level 1 achievements earned
    num_level2 : int
        number of level 2 achievements earned
    num_level3 : int
        number of level 3 achievements earned
    Returns:
    --------
    letter_grade : string
        letter grade with modifier (+/-)
    '''
    if num_level1 == 15:
        if num_level2 == 15:
            if num_level3 == 15:
                grade = 'A'
            elif num_level3 >= 10:
                grade = 'A-'
            elif num_level3 >=5:
                grade = 'B+'
            else:
                grade = 'B'
        elif num_level2 >=10:
            grade = 'B-'
        elif num_level2 >=5:
            grade = 'C+'
        else:
            grade = 'C'
    elif num_level1 >= 10:
        grade = 'C-'
    elif num_level1 >= 5:
        grade = 'D+'
    elif num_level1 >=3:
        grade = 'D'
    else:
        grade = 'F'
    return grade
We can apply help to the function we wrote
help(compute_grade)
Help on function compute_grade in module __main__:
compute_grade(num_level1, num_level2, num_level3)
Computes a grade for CSC/DSP310 from numbers of achievements at each level
Parameters:
------------
num_level1 : int
number of level 1 achievements earned
num_level2 : int
number of level 2 achievements earned
num_level3 : int
number of level 3 achievements earned
Returns:
--------
letter_grade : string
letter grade with modifier (+/-)
help gets this text from the docstring, the string literal at the very start of the function body.
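To make the link between help and the docstring concrete, here is a minimal sketch. The functions add_one and no_docs are hypothetical examples written for this illustration; the point is only that the docstring is stored on the function object as the __doc__ attribute.

```python
def add_one(x):
    '''Return x plus one.'''
    return x + 1

# The docstring is stored on the function object itself
doc_text = add_one.__doc__
print(doc_text)

# A function written without a docstring has __doc__ set to None
def no_docs(x):
    return x

missing = no_docs.__doc__
```

This is why writing a docstring for your own functions makes help useful for them too.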
3.3. Everything is an Object in Python
We can use the built-in function type to inspect objects, and access their attributes with a dot (.).
type(compute_grade)
function
compute_grade.__name__
'compute_grade'
c = 4.5
type(c)
float
c= 'hello'
type(c)
str
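A short sketch of a few more ways to inspect an object, assuming the same variable c as above. Note that in a script print(type(c)) shows <class 'float'>, while a notebook cell ending in type(c) displays just float. The calls isinstance and dir are additional built-ins, not ones used in class so far.

```python
c = 4.5
print(type(c))
# isinstance asks whether an object is of a given type
is_float = isinstance(c, float)

# Methods are attributes too; is_integer is a method of float objects
whole = c.is_integer()

c = 'hello'
# dir lists the names of an object's attributes
has_upper = 'upper' in dir(c)
shouted = c.upper()
```

Reassigning c to a string changes which attributes are available, because the attributes belong to the object, not to the variable name.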
When do we use single vs double quotes?
You can use either; if you need to put one kind of quote inside the string, use the other kind around it.
my_sentence = "The professor's name is Dr. Brown"
my_sentence = 'The professor's name is Dr. Brown'
Input In [15]
my_sentence = 'The professor's name is Dr. Brown'
^
SyntaxError: invalid syntax
Yes, we can escape special characters:
my_sentence = 'The professor\'s name is Dr. Brown'
but it’s less readable and not recommended.
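As a quick check that the quoting style is only a matter of how the string is written, this sketch compares the two spellings above. The triple-quoted example is an extra illustration, not something from the class notes.

```python
double_quoted = "The professor's name is Dr. Brown"
escaped = 'The professor\'s name is Dr. Brown'

# Both spellings produce the same string contents
same = double_quoted == escaped

# Triple quotes allow both kinds of quote characters inside one string
both = '''She said "that's fine"'''
print(both)
```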
3.4. Good Code is always relative
In programming for data science, we are often trying to tell a story.
Try it yourself
How might this goal change your code for this class relative to other code you have written or could imagine writing?
Python is a fully open source project and as such is governed by community standards and conventions.
Try it yourself
Find PEP8 (note that following it is part of earning python achievements)
The documentation for the full language is online too.
Guido van Rossum was the first main developer and wrote essays about python too.
Python is also pretty popular.
3.5. Coffee Data
We’re going to use a dataset about coffee quality today.
How was this dataset collected?
Reviewers added their reviews to a database,
then the database was scraped.
Where did it come from?
Coffee Quality Institute’s trained reviewers.
What format is it provided in?
CSV (Comma Separated Values)
What other information is in this repository?
The code used to scrape the data
To get the raw URL for the dataset, click the Raw button on the CSV page, then copy the URL.
We’ll save that url as a variable to work with it.
data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv'
We will use a library called Pandas
import pandas as pd
# import library and give it an alias (nickname) pd
pd.read_csv(data_url)
| | Unnamed: 0 | Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
1 | 2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
2 | 3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
3 | 4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
4 | 5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
5 | 6 | Robusta | andrew hetzel | India | NaN | NaN | (self) | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
6 | 7 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 0 | May 15th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
7 | 8 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 7 | sethuraman estate | 14/1148/2017/18 | kaapi royale | 3140 | ... | Bluish-Green | 0 | October 25th, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3140.0 | 3140.0 | 3140.0 |
8 | 9 | Robusta | nishant gurjer | India | sethuraman estate | RKR | sethuraman estate | 14/1148/2016/17 | kaapi royale | 1000 | ... | Green | 0 | August 17th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
9 | 10 | Robusta | ugacof | Uganda | ishaka | NaN | nsubuga umar | 0 | ugacof ltd | 900-1300 | ... | Green | 6 | August 5th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 900.0 | 1300.0 | 1100.0 |
10 | 11 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1095 | ... | Green | 1 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1095.0 | 1095.0 | 1095.0 |
11 | 12 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | RC AB | sethuraman estate | 14/1148/2016/12 | kaapi royale | 1000 | ... | Green | 0 | August 23rd, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
12 | 13 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
13 | 14 | Robusta | kasozi coffee farmers association | Uganda | kasozi coffee farmers | NaN | NaN | 0 | kasozi coffee farmers association | 1367 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1367.0 | 1367.0 | 1367.0 |
14 | 15 | Robusta | ankole coffee producers coop | Uganda | kyangundu coop society | NaN | ankole coffee producers coop union ltd | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
15 | 16 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
16 | 17 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Blue-Green | 0 | June 3rd, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
17 | 18 | Robusta | kawacom uganda ltd | Uganda | bushenyi | NaN | kawacom | 0 | kawacom uganda ltd | 1600 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1600.0 | 1600.0 | 1600.0 |
18 | 19 | Robusta | nitubaasa ltd | Uganda | kigezi coffee farmers association | NaN | nitubaasa | 0 | nitubaasa ltd | 1745 | ... | Green | 2 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1745.0 | 1745.0 | 1745.0 |
19 | 20 | Robusta | mannya coffee project | Uganda | mannya coffee project | NaN | mannya coffee project | 0 | mannya coffee project | 1200 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1200.0 | 1200.0 |
20 | 21 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Bluish-Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
21 | 22 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Green | 0 | June 20th, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
22 | 23 | Robusta | andrew hetzel | United States | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
23 | 24 | Robusta | luis robles | Ecuador | robustasa | Lavado 1 | our own lab | NaN | robustasa | NaN | ... | Blue-Green | 1 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
24 | 25 | Robusta | luis robles | Ecuador | robustasa | Lavado 3 | own laboratory | NaN | robustasa | 40 | ... | Blue-Green | 0 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 40.0 | 40.0 | 40.0 |
25 | 26 | Robusta | james moore | United States | fazenda cazengo | NaN | cafe cazengo | NaN | global opportunity fund | 795 meters | ... | NaN | 6 | December 23rd, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 795.0 | 795.0 | 795.0 |
26 | 27 | Robusta | cafe politico | India | NaN | NaN | NaN | 14-1118-2014-0087 | cafe politico | NaN | ... | Green | 1 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
27 | 28 | Robusta | cafe politico | Vietnam | NaN | NaN | NaN | NaN | cafe politico | NaN | ... | None | 9 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 rows × 44 columns
Try it yourself
Read the data in again, but with the index correct and save it to a variable.
coffee_df = pd.read_csv(data_url, index_col='Unnamed: 0')
Once we read it in, we can view the first 5 rows with the head method.
coffee_df.head()
| | Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | Region | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | sheema south western | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | chikmagalur karnataka indua | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | chikmagalur | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | central | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | luwero central region | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
5 rows × 43 columns
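To see what index_col changes without downloading anything, here is a minimal sketch on a tiny made-up CSV. The three rows in csv_text are a hypothetical stand-in for the coffee file, and io.StringIO is used only so that read_csv can read from a string instead of a URL.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the coffee CSV; the leading comma in the
# header leaves the first column unnamed, which pandas labels 'Unnamed: 0'
csv_text = """,Species,Country.of.Origin
1,Robusta,Uganda
2,Robusta,India
3,Robusta,India
"""

# Without index_col the saved row labels come back as an ordinary column
plain_df = pd.read_csv(io.StringIO(csv_text))

# With index_col that column becomes the index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col='Unnamed: 0')

print(plain_df.shape)    # (rows, columns)
print(indexed_df.shape)
```

The indexed version has one fewer column, which matches the change from 44 columns to 43 columns in the coffee data above.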
Important
Remember to comment & annotate your code
3.6. Follow Up questions
3.6.1. General Questions
How do you create code to scrape data from a website and compile it into a csv file?
We won’t do too much of this in class, we’ll rely mostly on data that’s already organized into a data file of some sort.
We will read directly from html tables though. To scrape data that’s not structured or only partially structured there are libraries for that.
Will we be using pandas a lot during the semester?
3.6.2. Clarifying
How do you auto finish your directories?
In a terminal (and in Jupyter and most code environments) you can use the Tab key to complete things. We’ll keep coming back to this.
How do you properly shut down Jupyter Notebook?
Shut down in the browser, close the tabs, then press Ctrl+C in the terminal window.
Is pd some sort of variable we set or was it built in?
pd is an alias that we set when we ran the line import pandas as pd
How should I be organized for this class? Keep it all in a single folder? Keep it on GitHub?
I recommend keeping a folder for the class, e.g. CSC310, and then inside there creating a notes folder where you keep the notebooks we generate in class each day, and cloning the repository for each assignment in there. Every assignment will have a repo on GitHub, but your notes do not need to be in there.
You’ll only add specific things to your portfolio. More information to follow.
I’m still not sure how to keep everything together in a portfolio for the semester?
Your portfolio will be carefully curated, not everything you do all semester.
I have questions about the parts of the first assignment,
How the grading works is posted in the grading page now. More information on the github stuff to follow.
I am still wondering if I am using anaconda or just normal terminal
On Windows: Anaconda Prompt for Python things and Git Bash for git things.
Can I push this code into my portfolio using the anaconda terminal?
Git probably won’t work in the Anaconda terminal.
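Returning to the question above about pd: the sketch below shows the same aliasing idea with the standard library module math, chosen here only so the example runs without any installed packages. The alias m is a hypothetical name picked for illustration, just as pd is a conventional name picked for pandas.

```python
import math as m   # m is a name we chose, just like pd for pandas

# The alias is an ordinary variable that refers to the module object
alias_type = type(m).__name__
module_name = m.__name__

# Everything in the module is reached through the alias
root = m.sqrt(16)
print(alias_type, module_name, root)
```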
3.6.3. Grading Questions
How do we keep track of which achievements we’ve earned?
They will get posted to Brightspace in the grade section. I’ll notify you when we post and make them visible and you should check that Brightspace matches what we say on your code and e-mail if it doesn’t.
I don’t really have many questions from today, but I was wondering if office hours were posted.
Not yet, still trying to find times, but I will post an announcement when they are posted.
Will we always submit homework through the portfolio folder in github?
It will always be on GitHub, but each assignment will have its own repository
I’m just confused as how to view my feedback from the assignment
On your
3.6.4. Questions we’ll answer later this week
Does each column have a number assigned to it in data frames?
Can other data types be imported into a notebook and edited the same way as .csv files?
3.7. More Practice
How could you check if pd is built in or if we defined it?
If we wanted to see more than 5 rows when printing the head of the dataset, how would we do so?
Ram Token Opportunity
Contribute possible practice questions to the notes using the suggest an edit button behind the GitHub menu at the top of the page.