2. Iterables and Pandas Data Frames#

2.1. House Keeping#

2.1.1. Grading is not done,#

you will get a notification when yours is.

2.1.2. Closing Jupyter server.#

In the terminal use Ctrl+C (actually control, not command on mac).

It will ask you a question and give options; read and follow them,

or

press Ctrl+C a second time.

A Jupyter server typically runs at localhost:8888, but if you have multiple servers running, each additional server gets a higher port number.

Once I saw a student in office hours working on localhost:8894 asking why their code kept crashing.

Important

Remember to close your jupyter server

2.2. Grading solution#

def compute_grade(num_level1,num_level2,num_level3):
    '''
    Computes a grade for CSC/DSP310 from numbers of achievements at each level

    Parameters:
    -----------

    num_level1 : int
      number of level 1 achievements earned
    num_level2 : int
      number of level 2 achievements earned
    num_level3 : int
      number of level 3 achievements earned

    Returns:
    --------
    letter_grade : string
      letter grade with modifier (+/-)
    '''
    if num_level1 == 15:
        if num_level2 == 15:
            if num_level3 == 15:
                grade = 'A'
            elif num_level3 >= 10:
                grade = 'A-'
            elif num_level3 >=5:
                grade = 'B+'
            else:
                grade = 'B'
        elif num_level2 >=10:
            grade = 'B-'
        elif num_level2 >=5:
            grade = 'C+'
        else:
            grade = 'C'
    elif num_level1 >= 10:
        grade = 'C-'
    elif num_level1 >= 5:
        grade = 'D+'
    elif num_level1 >=3:
        grade = 'D'
    else:
        grade = 'F'


    return grade

Running the cell above adds the function to the kernel's memory.

Now that it is run, jupyter can show us compute_grade as an option when we tab complete after typing the first few letters.

When we restarted the kernel, we saw that before running the cell above, the tab complete did not work.
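Here is a minimal sketch of the same idea without tab completion: in a fresh kernel, a name only exists after the code that defines it has run. The function greet is a hypothetical stand-in, not part of the course code.

```python
# check whether the name exists before the def statement has run
try:
    greet
    defined_before = True
except NameError:
    defined_before = False


# running this def is what adds greet to memory
def greet():
    return 'hello'


# the same check now succeeds
try:
    greet
    defined_after = True
except NameError:
    defined_after = False

print(defined_before, defined_after)
```

If you run this a second time without restarting the kernel, the first check will also succeed, because greet is still in memory from the first run.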

Important

It is important to understand what works when, and why, so that you know what to expect and can get unstuck.

compute_grade(15,15,14)
'A-'
assert compute_grade(15,15,15) =='A'

assert succeeds quietly

assert compute_grade(15,15,15) =='B'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 assert compute_grade(15,15,15) =='B'

AssertionError: 

but fails with a specific error
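An assert can also carry a message, which makes the error easier to read. This is a small sketch using a hypothetical stand-in function called double rather than compute_grade, so that it runs on its own.

```python
# hypothetical stand-in for any function we want to check
def double(x):
    return 2 * x


# a true condition passes with no output
assert double(2) == 4

# a false condition raises AssertionError;
# the text after the comma becomes the error message
try:
    assert double(2) == 5, 'double(2) should be 4'
except AssertionError as err:
    message = str(err)

print(message)
```

The try and except are only here so the example keeps running after the failure; in a notebook you would usually let the error show.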

The docstring is important because it is what help displays.

help(compute_grade)
Help on function compute_grade in module __main__:

compute_grade(num_level1, num_level2, num_level3)
    Computes a grade for CSC/DSP310 from numbers of achievements at each level
    
    Parameters:
    -----------
    
    num_level1 : int
      number of level 1 achievements earned
    num_level2 : int
      number of level 2 achievements earned
    num_level3 : int
      number of level 3 achievements earned
    
    Returns:
    --------
    letter_grade : string
      letter grade with modifier (+/-)
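The docstring is also stored on the function object itself, so we can get it as a string. A minimal sketch, using a made-up function add_one so that it is self-contained:

```python
import inspect


def add_one(x):
    '''
    Adds one to a number

    Parameters:
    -----------
    x : int
      number to increment
    '''
    return x + 1


# the raw docstring, exactly as typed (including indentation)
raw_doc = add_one.__doc__

# inspect.getdoc removes the common indentation, similar to what help shows
clean_doc = inspect.getdoc(add_one)

print(clean_doc)
```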

2.3. Everything is Data#

Data we will see:

  • tabular data

  • websites as data

  • activity logs on websites

  • images

  • text

2.4. Why inspection in code?#

Some IDEs give you GUI based tools to inspect objects. We are going to do it programmatically inline with our analyses for two reasons.

  • (minor, logistical) it helps make for good notes

  • (most importantly) it helps build habits of data science

In data science, our code will be aiming to tell a story.

If you’re curious about something, try it out and see what happens. We’re going to use a lot of code inspection tools during class. These are helpful for understanding what’s going on. The advantage of knowing how to get this information programmatically, even though an IDE could give you inspection tools, is that it helps you treat your code as data.
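As a small sketch of what inspecting in code can look like, the built-in functions type and dir report what an object is and what it can do. The variable names here are made up for the example.

```python
value = 'monday'

# what kind of object is this?
value_type = type(value)

# what can it do? keep only the names that do not start with an underscore
public_names = [name for name in dir(value) if not name.startswith('_')]

print(value_type)
print(len(public_names), public_names[:3])
```

Because the result of dir is a regular list, we can filter it, count it, or loop over it like any other data.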

2.5. Everything is an Object#

let’s examine the type of some variables:

a = 4
b ='monday'
c = 5.3
d =print
type(a)
int

ints are a base python type, like they appear in other languages

strings are an iterable type, meaning that their elements can be iterated over; they can also be indexed into. For a more technical definition, see the official python glossary entry

type(b)
str

we can select one element

b[0]
'm'

or multiple, this is called slicing.

b[0:3]
'mon'

negative numbers count from the right.

b[-1]
'y'
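The indexing and slicing above can be combined with iteration. A short sketch, using a new variable name day so it does not depend on the cells above:

```python
day = 'monday'

# iterating visits the elements one at a time, in order
letters = [letter for letter in day]

# a slice with only a stop value starts from the beginning
first_three = day[:3]

# negative indices work in slices too
last_three = day[-3:]

# an optional third number is the step size
every_other = day[::2]

print(letters)
print(first_three, last_three, every_other)
```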

decimals default to float

type(c)
float

a variable can hold a whole function.

type(d)
builtin_function_or_method

functions are also objects like any other type in python

we can use the variable just like the function itself

d('hello')
hello
print('hello')
hello
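Since functions are objects, they can also be stored in a list and called in a loop. This is a minimal sketch; shout is a made-up function for the example.

```python
def shout(text):
    return text.upper() + '!'


# a second name for the same function object
speak = shout

# functions can be stored in a collection and called one after another
actions = [shout, len, str.title]
results = [action('hello') for action in actions]

print(speak is shout)
print(results)
```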

2.6. Tabular Data#

Structured data is easier to work with than other data.

We’re going to focus on tabular data for now. At the end of the course, we’ll examine images, which are structured, but more complex and text, which is much less structured.

2.7. Getting familiar with the dataset#

We’re going to use a dataset about coffee quality today.

How was this dataset collected?

  • reviews added to DB

  • then scraped

Where did it come from?

  • Coffee Quality Institute’s trained reviewers.

What format is it provided in?

  • csv (Comma Separated Values)

What other information is in this repository?

  • the code to scrape and clean the data

  • the data before cleaning

It’s important to always know where data came from and how it was collected.

This helps you know what it is useful for and what its limitations are.

Further Reading

An important research article on documenting datasets for machine learning is called Datasheets for Datasets. These researchers also did a follow-up study to better understand how practitioners use datasheets and decide how to use data.

If topics like this are interesting to you, let me know! My research is related to this, and a lot of students who complete 310 go on to do research in my lab.

2.8. Loading data#

To get the raw URL for the dataset, click the Raw button on the CSV page, then copy the URL.

coffee_data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv'

We will work with data using a library called pandas. By convention, we import it like:

import pandas as pd

  • the import keyword is used for loading packages

  • pandas is the name of the package that is installed

  • as keyword allows us to assign an alias (nickname)

  • pd is the typical alias for pandas
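To see that an alias is only a second name for the same package, here is a small sketch. It uses the standard library math module instead of pandas so that it runs without anything installed, and the alias m is made up for the example.

```python
import math
import math as m

# both names refer to the very same module object
same_module = m is math

print(same_module)
print(m.sqrt(16))
```

The same holds for pd and pandas: the alias saves typing but does not change what was imported.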

we will load the data with pd.read_csv()

pd.read_csv(coffee_data_url)
Unnamed: 0 Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
0 1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
1 2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
2 3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
3 4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
4 5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0
5 6 Robusta andrew hetzel India NaN NaN (self) NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
6 7 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 0 May 15th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
7 8 Robusta nishant gurjer India sethuraman estate kaapi royale 7 sethuraman estate 14/1148/2017/18 kaapi royale 3140 ... Bluish-Green 0 October 25th, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3140.0 3140.0 3140.0
8 9 Robusta nishant gurjer India sethuraman estate RKR sethuraman estate 14/1148/2016/17 kaapi royale 1000 ... Green 0 August 17th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
9 10 Robusta ugacof Uganda ishaka NaN nsubuga umar 0 ugacof ltd 900-1300 ... Green 6 August 5th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 900.0 1300.0 1100.0
10 11 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1095 ... Green 1 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1095.0 1095.0 1095.0
11 12 Robusta nishant gurjer India sethuraman estate kaapi royale RC AB sethuraman estate 14/1148/2016/12 kaapi royale 1000 ... Green 0 August 23rd, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
12 13 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
13 14 Robusta kasozi coffee farmers association Uganda kasozi coffee farmers NaN NaN 0 kasozi coffee farmers association 1367 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1367.0 1367.0 1367.0
14 15 Robusta ankole coffee producers coop Uganda kyangundu coop society NaN ankole coffee producers coop union ltd 0 ankole coffee producers coop 1488 ... Green 2 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
15 16 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
16 17 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Blue-Green 0 June 3rd, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
17 18 Robusta kawacom uganda ltd Uganda bushenyi NaN kawacom 0 kawacom uganda ltd 1600 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1600.0 1600.0 1600.0
18 19 Robusta nitubaasa ltd Uganda kigezi coffee farmers association NaN nitubaasa 0 nitubaasa ltd 1745 ... Green 2 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1745.0 1745.0 1745.0
19 20 Robusta mannya coffee project Uganda mannya coffee project NaN mannya coffee project 0 mannya coffee project 1200 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1200.0 1200.0
20 21 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Bluish-Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
21 22 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Green 0 June 20th, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
22 23 Robusta andrew hetzel United States sethuraman estates NaN sethuraman estates NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
23 24 Robusta luis robles Ecuador robustasa Lavado 1 our own lab NaN robustasa NaN ... Blue-Green 1 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
24 25 Robusta luis robles Ecuador robustasa Lavado 3 own laboratory NaN robustasa 40 ... Blue-Green 0 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 40.0 40.0 40.0
25 26 Robusta james moore United States fazenda cazengo NaN cafe cazengo NaN global opportunity fund 795 meters ... NaN 6 December 23rd, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 795.0 795.0 795.0
26 27 Robusta cafe politico India NaN NaN NaN 14-1118-2014-0087 cafe politico NaN ... Green 1 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
27 28 Robusta cafe politico Vietnam NaN NaN NaN NaN cafe politico NaN ... NaN 9 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN

28 rows × 44 columns

This read in the data and displayed it because it is the last line in the cell. If we do something else after it, the data will still be read in, but not displayed.

In order to use it, we save the output to a variable.

coffee_df = pd.read_csv(coffee_data_url)
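Once the result is saved to a variable, we can keep working with it. This is a minimal sketch that reads a tiny CSV from a string instead of a URL, so it runs without a network connection; the rows and the name toy_df are made up for illustration and are not taken from the coffee dataset.

```python
import io

import pandas as pd

# made-up rows for illustration only; not taken from the coffee dataset
csv_text = '''id,species,country
1,Robusta,Uganda
2,Robusta,India
3,Robusta,India
'''

# read_csv accepts a file-like object as well as a path or URL
toy_df = pd.read_csv(io.StringIO(csv_text))

# because the DataFrame is saved, we can ask it questions later
n_rows, n_cols = toy_df.shape
print(n_rows, n_cols)
```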

we can look at it again using the jupyter display

coffee_df
Unnamed: 0 Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
0 1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
1 2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
2 3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
3 4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
4 5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0
5 6 Robusta andrew hetzel India NaN NaN (self) NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
6 7 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 0 May 15th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
7 8 Robusta nishant gurjer India sethuraman estate kaapi royale 7 sethuraman estate 14/1148/2017/18 kaapi royale 3140 ... Bluish-Green 0 October 25th, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3140.0 3140.0 3140.0
8 9 Robusta nishant gurjer India sethuraman estate RKR sethuraman estate 14/1148/2016/17 kaapi royale 1000 ... Green 0 August 17th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
9 10 Robusta ugacof Uganda ishaka NaN nsubuga umar 0 ugacof ltd 900-1300 ... Green 6 August 5th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 900.0 1300.0 1100.0
10 11 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1095 ... Green 1 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1095.0 1095.0 1095.0
11 12 Robusta nishant gurjer India sethuraman estate kaapi royale RC AB sethuraman estate 14/1148/2016/12 kaapi royale 1000 ... Green 0 August 23rd, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
12 13 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
13 14 Robusta kasozi coffee farmers association Uganda kasozi coffee farmers NaN NaN 0 kasozi coffee farmers association 1367 ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1367.0 1367.0 1367.0
14 15 Robusta ankole coffee producers coop Uganda kyangundu coop society NaN ankole coffee producers coop union ltd 0 ankole coffee producers coop 1488 ... Green 2 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
15 16 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
16 17 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Blue-Green 0 June 3rd, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
17 18 Robusta kawacom uganda ltd Uganda bushenyi NaN kawacom 0 kawacom uganda ltd 1600 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1600.0 1600.0 1600.0
18 19 Robusta nitubaasa ltd Uganda kigezi coffee farmers association NaN nitubaasa 0 nitubaasa ltd 1745 ... Green 2 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1745.0 1745.0 1745.0
19 20 Robusta mannya coffee project Uganda mannya coffee project NaN mannya coffee project 0 mannya coffee project 1200 ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1200.0 1200.0
20 21 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m ... Bluish-Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
21 22 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m ... Green 0 June 20th, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
22 23 Robusta andrew hetzel United States sethuraman estates NaN sethuraman estates NaN cafemakers, llc 3000' ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
23 24 Robusta luis robles Ecuador robustasa Lavado 1 our own lab NaN robustasa NaN ... Blue-Green 1 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
24 25 Robusta luis robles Ecuador robustasa Lavado 3 own laboratory NaN robustasa 40 ... Blue-Green 0 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 40.0 40.0 40.0
25 26 Robusta james moore United States fazenda cazengo NaN cafe cazengo NaN global opportunity fund 795 meters ... NaN 6 December 23rd, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 795.0 795.0 795.0
26 27 Robusta cafe politico India NaN NaN NaN 14-1118-2014-0087 cafe politico NaN ... Green 1 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
27 28 Robusta cafe politico Vietnam NaN NaN NaN NaN cafe politico NaN ... NaN 9 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN

28 rows × 44 columns

Next we examine the type

type(coffee_df)
pandas.core.frame.DataFrame

This is a new type provided by the pandas library, called a DataFrame.

We can also examine its parts. It consists of several; first, the column headings:

coffee_df.columns
Index(['Unnamed: 0', 'Species', 'Owner', 'Country.of.Origin', 'Farm.Name',
       'Lot.Number', 'Mill', 'ICO.Number', 'Company', 'Altitude', 'Region',
       'Producer', 'Number.of.Bags', 'Bag.Weight', 'In.Country.Partner',
       'Harvest.Year', 'Grading.Date', 'Owner.1', 'Variety',
       'Processing.Method', 'Fragrance...Aroma', 'Flavor', 'Aftertaste',
       'Salt...Acid', 'Bitter...Sweet', 'Mouthfeel', 'Uniform.Cup',
       'Clean.Cup', 'Balance', 'Cupper.Points', 'Total.Cup.Points', 'Moisture',
       'Category.One.Defects', 'Quakers', 'Color', 'Category.Two.Defects',
       'Expiration', 'Certification.Body', 'Certification.Address',
       'Certification.Contact', 'unit_of_measurement', 'altitude_low_meters',
       'altitude_high_meters', 'altitude_mean_meters'],
      dtype='object')

These are a special type called Index that is also provided by pandas.

It also tells us that the actual headings are of dtype object. object is used for strings or columns with mixed types.

A dtype is how pandas classifies the values it stores. It is slightly different from the base Python types, but it is roughly the same idea as a type.

type(coffee_df.columns)
pandas.core.indexes.base.Index
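To make the difference between dtype and type concrete, here is a small sketch with a made-up DataFrame (the column names and values are invented for illustration). It only checks numeric columns, since the dtype reported for text can depend on the pandas version.

```python
import pandas as pd

# made-up values for illustration only
toy_df = pd.DataFrame({'bags': [10, 20, 30], 'moisture': [0.1, 0.0, 0.12]})

# dtype is the classification pandas gives to a whole column of values
bags_dtype = str(toy_df['bags'].dtype)
moisture_dtype = str(toy_df['moisture'].dtype)

# type is the Python class of the container object itself
columns_type = type(toy_df.columns)

print(bags_dtype, moisture_dtype)
print(columns_type)
```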

It also has an index (the first column, visually), which is special because it is how you index into the rows of the data.

coffee_df.index
RangeIndex(start=0, stop=28, step=1)

Right now this is an autogenerated index, but we can also use the index_col parameter to set that up front.

coffee_df = pd.read_csv(coffee_data_url,index_col=0)
coffee_df
Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude Region ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 sheema south western ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 chikmagalur karnataka indua ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m chikmagalur ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 central ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 luwero central region ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0
6 Robusta andrew hetzel India NaN NaN (self) NaN cafemakers, llc 3000' chikmagalur ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
7 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m chikmagalur ... Green 0 May 15th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
8 Robusta nishant gurjer India sethuraman estate kaapi royale 7 sethuraman estate 14/1148/2017/18 kaapi royale 3140 chikmagalur karnataka india ... Bluish-Green 0 October 25th, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3140.0 3140.0 3140.0
9 Robusta nishant gurjer India sethuraman estate RKR sethuraman estate 14/1148/2016/17 kaapi royale 1000 chikmagalur karnataka ... Green 0 August 17th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
10 Robusta ugacof Uganda ishaka NaN nsubuga umar 0 ugacof ltd 900-1300 western ... Green 6 August 5th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 900.0 1300.0 1100.0
11 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1095 iganga namadrope eastern ... Green 1 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1095.0 1095.0 1095.0
12 Robusta nishant gurjer India sethuraman estate kaapi royale RC AB sethuraman estate 14/1148/2016/12 kaapi royale 1000 chikmagalur karnataka ... Green 0 August 23rd, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
13 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m chikmagalur ... Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
14 Robusta kasozi coffee farmers association Uganda kasozi coffee farmers NaN NaN 0 kasozi coffee farmers association 1367 eastern ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1367.0 1367.0 1367.0
15 Robusta ankole coffee producers coop Uganda kyangundu coop society NaN ankole coffee producers coop union ltd 0 ankole coffee producers coop 1488 south western ... Green 2 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
16 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m chikmagalur ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
17 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m chikmagalur ... Blue-Green 0 June 3rd, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
18 Robusta kawacom uganda ltd Uganda bushenyi NaN kawacom 0 kawacom uganda ltd 1600 western ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1600.0 1600.0 1600.0
19 Robusta nitubaasa ltd Uganda kigezi coffee farmers association NaN nitubaasa 0 nitubaasa ltd 1745 western ... Green 2 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1745.0 1745.0 1745.0
20 Robusta mannya coffee project Uganda mannya coffee project NaN mannya coffee project 0 mannya coffee project 1200 southern ... Green 1 June 27th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1200.0 1200.0
21 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m chikmagalur ... Bluish-Green 1 May 19th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
22 Robusta andrew hetzel India sethuraman estates NaN sethuraman estates NaN cafemakers, llc 750m chikmagalur ... Green 0 June 20th, 2014 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
23 Robusta andrew hetzel United States sethuraman estates NaN sethuraman estates NaN cafemakers, llc 3000' chikmagalur ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
24 Robusta luis robles Ecuador robustasa Lavado 1 our own lab NaN robustasa NaN san juan, playas ... Blue-Green 1 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
25 Robusta luis robles Ecuador robustasa Lavado 3 own laboratory NaN robustasa 40 san juan, playas ... Blue-Green 0 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 40.0 40.0 40.0
26 Robusta james moore United States fazenda cazengo NaN cafe cazengo NaN global opportunity fund 795 meters kwanza norte province, angola ... NaN 6 December 23rd, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 795.0 795.0 795.0
27 Robusta cafe politico India NaN NaN NaN 14-1118-2014-0087 cafe politico NaN NaN ... Green 1 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
28 Robusta cafe politico Vietnam NaN NaN NaN NaN cafe politico NaN NaN ... NaN 9 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN

28 rows × 43 columns

coffee_df.index
Index([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19, 20, 21, 22, 23, 24, 25, 26, 27, 28],
      dtype='int64')

Now we see that pandas uses the actual first column of the file as the index, which is displayed in bold.

We can look at the first 5 rows with head.

coffee_df.head()
Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude Region ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
1 Robusta ankole coffee producers coop Uganda kyangundu cooperative society NaN ankole coffee producers 0 ankole coffee producers coop 1488 sheema south western ... Green 2 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1488.0 1488.0 1488.0
2 Robusta nishant gurjer India sethuraman estate kaapi royale 25 sethuraman estate 14/1148/2017/21 kaapi royale 3170 chikmagalur karnataka indua ... NaN 2 October 31st, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3170.0 3170.0 3170.0
3 Robusta andrew hetzel India sethuraman estate NaN NaN 0000 sethuraman estate 1000m chikmagalur ... Green 0 April 29th, 2016 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 1000.0 1000.0 1000.0
4 Robusta ugacof Uganda ugacof project area NaN ugacof 0 ugacof ltd 1212 central ... Green 7 July 14th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1212.0 1212.0 1212.0
5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 luwero central region ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0

5 rows × 43 columns

Try it yourself

How can you look at the first 3 or last 2 rows?

We can look at the last 5 rows with tail.

coffee_df.tail()
Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude Region ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
24 Robusta luis robles Ecuador robustasa Lavado 1 our own lab NaN robustasa NaN san juan, playas ... Blue-Green 1 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
25 Robusta luis robles Ecuador robustasa Lavado 3 own laboratory NaN robustasa 40 san juan, playas ... Blue-Green 0 January 18th, 2017 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 40.0 40.0 40.0
26 Robusta james moore United States fazenda cazengo NaN cafe cazengo NaN global opportunity fund 795 meters kwanza norte province, angola ... NaN 6 December 23rd, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 795.0 795.0 795.0
27 Robusta cafe politico India NaN NaN NaN 14-1118-2014-0087 cafe politico NaN NaN ... Green 1 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN
28 Robusta cafe politico Vietnam NaN NaN NaN NaN cafe politico NaN NaN ... NaN 9 August 25th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m NaN NaN NaN

5 rows × 43 columns
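As one possible answer to the "Try it yourself" question above: both head and tail accept the number of rows to return. The sketch below uses a small made-up DataFrame (toy_df is a hypothetical stand-in, not the coffee data) so that it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for coffee_df: 6 rows with an index that starts at 1
toy_df = pd.DataFrame(
    {"Species": ["Robusta"] * 6, "Number.of.Bags": [300, 320, 300, 320, 1, 200]},
    index=range(1, 7),
)

first_three = toy_df.head(3)  # rows labeled 1, 2, 3
last_two = toy_df.tail(2)     # rows labeled 5, 6

print(first_three)
print(last_two)
```

With the real data, the same calls would be coffee_df.head(3) and coffee_df.tail(2).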

Important

We did not do this step in class

The shape of a DataFrame is an attribute, so we access it without parentheses.

coffee_df.shape
(28, 43)
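Because shape is an attribute that holds a tuple of (rows, columns), it can be unpacked directly, and adding parentheses as if it were a method raises an error. A minimal sketch, using a made-up DataFrame named toy_df:

```python
import pandas as pd

# Hypothetical 3 x 2 DataFrame
toy_df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = toy_df.shape  # attribute: no parentheses
print(n_rows, n_cols)

try:
    toy_df.shape()  # treating the tuple like a method fails
except TypeError as err:
    print("TypeError:", err)
```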

We can pick out columns by name.

coffee_df['Color']
1            Green
2              NaN
3            Green
4            Green
5            Green
6            Green
7            Green
8     Bluish-Green
9            Green
10           Green
11           Green
12           Green
13           Green
14           Green
15           Green
16           Green
17      Blue-Green
18           Green
19           Green
20           Green
21    Bluish-Green
22           Green
23           Green
24      Blue-Green
25      Blue-Green
26             NaN
27           Green
28             NaN
Name: Color, dtype: object

A single column is a new type, called a Series.

type(coffee_df['Color'])
pandas.core.series.Series
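To make the distinction concrete, here is a small sketch with a made-up DataFrame (toy_df and its values are assumptions for illustration). Selecting with one column name gives a Series that keeps the DataFrame's index, while selecting with a list of names gives a DataFrame, even if the list has only one name.

```python
import pandas as pd

# Hypothetical data with a 1-based index, loosely modeled on the coffee columns
toy_df = pd.DataFrame(
    {"Color": ["Green", None, "Blue-Green"], "Moisture": [0.12, 0.0, 0.11]},
    index=[1, 2, 3],
)

color = toy_df["Color"]          # single name: a Series
color_frame = toy_df[["Color"]]  # list of names: a DataFrame with one column

print(type(color).__name__)
print(type(color_frame).__name__)
print(color.name)         # the Series remembers its column name
print(list(color.index))  # and shares the row labels of toy_df
```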

We can pick out rows using the loc accessor. It is a tricky concept because it is indexing, so it uses square brackets [], but we reach it with a . like a method. This syntax is somewhat atypical, but we do not use it very often. We pick out single columns a lot, so that has the nice, easy syntax above; picking out rows is rarer, so it got the less elegant syntax.

coffee_df.loc[1]
Species                                                   Robusta
Owner                                ankole coffee producers coop
Country.of.Origin                                          Uganda
Farm.Name                           kyangundu cooperative society
Lot.Number                                                    NaN
Mill                                      ankole coffee producers
ICO.Number                                                      0
Company                              ankole coffee producers coop
Altitude                                                     1488
Region                                       sheema south western
Producer                             Ankole coffee producers coop
Number.of.Bags                                                300
Bag.Weight                                                  60 kg
In.Country.Partner            Uganda Coffee Development Authority
Harvest.Year                                                 2013
Grading.Date                                      June 26th, 2014
Owner.1                              Ankole coffee producers coop
Variety                                                       NaN
Processing.Method                                             NaN
Fragrance...Aroma                                            7.83
Flavor                                                       8.08
Aftertaste                                                   7.75
Salt...Acid                                                  7.92
Bitter...Sweet                                                8.0
Mouthfeel                                                    8.25
Uniform.Cup                                                  10.0
Clean.Cup                                                    10.0
Balance                                                      7.92
Cupper.Points                                                 8.0
Total.Cup.Points                                            83.75
Moisture                                                     0.12
Category.One.Defects                                            0
Quakers                                                         0
Color                                                       Green
Category.Two.Defects                                            2
Expiration                                        June 26th, 2015
Certification.Body            Uganda Coffee Development Authority
Certification.Address    e36d0270932c3b657e96b7b0278dfd85dc0fe743
Certification.Contact    03077a1c6bac60e6f514691634a7f6eb5c85aae8
unit_of_measurement                                             m
altitude_low_meters                                        1488.0
altitude_high_meters                                       1488.0
altitude_mean_meters                                       1488.0
Name: 1, dtype: object

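We can also slice DataFrames, much like strings. One difference: a loc slice works on index labels and includes the stop label, so loc[5:8] returns the four rows labeled 5 through 8.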

subset_df = coffee_df.loc[5:8]
subset_df
Species Owner Country.of.Origin Farm.Name Lot.Number Mill ICO.Number Company Altitude Region ... Color Category.Two.Defects Expiration Certification.Body Certification.Address Certification.Contact unit_of_measurement altitude_low_meters altitude_high_meters altitude_mean_meters
5 Robusta katuka development trust ltd Uganda katikamu capca farmers association NaN katuka development trust 0 katuka development trust ltd 1200-1300 luwero central region ... Green 3 June 26th, 2015 Uganda Coffee Development Authority e36d0270932c3b657e96b7b0278dfd85dc0fe743 03077a1c6bac60e6f514691634a7f6eb5c85aae8 m 1200.0 1300.0 1250.0
6 Robusta andrew hetzel India NaN NaN (self) NaN cafemakers, llc 3000' chikmagalur ... Green 0 February 28th, 2013 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3000.0 3000.0 3000.0
7 Robusta andrew hetzel India sethuraman estates NaN NaN NaN cafemakers 750m chikmagalur ... Green 0 May 15th, 2015 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 750.0 750.0 750.0
8 Robusta nishant gurjer India sethuraman estate kaapi royale 7 sethuraman estate 14/1148/2017/18 kaapi royale 3140 chikmagalur karnataka india ... Bluish-Green 0 October 25th, 2018 Specialty Coffee Association ff7c18ad303d4b603ac3f8cff7e611ffc735e720 352d0cf7f3e9be14dad7df644ad65efc27605ae2 m 3140.0 3140.0 3140.0

4 rows × 43 columns
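The output above has 4 rows because a loc slice is based on labels and includes the stop label, while string slicing and iloc slicing exclude the stop position. A small sketch comparing the three, using a made-up DataFrame (toy_df is hypothetical):

```python
import pandas as pd

letters = "abcdefghij"
# Hypothetical DataFrame: one letter per row, labeled 1 through 10
toy_df = pd.DataFrame({"value": list(letters)}, index=range(1, 11))

label_slice = toy_df.loc[5:8]      # labels 5, 6, 7, 8 (stop label included)
position_slice = toy_df.iloc[5:8]  # positions 5, 6, 7 (stop position excluded)
string_slice = letters[5:8]        # characters at positions 5, 6, 7

print(list(label_slice.index))
print(list(position_slice.index))
print(string_slice)
```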

Now loc[1] will raise a KeyError because there is no 1 in the index of subset_df.

subset_df.loc[1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[34], line 1
----> 1 subset_df.loc[1]

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1103, in _LocationIndexer.__getitem__(self, key)
   1100 axis = self.axis or 0
   1102 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1103 return self._getitem_axis(maybe_callable, axis=axis)

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1343, in _LocIndexer._getitem_axis(self, key, axis)
   1341 # fall thru to straight lookup
   1342 self._validate_key(key, axis)
-> 1343 return self._get_label(key, axis=axis)

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1293, in _LocIndexer._get_label(self, label, axis)
   1291 def _get_label(self, label, axis: AxisInt):
   1292     # GH#5567 this will fail if the label is not present in the axis.
-> 1293     return self.obj.xs(label, axis=axis)

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/generic.py:4095, in NDFrame.xs(self, key, axis, level, drop_level)
   4093             new_index = index[loc]
   4094 else:
-> 4095     loc = index.get_loc(key)
   4097     if isinstance(loc, np.ndarray):
   4098         if loc.dtype == np.bool_:

File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)

KeyError: 1

The only values that will work in loc are the ones in the index:

subset_df.index
Index([5, 6, 7, 8], dtype='int64')
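One way to avoid the long traceback is to check whether a label is in the index before using loc, or to catch the KeyError. This is only a sketch on made-up data (toy_df and toy_subset are hypothetical names that mirror the subset above):

```python
import pandas as pd

# Hypothetical DataFrame labeled 1 through 8, then sliced like subset_df
toy_df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70, 80]}, index=range(1, 9))
toy_subset = toy_df.loc[5:8]

has_one = 1 in toy_subset.index   # label 1 was left out of the slice
has_five = 5 in toy_subset.index  # label 5 is still there
print(has_one, has_five)

raised = False
try:
    toy_subset.loc[1]
except KeyError:
    raised = True
    print("no row labeled 1 in toy_subset")
```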

However, with iloc, rows are indexed by integer position, starting at 0.

subset_df.iloc[1]
Species                                                   Robusta
Owner                                               andrew hetzel
Country.of.Origin                                           India
Farm.Name                                                     NaN
Lot.Number                                                    NaN
Mill                                                       (self)
ICO.Number                                                    NaN
Company                                           cafemakers, llc
Altitude                                                    3000'
Region                                                chikmagalur
Producer                                       Sethuraman Estates
Number.of.Bags                                                200
Bag.Weight                                                   1 kg
In.Country.Partner                   Specialty Coffee Association
Harvest.Year                                                 2012
Grading.Date                                  February 29th, 2012
Owner.1                                             Andrew Hetzel
Variety                                                       NaN
Processing.Method                                             NaN
Fragrance...Aroma                                             8.0
Flavor                                                       7.92
Aftertaste                                                   7.67
Salt...Acid                                                   8.0
Bitter...Sweet                                               7.75
Mouthfeel                                                    7.75
Uniform.Cup                                                  10.0
Clean.Cup                                                    10.0
Balance                                                      7.92
Cupper.Points                                                7.75
Total.Cup.Points                                            82.75
Moisture                                                      0.0
Category.One.Defects                                            0
Quakers                                                         0
Color                                                       Green
Category.Two.Defects                                            0
Expiration                                    February 28th, 2013
Certification.Body                   Specialty Coffee Association
Certification.Address    ff7c18ad303d4b603ac3f8cff7e611ffc735e720
Certification.Contact    352d0cf7f3e9be14dad7df644ad65efc27605ae2
unit_of_measurement                                             m
altitude_low_meters                                        3000.0
altitude_high_meters                                       3000.0
altitude_mean_meters                                       3000.0
Name: 6, dtype: object
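Note the "Name: 6" at the end of the output: position 1 in the subset is the row labeled 6. A minimal sketch of the same relationship on made-up data (toy_df and toy_subset are hypothetical):

```python
import pandas as pd

# Hypothetical DataFrame labeled 1 through 8, then sliced to labels 5 through 8
toy_df = pd.DataFrame({"value": list("abcdefgh")}, index=range(1, 9))
toy_subset = toy_df.loc[5:8]

by_position = toy_subset.iloc[1]  # second row, counting from 0
by_label = toy_subset.loc[6]      # the row whose label is 6

print(by_position.name)  # the label of the row that iloc picked
print(by_position.equals(by_label))
```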

2.9. Questions After Class#

2.9.1. I think this something I need to figure but how do the localhost or just utilizing a url in VS Code? I was late to the class, I never got how to do the jupyter lab thing.#

For this class, you need to use Jupyter notebooks without extraneous metadata. If you use Jupyter inside of VS Code, it adds extraneous metadata that makes it hard to grade, and VS Code, in my experience, does not provide the most helpful autocomplete for data science.

Please come to office hours to get help with it.

2.9.2. Is the Python we use in Jupyter lab notebooks any different from traditional Python?#

The Python is mostly all the same. There are different Python interpreters that have slightly different behaviors, but mostly only in the display. As a technicality, Jupyter uses IPython as its Python kernel.

2.9.3. Does index just list all the rows?#

The index holds the names of the rows, the same way that the column headers are the names of the columns.

2.9.4. How you copied the file url from github.#

Click the Raw button and then copy the URL from your browser’s URL bar.

2.9.5. How did we change the index?#

We changed the index from the inferred (figured out by pandas) RangeIndex to a column of the data by adding the index_col=0 parameter to our read_csv call.
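The effect of index_col=0 can be seen on a tiny example. The CSV text below is made up for illustration (it is not the coffee file) and is read from a string with io.StringIO so the sketch runs without downloading anything.

```python
import io
import pandas as pd

# Hypothetical three-row CSV whose first column holds row numbers starting at 1
csv_text = """,Species,Color
1,Robusta,Green
2,Robusta,
3,Robusta,Green
"""

default_df = pd.read_csv(io.StringIO(csv_text))               # pandas infers a RangeIndex
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the index

print(default_df.index)
print(list(default_df.columns))  # the unnamed first column is kept as data
print(indexed_df.index)
```

Without index_col=0, the first column stays in the table as an extra column and the rows are labeled 0, 1, 2; with it, the rows are labeled 1, 2, 3.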

2.9.6. I would like to learn more about the panda commands#

We will continue learning more pandas features for the next few weeks.

2.9.7. are we gonna have to use what we learned today in a bigger program in the future#

Yes, these features we used today are the basis of all of the data analysis we will do all semester. However, we will not be writing “programs” the way you may have for other classes, we will be doing data analyses, which are more narrative.

2.9.8. How can I use Jupyter to clean data?#

Jupyter is a way to work with Python code. We will learn what clean data looks like, and more ways to manipulate DataFrames to make data clean, in two weeks.

2.9.9. Why is taking data from columns much more common than taking it from rows?#

We set our data up so that each column is a variable. We often want to treat different variables differently, but do the same thing to all of the rows.

2.9.10. I was wondering more about the Index variable type and was also curious as to what that could be used for.#

The Index type from pandas is a component of a DataFrame. We will use Index objects implicitly whenever we work with a part of a DataFrame and explicitly when we clean data.

2.9.11. I know by convention we use the typical alias for importing libraries, but is it okay to use our own alias for our own private programs?#

Using nonstandard aliases is a bad habit to develop and I cannot endorse it. Technically the code will run, but in class it will be a style violation.

2.9.12. does the panda’s data start indexing at 1 because 0 is where the table headers are located?#

Indexing using loc started at 1 because the first label in the dataset's index was 1; in the second example it started at 5.

Indexing using iloc always starts at 0.

2.9.13. In the dataset we loaded, I noticed that there were some zeros, which are nulls, I’m guessing we have to clean those out, and I was wondering how?#

Zeros are a value; nulls are encoded in different ways. We will learn how to deal with missing values in two weeks.
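As a quick illustration of the difference, pandas treats 0 as an ordinary value and NaN as missing. The numbers below are made up, and isna is only one of the tools for missing values:

```python
import pandas as pd

# Hypothetical column with one zero and one missing value
altitude = pd.Series([0.0, 1488.0, float("nan")], name="Altitude")

missing_mask = altitude.isna()  # True only where the value is missing
zero_mask = altitude == 0       # True only where the value is exactly 0

print(missing_mask.tolist())
print(zero_mask.tolist())
```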

2.9.14. How often do data sets need to be cleaned/manipulated before proper analysis can be done?#

Real data will almost always need to be fixed a little bit.

2.9.15. how does being able to view specific rows/columns help us make conclusions about data?#

For example, if one column is the thing you are interested in, you may want to compute statistics on that one column.
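A short sketch of what that can look like, on made-up values (toy_df and its numbers are hypothetical, with column names borrowed from the coffee data): a numeric column can be summarized with mean or describe, and a text column with value_counts.

```python
import pandas as pd

# Hypothetical scores and colors for four coffees
toy_df = pd.DataFrame(
    {
        "Flavor": [8.0, 7.5, 7.0, 7.5],
        "Color": ["Green", "Green", "Blue-Green", "Green"],
    }
)

flavor_mean = toy_df["Flavor"].mean()           # average of one numeric column
color_counts = toy_df["Color"].value_counts()   # how often each color appears

print(flavor_mean)
print(toy_df["Flavor"].describe())  # count, mean, std, min, quartiles, max
print(color_counts)
```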

2.9.16. Are assignments always a certain level or can one assignment be done to be a level 1 or a level 2 assignment#

Going forward, assignments are always targeted at level 2. In-class Prismia questions will assess at level 1. An incomplete attempt at an assignment might be evaluated only at level 1, so that can be a way to make up for a missed class; you can then earn level 2 in the next assignment that assesses that skill.

2.9.17. will I be able to use data from an existing lab that I work in for certain assignment?#

Yes, as long as the data is allowed to be shared. Please confirm with your PI.

2.9.18. When we submit to GitHub, do we need to do anything other than upload the file?#

For your portfolio, no. For other assignments, there will be a step to do, and there will be instructions in the assignment.

2.9.19. Are we going to have to create our own datasets for any future assignments rather than downloading datasets from the Internet?#

In Assignment 2 you will build a dataset about datasets, but other than that you will mostly use datasets that you find online or that you have for another purpose.

2.9.20. How to better navigate github and not second guess my posts there#

Warning

This will be added later.