2. Iterables and Pandas Data Frames#
2.1. House Keeping#
2.1.1. Grading is not done,#
you will get a notification when yours is
2.1.2. Closing Jupyter server.#
In the terminal use Ctrl+C (actually control, not command on mac).
It will ask you a question and give options; read and follow them, or press Ctrl+C a second time.
A Jupyter server typically runs at localhost:8888, but if you have multiple servers running, the port number increases.
Once I saw a student in office hours working on localhost:8894
asking why their code kept crashing.
Important
Remember to close your jupyter server
2.2. Grading solution#
def compute_grade(num_level1, num_level2, num_level3):
    '''
    Computes a grade for CSC/DSP310 from numbers of achievements at each level

    Parameters:
    -----------
    num_level1 : int
        number of level 1 achievements earned
    num_level2 : int
        number of level 2 achievements earned
    num_level3 : int
        number of level 3 achievements earned

    Returns:
    --------
    letter_grade : string
        letter grade with modifier (+/-)
    '''
    if num_level1 == 15:
        if num_level2 == 15:
            if num_level3 == 15:
                grade = 'A'
            elif num_level3 >= 10:
                grade = 'A-'
            elif num_level3 >= 5:
                grade = 'B+'
            else:
                grade = 'B'
        elif num_level2 >= 10:
            grade = 'B-'
        elif num_level2 >= 5:
            grade = 'C+'
        else:
            grade = 'C'
    elif num_level1 >= 10:
        grade = 'C-'
    elif num_level1 >= 5:
        grade = 'D+'
    elif num_level1 >= 3:
        grade = 'D'
    else:
        grade = 'F'
    return grade
Running the cell above adds the function to memory.
Now that it has run, Jupyter can show us compute_grade as an option when we tab complete after typing the first few letters.
When we restarted the kernel, we saw that before running the cell above, the tab complete did not work.
Important
It is important to understand what works when, and why, so that you know what to expect and can get unstuck.
compute_grade(15,15,14)
'A-'
assert compute_grade(15,15,15) =='A'
assert succeeds quietly
assert compute_grade(15,15,15) =='B'
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[4], line 1
----> 1 assert compute_grade(15,15,15) =='B'
AssertionError:
but fails with a specific error
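To see both behaviors side by side, here is a minimal sketch. The helper letter_for and the message text are made up for illustration; they are not part of the grading solution.

```python
def letter_for(score):
    # hypothetical helper, only for illustration
    return 'pass' if score >= 60 else 'fail'

# a true condition: nothing is printed and execution continues
assert letter_for(75) == 'pass'

# a false condition raises AssertionError; the optional text after the
# comma becomes the error message, and we catch it here to inspect it
try:
    assert letter_for(75) == 'fail', 'expected a passing score'
except AssertionError as err:
    message = str(err)

print(message)
```

Adding a message after the comma is optional, but it can make a failed check easier to read later.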
The docstring is important, because it is the help.
help(compute_grade)
Help on function compute_grade in module __main__:
compute_grade(num_level1, num_level2, num_level3)
Computes a grade for CSC/DSP310 from numbers of achievements at each level
Parameters:
-----------
num_level1 : int
number of level 1 achievements earned
num_level2 : int
number of level 2 achievements earned
num_level3 : int
number of level 3 achievements earned
Returns:
--------
letter_grade : string
letter grade with modifier (+/-)
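The text that help shows comes from the docstring, which Python stores on the function object itself. A small sketch, using a made-up function named double rather than compute_grade:

```python
import inspect


def double(x):
    '''Return twice the input value.'''
    return 2 * x


# the raw docstring lives in the __doc__ attribute
raw_doc = double.__doc__

# inspect.getdoc returns the same text with indentation cleaned up
clean_doc = inspect.getdoc(double)

print(clean_doc)
```

A function without a docstring has None in __doc__, so help has much less to show for it.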
2.3. Everything is Data#
Data we will see:
tabular data
websites as data
activity logs on websites
images
text
2.4. Why inspection in code?#
Some IDEs give you GUI-based tools to inspect objects. We are going to do it programmatically, inline with our analyses, for two reasons.
(minor, logistical) it helps make for good notes
(most importantly) it helps build habits of data science
In data science, our code will be aiming to tell a story.
If you’re curious about something, try it out and see what happens. We’re going to use a lot of code inspection tools during class. These are helpful for understanding what’s going on. Even though a different IDE would give you inspection tools, the advantage of knowing how to get this information programmatically is that it helps you treat your code as data.
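As a small sketch of what inspecting programmatically can look like, the built-in functions type and dir return ordinary Python objects that we can store and work with like any other data. The variable names here are chosen only for this example.

```python
value = 'monday'

# type() returns the class of the object
kind = type(value)

# dir() returns a list of attribute names; keep the ones without a
# leading underscore, which are the ones meant for everyday use
public_names = [name for name in dir(value) if not name.startswith('_')]

print(kind)
print(len(public_names))
```

Because public_names is a plain list, we can filter it, count it, or search it, which is one way of treating code as data.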
2.5. Everything is an Object#
Let’s examine the type of some variables:
a = 4
b = 'monday'
c = 5.3
d = print
type(a)
int
ints are a base Python type, like the integers that appear in other languages
strings are an iterable type (more specifically, a sequence), meaning that they can be indexed into and their elements can be iterated over. For a more technical definition, see the official Python glossary entry
type(b)
str
we can select one element
b[0]
'm'
or multiple, this is called slicing.
b[0:3]
'mon'
negative numbers count from the right.
b[-1]
'y'
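Indexing and slicing are one half of what makes a string iterable-friendly; the other half is looping over its elements. A short sketch using the same word (the variable names are only for illustration):

```python
day = 'monday'

# iterating visits one character at a time
letters = [ch for ch in day]

# slices can also use negative numbers and a step
last_three = day[-3:]    # from the third-to-last character to the end
every_other = day[::2]   # every second character, starting at index 0

print(letters)
print(last_three)
print(every_other)
```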
decimals default to float
type(c)
float
a variable can hold a whole function.
type(d)
builtin_function_or_method
functions are also objects like any other type in python
we can use the variable just like the function itself
d('hello')
hello
print('hello')
hello
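Since functions are objects, they can also be stored in a list and called in a loop. This is a minimal sketch; the names shout and tools are made up for the example.

```python
# str.upper is a function object too, so it can get a new name
shout = str.upper

# a list that holds functions, just like a list of numbers or strings
tools = [len, shout, sorted]

# call each function on the same input
results = [tool('monday') for tool in tools]

print(results)
```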
2.6. Tabular Data#
Structured data is easier to work with than other data.
We’re going to focus on tabular data for now. At the end of the course, we’ll examine images, which are structured but more complex, and text, which is much less structured.
2.7. Getting familiar with the dataset#
We’re going to use a dataset about coffee quality today.
How was this dataset collected?
reviews added to DB
then scraped
Where did it come from?
Coffee Quality Institute’s trained reviewers.
What format is it provided in?
CSV (Comma Separated Values)
What other information is in this repository?
the code to scrape and clean the data
the data before cleaning
It’s important to always know where data came from and how it was collected.
This helps you know what it is useful for and what its limitations are.
Further Reading
An important research article on documenting datasets for machine learning is called Datasheets for Datasets. These researchers also did a follow-up study to better understand how practitioners use datasheets and decide how to use data.
If topics like this are interesting to you, let me know! My research is related to this, and a lot of students who complete 310 do research in my lab.
2.8. Loading data#
Get the raw URL for the dataset: click the Raw button on the CSV page, then copy the URL.
coffee_data_url = 'https://raw.githubusercontent.com/jldbc/coffee-quality-database/master/data/robusta_data_cleaned.csv'
We will work with data using a library called pandas. By convention, we import it like:
import pandas as pd
the import keyword is used for loading packages
pandas is the name of the package that is installed
the as keyword allows us to assign an alias (nickname)
pd is the typical alias for pandas
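One way to convince yourself that the alias is only a nickname is to import the package both ways and compare. This is a small sketch that assumes pandas is installed; the variable names are made up for the example.

```python
import pandas as pd   # alias form, the usual convention
import pandas         # plain form, binds the full package name

# both names point at the very same module object
same_module = pd is pandas

# the module still knows its real name
module_name = pd.__name__

print(same_module)
print(module_name)
```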
we will load the data with pd.read_csv()
pd.read_csv(coffee_data_url)
Unnamed: 0 | Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
1 | 2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
2 | 3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
3 | 4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
4 | 5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
5 | 6 | Robusta | andrew hetzel | India | NaN | NaN | (self) | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
6 | 7 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 0 | May 15th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
7 | 8 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 7 | sethuraman estate | 14/1148/2017/18 | kaapi royale | 3140 | ... | Bluish-Green | 0 | October 25th, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3140.0 | 3140.0 | 3140.0 |
8 | 9 | Robusta | nishant gurjer | India | sethuraman estate | RKR | sethuraman estate | 14/1148/2016/17 | kaapi royale | 1000 | ... | Green | 0 | August 17th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
9 | 10 | Robusta | ugacof | Uganda | ishaka | NaN | nsubuga umar | 0 | ugacof ltd | 900-1300 | ... | Green | 6 | August 5th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 900.0 | 1300.0 | 1100.0 |
10 | 11 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1095 | ... | Green | 1 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1095.0 | 1095.0 | 1095.0 |
11 | 12 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | RC AB | sethuraman estate | 14/1148/2016/12 | kaapi royale | 1000 | ... | Green | 0 | August 23rd, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
12 | 13 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
13 | 14 | Robusta | kasozi coffee farmers association | Uganda | kasozi coffee farmers | NaN | NaN | 0 | kasozi coffee farmers association | 1367 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1367.0 | 1367.0 | 1367.0 |
14 | 15 | Robusta | ankole coffee producers coop | Uganda | kyangundu coop society | NaN | ankole coffee producers coop union ltd | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
15 | 16 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
16 | 17 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Blue-Green | 0 | June 3rd, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
17 | 18 | Robusta | kawacom uganda ltd | Uganda | bushenyi | NaN | kawacom | 0 | kawacom uganda ltd | 1600 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1600.0 | 1600.0 | 1600.0 |
18 | 19 | Robusta | nitubaasa ltd | Uganda | kigezi coffee farmers association | NaN | nitubaasa | 0 | nitubaasa ltd | 1745 | ... | Green | 2 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1745.0 | 1745.0 | 1745.0 |
19 | 20 | Robusta | mannya coffee project | Uganda | mannya coffee project | NaN | mannya coffee project | 0 | mannya coffee project | 1200 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1200.0 | 1200.0 |
20 | 21 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Bluish-Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
21 | 22 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Green | 0 | June 20th, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
22 | 23 | Robusta | andrew hetzel | United States | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
23 | 24 | Robusta | luis robles | Ecuador | robustasa | Lavado 1 | our own lab | NaN | robustasa | NaN | ... | Blue-Green | 1 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
24 | 25 | Robusta | luis robles | Ecuador | robustasa | Lavado 3 | own laboratory | NaN | robustasa | 40 | ... | Blue-Green | 0 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 40.0 | 40.0 | 40.0 |
25 | 26 | Robusta | james moore | United States | fazenda cazengo | NaN | cafe cazengo | NaN | global opportunity fund | 795 meters | ... | NaN | 6 | December 23rd, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 795.0 | 795.0 | 795.0 |
26 | 27 | Robusta | cafe politico | India | NaN | NaN | NaN | 14-1118-2014-0087 | cafe politico | NaN | ... | Green | 1 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
27 | 28 | Robusta | cafe politico | Vietnam | NaN | NaN | NaN | NaN | cafe politico | NaN | ... | NaN | 9 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 rows × 44 columns
This read in the data and displayed it because it is the last line in the cell. If we put another line after it, the data will still be read in, but it will not be displayed.
In order to use it, we save the output to a variable.
coffee_df = pd.read_csv(coffee_data_url)
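The difference between reading and keeping can be sketched without the network by reading a tiny made-up CSV from a string. The data, the io.StringIO wrapper, and the name scores_df are all assumptions for this example, not part of the coffee dataset.

```python
import io
import pandas as pd

# made-up CSV text standing in for a file or URL
csv_text = "name,score\nada,90\nlin,85\n"

# read but not saved: the DataFrame is created, then thrown away
pd.read_csv(io.StringIO(csv_text))

# read and saved: the variable lets us keep using the result
scores_df = pd.read_csv(io.StringIO(csv_text))

# shape is (number of rows, number of columns)
n_rows, n_cols = scores_df.shape
print(n_rows, n_cols)
```

In a notebook, an assignment like the second call displays nothing on its own, which is why the variable name is put alone on the last line of a cell to see it.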
we can look at it again using the jupyter display
coffee_df
Unnamed: 0 | Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
1 | 2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
2 | 3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
3 | 4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
4 | 5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
5 | 6 | Robusta | andrew hetzel | India | NaN | NaN | (self) | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
6 | 7 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 0 | May 15th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
7 | 8 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 7 | sethuraman estate | 14/1148/2017/18 | kaapi royale | 3140 | ... | Bluish-Green | 0 | October 25th, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3140.0 | 3140.0 | 3140.0 |
8 | 9 | Robusta | nishant gurjer | India | sethuraman estate | RKR | sethuraman estate | 14/1148/2016/17 | kaapi royale | 1000 | ... | Green | 0 | August 17th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
9 | 10 | Robusta | ugacof | Uganda | ishaka | NaN | nsubuga umar | 0 | ugacof ltd | 900-1300 | ... | Green | 6 | August 5th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 900.0 | 1300.0 | 1100.0 |
10 | 11 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1095 | ... | Green | 1 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1095.0 | 1095.0 | 1095.0 |
11 | 12 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | RC AB | sethuraman estate | 14/1148/2016/12 | kaapi royale | 1000 | ... | Green | 0 | August 23rd, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
12 | 13 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
13 | 14 | Robusta | kasozi coffee farmers association | Uganda | kasozi coffee farmers | NaN | NaN | 0 | kasozi coffee farmers association | 1367 | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1367.0 | 1367.0 | 1367.0 |
14 | 15 | Robusta | ankole coffee producers coop | Uganda | kyangundu coop society | NaN | ankole coffee producers coop union ltd | 0 | ankole coffee producers coop | 1488 | ... | Green | 2 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
15 | 16 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
16 | 17 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Blue-Green | 0 | June 3rd, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
17 | 18 | Robusta | kawacom uganda ltd | Uganda | bushenyi | NaN | kawacom | 0 | kawacom uganda ltd | 1600 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1600.0 | 1600.0 | 1600.0 |
18 | 19 | Robusta | nitubaasa ltd | Uganda | kigezi coffee farmers association | NaN | nitubaasa | 0 | nitubaasa ltd | 1745 | ... | Green | 2 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1745.0 | 1745.0 | 1745.0 |
19 | 20 | Robusta | mannya coffee project | Uganda | mannya coffee project | NaN | mannya coffee project | 0 | mannya coffee project | 1200 | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1200.0 | 1200.0 |
20 | 21 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | ... | Bluish-Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
21 | 22 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | ... | Green | 0 | June 20th, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
22 | 23 | Robusta | andrew hetzel | United States | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 3000' | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
23 | 24 | Robusta | luis robles | Ecuador | robustasa | Lavado 1 | our own lab | NaN | robustasa | NaN | ... | Blue-Green | 1 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
24 | 25 | Robusta | luis robles | Ecuador | robustasa | Lavado 3 | own laboratory | NaN | robustasa | 40 | ... | Blue-Green | 0 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 40.0 | 40.0 | 40.0 |
25 | 26 | Robusta | james moore | United States | fazenda cazengo | NaN | cafe cazengo | NaN | global opportunity fund | 795 meters | ... | NaN | 6 | December 23rd, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 795.0 | 795.0 | 795.0 |
26 | 27 | Robusta | cafe politico | India | NaN | NaN | NaN | 14-1118-2014-0087 | cafe politico | NaN | ... | Green | 1 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
27 | 28 | Robusta | cafe politico | Vietnam | NaN | NaN | NaN | NaN | cafe politico | NaN | ... | NaN | 9 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 rows × 44 columns
Next we examine the type
type(coffee_df)
pandas.core.frame.DataFrame
This is a new type provided by the pandas library, called a DataFrame.
We can also examine its parts. It consists of several; first, the column headings:
coffee_df.columns
Index(['Unnamed: 0', 'Species', 'Owner', 'Country.of.Origin', 'Farm.Name',
'Lot.Number', 'Mill', 'ICO.Number', 'Company', 'Altitude', 'Region',
'Producer', 'Number.of.Bags', 'Bag.Weight', 'In.Country.Partner',
'Harvest.Year', 'Grading.Date', 'Owner.1', 'Variety',
'Processing.Method', 'Fragrance...Aroma', 'Flavor', 'Aftertaste',
'Salt...Acid', 'Bitter...Sweet', 'Mouthfeel', 'Uniform.Cup',
'Clean.Cup', 'Balance', 'Cupper.Points', 'Total.Cup.Points', 'Moisture',
'Category.One.Defects', 'Quakers', 'Color', 'Category.Two.Defects',
'Expiration', 'Certification.Body', 'Certification.Address',
'Certification.Contact', 'unit_of_measurement', 'altitude_low_meters',
'altitude_high_meters', 'altitude_mean_meters'],
dtype='object')
These column headings are stored in a special type called Index, which is also provided by pandas.
The output also tells us that the actual headings are of dtype object; object is used for strings or for columns with mixed types.
The dtype is how pandas classifies the data it stores; it is slightly different from base Python types, but is roughly the same idea as a type.
type(coffee_df.columns)
pandas.core.indexes.base.Index
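A small sketch of working with the column Index and with dtypes, again on made-up data (the column names farm, bags, and moisture are invented for the example). Text columns are left out of the checks on purpose: they show up as object in the output above, but newer pandas versions may report a dedicated string dtype instead.

```python
import io
import pandas as pd

# made-up CSV text with one text, one integer, and one decimal column
csv_text = "farm,bags,moisture\nalpha,10,0.12\nbeta,25,0.10\n"
small_df = pd.read_csv(io.StringIO(csv_text))

# an Index can be iterated over and turned into a plain list
column_names = list(small_df.columns)

# it also supports membership tests
has_bags = 'bags' in small_df.columns

# dtypes gives one dtype per column
print(small_df.dtypes)
```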
It also has an index (visually, the first column), which is special because this is how you can index into the rows of the data.
coffee_df.index
RangeIndex(start=0, stop=28, step=1)
Right now this is an autogenerated index, but we can also use the index_col parameter to set that up front.
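To see what index_col changes, here is a minimal sketch on a tiny made-up CSV whose first column has no header, like the 'Unnamed: 0' column in the coffee data. The data and the names default_df and indexed_df are assumptions for this example.

```python
import io
import pandas as pd

# made-up CSV text; the first header cell is empty on purpose
csv_text = ",farm,bags\n1,alpha,10\n2,beta,25\n3,gamma,5\n"

# without index_col, pandas generates an index and keeps the first
# column as regular data
default_df = pd.read_csv(io.StringIO(csv_text))

# with index_col=0, the first column becomes the index instead
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.columns.tolist())
print(list(default_df.index))
print(indexed_df.columns.tolist())
print(list(indexed_df.index))
```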
coffee_df = pd.read_csv(coffee_data_url,index_col=0)
coffee_df
Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | Region | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | sheema south western | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | chikmagalur karnataka indua | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | chikmagalur | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | central | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | luwero central region | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
6 | Robusta | andrew hetzel | India | NaN | NaN | (self) | NaN | cafemakers, llc | 3000' | chikmagalur | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
7 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | chikmagalur | ... | Green | 0 | May 15th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
8 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 7 | sethuraman estate | 14/1148/2017/18 | kaapi royale | 3140 | chikmagalur karnataka india | ... | Bluish-Green | 0 | October 25th, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3140.0 | 3140.0 | 3140.0 |
9 | Robusta | nishant gurjer | India | sethuraman estate | RKR | sethuraman estate | 14/1148/2016/17 | kaapi royale | 1000 | chikmagalur karnataka | ... | Green | 0 | August 17th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
10 | Robusta | ugacof | Uganda | ishaka | NaN | nsubuga umar | 0 | ugacof ltd | 900-1300 | western | ... | Green | 6 | August 5th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 900.0 | 1300.0 | 1100.0 |
11 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1095 | iganga namadrope eastern | ... | Green | 1 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1095.0 | 1095.0 | 1095.0 |
12 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | RC AB | sethuraman estate | 14/1148/2016/12 | kaapi royale | 1000 | chikmagalur karnataka | ... | Green | 0 | August 23rd, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
13 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | chikmagalur | ... | Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
14 | Robusta | kasozi coffee farmers association | Uganda | kasozi coffee farmers | NaN | NaN | 0 | kasozi coffee farmers association | 1367 | eastern | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1367.0 | 1367.0 | 1367.0 |
15 | Robusta | ankole coffee producers coop | Uganda | kyangundu coop society | NaN | ankole coffee producers coop union ltd | 0 | ankole coffee producers coop | 1488 | south western | ... | Green | 2 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
16 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | chikmagalur | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
17 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | chikmagalur | ... | Blue-Green | 0 | June 3rd, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
18 | Robusta | kawacom uganda ltd | Uganda | bushenyi | NaN | kawacom | 0 | kawacom uganda ltd | 1600 | western | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1600.0 | 1600.0 | 1600.0 |
19 | Robusta | nitubaasa ltd | Uganda | kigezi coffee farmers association | NaN | nitubaasa | 0 | nitubaasa ltd | 1745 | western | ... | Green | 2 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1745.0 | 1745.0 | 1745.0 |
20 | Robusta | mannya coffee project | Uganda | mannya coffee project | NaN | mannya coffee project | 0 | mannya coffee project | 1200 | southern | ... | Green | 1 | June 27th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1200.0 | 1200.0 |
21 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | chikmagalur | ... | Bluish-Green | 1 | May 19th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
22 | Robusta | andrew hetzel | India | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 750m | chikmagalur | ... | Green | 0 | June 20th, 2014 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
23 | Robusta | andrew hetzel | United States | sethuraman estates | NaN | sethuraman estates | NaN | cafemakers, llc | 3000' | chikmagalur | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
24 | Robusta | luis robles | Ecuador | robustasa | Lavado 1 | our own lab | NaN | robustasa | NaN | san juan, playas | ... | Blue-Green | 1 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
25 | Robusta | luis robles | Ecuador | robustasa | Lavado 3 | own laboratory | NaN | robustasa | 40 | san juan, playas | ... | Blue-Green | 0 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 40.0 | 40.0 | 40.0 |
26 | Robusta | james moore | United States | fazenda cazengo | NaN | cafe cazengo | NaN | global opportunity fund | 795 meters | kwanza norte province, angola | ... | NaN | 6 | December 23rd, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 795.0 | 795.0 | 795.0 |
27 | Robusta | cafe politico | India | NaN | NaN | NaN | 14-1118-2014-0087 | cafe politico | NaN | NaN | ... | Green | 1 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 | Robusta | cafe politico | Vietnam | NaN | NaN | NaN | NaN | cafe politico | NaN | NaN | ... | NaN | 9 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 rows × 43 columns
coffee_df.index
Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28],
dtype='int64')
Now we see that pandas uses the actual first column of the file as the index, which is displayed in bold.
We can look at the first 5 rows with head
coffee_df.head()
Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | Region | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Robusta | ankole coffee producers coop | Uganda | kyangundu cooperative society | NaN | ankole coffee producers | 0 | ankole coffee producers coop | 1488 | sheema south western | ... | Green | 2 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1488.0 | 1488.0 | 1488.0 |
2 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 25 | sethuraman estate | 14/1148/2017/21 | kaapi royale | 3170 | chikmagalur karnataka indua | ... | NaN | 2 | October 31st, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3170.0 | 3170.0 | 3170.0 |
3 | Robusta | andrew hetzel | India | sethuraman estate | NaN | NaN | 0000 | sethuraman estate | 1000m | chikmagalur | ... | Green | 0 | April 29th, 2016 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 1000.0 | 1000.0 | 1000.0 |
4 | Robusta | ugacof | Uganda | ugacof project area | NaN | ugacof | 0 | ugacof ltd | 1212 | central | ... | Green | 7 | July 14th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1212.0 | 1212.0 | 1212.0 |
5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | luwero central region | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
5 rows × 43 columns
Try it yourself
How can you look at the first 3 or last 2 rows?
We can look at the last 5 rows with tail
coffee_df.tail()
Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | Region | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
24 | Robusta | luis robles | Ecuador | robustasa | Lavado 1 | our own lab | NaN | robustasa | NaN | san juan, playas | ... | Blue-Green | 1 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
25 | Robusta | luis robles | Ecuador | robustasa | Lavado 3 | own laboratory | NaN | robustasa | 40 | san juan, playas | ... | Blue-Green | 0 | January 18th, 2017 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 40.0 | 40.0 | 40.0 |
26 | Robusta | james moore | United States | fazenda cazengo | NaN | cafe cazengo | NaN | global opportunity fund | 795 meters | kwanza norte province, angola | ... | NaN | 6 | December 23rd, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 795.0 | 795.0 | 795.0 |
27 | Robusta | cafe politico | India | NaN | NaN | NaN | 14-1118-2014-0087 | cafe politico | NaN | NaN | ... | Green | 1 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
28 | Robusta | cafe politico | Vietnam | NaN | NaN | NaN | NaN | cafe politico | NaN | NaN | ... | NaN | 9 | August 25th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | NaN | NaN | NaN |
5 rows × 43 columns
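Both head and tail accept an optional number of rows. The sketch below uses a small made-up DataFrame (toy_df and its values are hypothetical stand-ins, not the coffee data) to show the idea:

```python
import pandas as pd

# Hypothetical stand-in for coffee_df; the values are made up for illustration
toy_df = pd.DataFrame(
    {"Species": ["Robusta"] * 6, "Number.of.Bags": [300, 320, 200, 10, 1, 50]},
    index=[1, 2, 3, 4, 5, 6],
)

first_three = toy_df.head(3)  # first 3 rows instead of the default 5
last_two = toy_df.tail(2)     # last 2 rows instead of the default 5

print(first_three)
print(last_two)
```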
Important
We did not do this step in class
The shape of a DataFrame is an attribute, so we access it without parentheses.
coffee_df.shape
(28, 43)
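Because shape is a tuple of (rows, columns), it can be unpacked into two variables. A minimal sketch on a made-up DataFrame (toy_df is hypothetical, not the coffee data):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame(
    {"Color": ["Green", None, "Blue-Green"], "Quakers": [0, 0, 1]}
)

# shape is an attribute, not a method, so there are no parentheses
n_rows, n_cols = toy_df.shape
print(n_rows, n_cols)
```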
We can pick out columns by name.
coffee_df['Color']
1 Green
2 NaN
3 Green
4 Green
5 Green
6 Green
7 Green
8 Bluish-Green
9 Green
10 Green
11 Green
12 Green
13 Green
14 Green
15 Green
16 Green
17 Blue-Green
18 Green
19 Green
20 Green
21 Bluish-Green
22 Green
23 Green
24 Blue-Green
25 Blue-Green
26 NaN
27 Green
28 NaN
Name: Color, dtype: object
A single column is a new type, called a Series.
type(coffee_df['Color'])
pandas.core.series.Series
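As a side note, the brackets matter here: one column name gives a Series, while a list of names gives a DataFrame, even if the list has only one name. A small sketch with made-up data (toy_df is hypothetical):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame(
    {"Color": ["Green", "Blue-Green"], "Quakers": [0, 1]},
    index=[1, 2],
)

one_col = toy_df["Color"]    # a single name: returns a Series
sub_df = toy_df[["Color"]]   # a list of names: returns a DataFrame

print(type(one_col))
print(type(sub_df))
```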
We can pick out rows using the loc
accessor. The syntax can look odd: because it is indexing, it uses square brackets []
, but it is attached with a .
like a method. This syntax is somewhat atypical, but we do not use it very often. We pick out single columns a lot, so that operation has the short syntax shown above; picking out rows is rarer, so it got the less elegant syntax.
coffee_df.loc[1]
Species Robusta
Owner ankole coffee producers coop
Country.of.Origin Uganda
Farm.Name kyangundu cooperative society
Lot.Number NaN
Mill ankole coffee producers
ICO.Number 0
Company ankole coffee producers coop
Altitude 1488
Region sheema south western
Producer Ankole coffee producers coop
Number.of.Bags 300
Bag.Weight 60 kg
In.Country.Partner Uganda Coffee Development Authority
Harvest.Year 2013
Grading.Date June 26th, 2014
Owner.1 Ankole coffee producers coop
Variety NaN
Processing.Method NaN
Fragrance...Aroma 7.83
Flavor 8.08
Aftertaste 7.75
Salt...Acid 7.92
Bitter...Sweet 8.0
Mouthfeel 8.25
Uniform.Cup 10.0
Clean.Cup 10.0
Balance 7.92
Cupper.Points 8.0
Total.Cup.Points 83.75
Moisture 0.12
Category.One.Defects 0
Quakers 0
Color Green
Category.Two.Defects 2
Expiration June 26th, 2015
Certification.Body Uganda Coffee Development Authority
Certification.Address e36d0270932c3b657e96b7b0278dfd85dc0fe743
Certification.Contact 03077a1c6bac60e6f514691634a7f6eb5c85aae8
unit_of_measurement m
altitude_low_meters 1488.0
altitude_high_meters 1488.0
altitude_mean_meters 1488.0
Name: 1, dtype: object
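We can also slice DataFrames, much like strings. One difference: a loc slice is label-based and includes both endpoints, so 5:8 returns the rows labeled 5, 6, 7, and 8.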
subset_df = coffee_df.loc[5:8]
subset_df
Species | Owner | Country.of.Origin | Farm.Name | Lot.Number | Mill | ICO.Number | Company | Altitude | Region | ... | Color | Category.Two.Defects | Expiration | Certification.Body | Certification.Address | Certification.Contact | unit_of_measurement | altitude_low_meters | altitude_high_meters | altitude_mean_meters | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Robusta | katuka development trust ltd | Uganda | katikamu capca farmers association | NaN | katuka development trust | 0 | katuka development trust ltd | 1200-1300 | luwero central region | ... | Green | 3 | June 26th, 2015 | Uganda Coffee Development Authority | e36d0270932c3b657e96b7b0278dfd85dc0fe743 | 03077a1c6bac60e6f514691634a7f6eb5c85aae8 | m | 1200.0 | 1300.0 | 1250.0 |
6 | Robusta | andrew hetzel | India | NaN | NaN | (self) | NaN | cafemakers, llc | 3000' | chikmagalur | ... | Green | 0 | February 28th, 2013 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3000.0 | 3000.0 | 3000.0 |
7 | Robusta | andrew hetzel | India | sethuraman estates | NaN | NaN | NaN | cafemakers | 750m | chikmagalur | ... | Green | 0 | May 15th, 2015 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 750.0 | 750.0 | 750.0 |
8 | Robusta | nishant gurjer | India | sethuraman estate kaapi royale | 7 | sethuraman estate | 14/1148/2017/18 | kaapi royale | 3140 | chikmagalur karnataka india | ... | Bluish-Green | 0 | October 25th, 2018 | Specialty Coffee Association | ff7c18ad303d4b603ac3f8cff7e611ffc735e720 | 352d0cf7f3e9be14dad7df644ad65efc27605ae2 | m | 3140.0 | 3140.0 | 3140.0 |
4 rows × 43 columns
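Note that the slice above returned 4 rows: a loc slice works on labels and includes the end label, whereas string slicing works on positions and excludes the end. A sketch comparing the two, using a made-up DataFrame (toy_df is hypothetical):

```python
import pandas as pd

# Hypothetical toy data with index labels 1 through 8
toy_df = pd.DataFrame({"Owner": list("abcdefgh")}, index=range(1, 9))

# Label-based slice: the end label (8) is included
label_slice = toy_df.loc[5:8]

# Position-based string slice: the end position (8) is excluded
text = "abcdefgh"
string_slice = text[5:8]

print(list(label_slice.index))
print(string_slice)
```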
Now loc[1]
raises a KeyError on the subset because there is no 1
in its index.
subset_df.loc[1]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
3652 try:
-> 3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[34], line 1
----> 1 subset_df.loc[1]
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1103, in _LocationIndexer.__getitem__(self, key)
1100 axis = self.axis or 0
1102 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1103 return self._getitem_axis(maybe_callable, axis=axis)
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1343, in _LocIndexer._getitem_axis(self, key, axis)
1341 # fall thru to straight lookup
1342 self._validate_key(key, axis)
-> 1343 return self._get_label(key, axis=axis)
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexing.py:1293, in _LocIndexer._get_label(self, label, axis)
1291 def _get_label(self, label, axis: AxisInt):
1292 # GH#5567 this will fail if the label is not present in the axis.
-> 1293 return self.obj.xs(label, axis=axis)
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/generic.py:4095, in NDFrame.xs(self, key, axis, level, drop_level)
4093 new_index = index[loc]
4094 else:
-> 4095 loc = index.get_loc(key)
4097 if isinstance(loc, np.ndarray):
4098 if loc.dtype == np.bool_:
File /opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
-> 3655 raise KeyError(key) from err
3656 except TypeError:
3657 # If we have a listlike key, _check_indexing_error will raise
3658 # InvalidIndexError. Otherwise we fall through and re-raise
3659 # the TypeError.
3660 self._check_indexing_error(key)
KeyError: 1
The only values that will work in loc
are the ones in the index:
subset_df.index
Index([5, 6, 7, 8], dtype='int64')
With iloc
, however, rows are selected by integer position, starting at 0, regardless of the index labels.
subset_df.iloc[1]
Species Robusta
Owner andrew hetzel
Country.of.Origin India
Farm.Name NaN
Lot.Number NaN
Mill (self)
ICO.Number NaN
Company cafemakers, llc
Altitude 3000'
Region chikmagalur
Producer Sethuraman Estates
Number.of.Bags 200
Bag.Weight 1 kg
In.Country.Partner Specialty Coffee Association
Harvest.Year 2012
Grading.Date February 29th, 2012
Owner.1 Andrew Hetzel
Variety NaN
Processing.Method NaN
Fragrance...Aroma 8.0
Flavor 7.92
Aftertaste 7.67
Salt...Acid 8.0
Bitter...Sweet 7.75
Mouthfeel 7.75
Uniform.Cup 10.0
Clean.Cup 10.0
Balance 7.92
Cupper.Points 7.75
Total.Cup.Points 82.75
Moisture 0.0
Category.One.Defects 0
Quakers 0
Color Green
Category.Two.Defects 0
Expiration February 28th, 2013
Certification.Body Specialty Coffee Association
Certification.Address ff7c18ad303d4b603ac3f8cff7e611ffc735e720
Certification.Contact 352d0cf7f3e9be14dad7df644ad65efc27605ae2
unit_of_measurement m
altitude_low_meters 3000.0
altitude_high_meters 3000.0
altitude_mean_meters 3000.0
Name: 6, dtype: object
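To put the two accessors side by side, here is a minimal sketch on a made-up DataFrame whose index is 5 through 8, like subset_df (toy_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with the same index labels as subset_df
toy_df = pd.DataFrame({"Owner": ["a", "b", "c", "d"]}, index=[5, 6, 7, 8])

by_label = toy_df.loc[6]      # the row whose index label is 6
by_position = toy_df.iloc[1]  # the second row, counting from 0

# loc with a label that is not in the index raises a KeyError
try:
    toy_df.loc[1]
    label_found = True
except KeyError:
    label_found = False

print(by_label["Owner"], by_position["Owner"], label_found)
```

Here loc[6] and iloc[1] pick out the same row, because the label 6 happens to sit at position 1.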
2.9. Questions After Class#
2.9.1. I think this something I need to figure but how do the localhost or just utilizing a url in VS Code? I was late to the class, I never got how to do the jupyter lab thing.#
For this class, you need to use Jupyter notebooks without extraneous metadata. Using Jupyter inside VS Code adds extraneous metadata that makes notebooks hard to grade, and VS Code, in my experience, does not provide the most helpful autocomplete for data science.
Please come to office hours to get help with it.
2.9.2. Is the Python we use in Jupyter lab notebooks any different from traditional Python?#
The Python is mostly the same. There are different Python interpreters with slightly different behaviors, but the differences are mostly in the display. Technically, Jupyter uses IPython as its Python kernel.
2.9.3. Does index just list all the rows?#
The index is the name of the rows, the same way that the column headers are the names of the columns.
2.9.4. How you copied the file url from github.#
Click the Raw button and then copy the URL from your browser’s URL bar.
2.9.5. How did we change the index?#
We changed the index from the inferred (figured out by pandas) RangeIndex
to a column of the data by adding the index_col=0
parameter to our read_csv
call.
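The effect of index_col=0 can be sketched without downloading anything by reading a tiny CSV from a string. The CSV text below is made up for illustration and is not the coffee data:

```python
import io

import pandas as pd

# Hypothetical CSV text; the first column has no header, like a saved index
csv_text = ",Species,Owner\n1,Robusta,owner a\n2,Robusta,owner b\n"

# Without index_col, pandas infers a RangeIndex starting at 0
default_df = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column of the file becomes the index
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_df.index)
print(indexed_df.index)
```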
2.9.6. I would like to learn more about the panda commands#
We will continue learning more pandas features for the next few weeks.
2.9.7. are we gonna have to use what we learned today in a bigger program in the future#
Yes, the features we used today are the basis of all the data analysis we will do this semester. However, we will not be writing “programs” the way you may have for other classes; we will be doing data analyses, which are more narrative.
2.9.8. How can I use Jupyter to clean data?#
Jupyter is a way to work with Python code. We will learn what clean data looks like, and more ways to manipulate DataFrames to clean them, in two weeks.
2.9.9. Why is taking data from columns much more common than taking it from rows?#
We set our data up so that each column is a variable. We often want to treat different variables differently, but do the same thing to all of the rows.
2.9.10. I was wondering more about the Index variable type and was also curious as to what that could be used for.#
The Index
type from pandas
is a component of a DataFrame
. We will use it implicitly whenever we work with part of a DataFrame and explicitly when we clean data.
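As a small illustration, both the row labels and the column labels of a DataFrame are Index objects. The sketch below uses made-up data (toy_df is hypothetical):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame(
    {"Species": ["Robusta", "Robusta"], "Color": ["Green", "Blue-Green"]},
    index=[5, 6],
)

row_labels = toy_df.index    # Index of row labels
col_labels = toy_df.columns  # Index of column labels

print(row_labels)
print(col_labels)
```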
2.9.11. I know by convention we use the typical alias for importing libraries, but is it okay to use our own alias for our own private programs?#
Using nonstandard aliases is a bad habit to develop, and I cannot endorse it. Technically the code will run, but in class it will be a style violation.
2.9.12. does the panda’s data start indexing at 1 because 0 is where the table headers are located?#
Indexing using loc
started at 1 because the dataset had 1 there; in the second example it started at 5.
Indexing using iloc
always starts at 0.
2.9.13. In the dataset we loaded, I noticed that there were some zeros, which are nulls, I’m guessing we have to clean those out, and I was wondering how?#
Zeros are a value; nulls are encoded in different ways. We will learn how to deal with missing values in two weeks.
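As a preview, pandas can tell the two apart. This is a minimal sketch on a made-up Series (the name and values are hypothetical), not the cleaning approach we will cover later:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one zero and one missing value
defects = pd.Series([0, np.nan, 2], name="hypothetical_defects")

is_missing = defects.isna()           # True only where the value is null
n_missing = int(is_missing.sum())     # count of nulls
n_zero = int((defects == 0).sum())    # count of actual zeros

print(n_missing, n_zero)
```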
2.9.14. How often do data sets need to be cleaned/manipulated before proper analysis can be done?#
Real data will almost always need to be fixed a little bit.
2.9.15. how does being able to view specific rows/columns help us make conclusions about data?#
For example, if one column is the thing you are interested in, you may want to compute statistics on just that column.
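A sketch of what that could look like, assuming a made-up DataFrame with a numeric Flavor column (toy_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data, only for illustration
toy_df = pd.DataFrame(
    {"Flavor": [8.0, 7.5, 7.0], "Color": ["Green", "Green", "Blue-Green"]}
)

# Pick out the one column of interest, then compute statistics on it
flavor = toy_df["Flavor"]
flavor_mean = flavor.mean()
flavor_max = flavor.max()

print(flavor_mean, flavor_max)
print(flavor.describe())  # several summary statistics at once
```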
2.9.16. Are assignments always a certain level or can one assignment be done to be a level 1 or a level 2 assignment#
Going forward, assignments are always targeted at level 2. In-class Prismia questions will assess at level 1. An incomplete attempt at an assignment might be evaluated only at level 1, so that can be a way to make up for a missed class; you then earn level 2 in the next assignment that assesses that skill.
2.9.17. will I be able to use data from an existing lab that I work in for certain assignment?#
Yes, as long as the data is allowed to be shared. Please confirm with your PI.
2.9.18. When we submit to GitHub, do we need to do anything other than upload the file?#
For your portfolio, no. For other assignments, there will be a step to do, and there will be instructions in the assignment.
2.9.19. Are we going to have to create our own datasets for any future assignments rather than downloading datasets from the Internet?#
In Assignment 2 you will build a dataset about datasets, but other than that you will mostly use datasets that you find online or that you have for another purpose.