1. Welcome to Programming for Data Science

Today’s goals:

  1. Operate tools for in-class participation

  2. Understand what Data Science is, in broad terms

  3. Understand the syllabus (grading, topics covered, schedule, etc.)

  4. Understand how to learn in this course


1.1. Prismia Chat

We will use this to monitor your participation in class and to gather information. Features:

  • your messages are visible to the instructor only

  • the instructor can reply to you directly

  • the instructor can share responses with everyone

1.2. What is Data Science

In general: Venn diagram of CS, Stats, & domain expertise with DS at the center

Statistics is the type of math we use to make sense of data. Formally, a statistic is just a function of data.
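To make "a function of data" concrete, here is a minimal sketch in Python using only the standard library. The data values and the names `heights_cm` and `value_range` are made up for illustration and do not come from the course materials.

```python
from statistics import mean, median

# Hypothetical data, invented for illustration only
heights_cm = [150, 160, 165, 170, 180]


def value_range(data):
    """A custom statistic: the gap between the largest and smallest value."""
    return max(data) - min(data)


# Each statistic is a function applied to the same data
average = mean(heights_cm)
middle = median(heights_cm)
spread = value_range(heights_cm)

print(average, middle, spread)
```

Built-in statistics like the mean and a hand-written one like `value_range` have the same shape: data goes in, a single summary number comes out.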

Computer science lets us manipulate, visualize, and automate the inferences we make.

Domain expertise gives us the intuition to know whether what we did worked correctly. Statistics must be interpreted in context; the relevant context determines what they mean and which ones are valid. The context also tells us whether automating something is safe, and it can help us tell whether our code actually worked.

For this class: Venn diagram of CS, Stats, & domain expertise with DS at the center, with the location of 310 marked

We'll focus on programming as our main means of studying data science, but we will use bits of the other parts. In particular, you're encouraged to choose datasets that you have domain expertise about, or that you want to learn about.

But there are many definitions. We’ll use this one, but you may come across others.

1.2.1. How does data science happen?

DS pipeline: collect, clean, explore, model, deploy

1.2.2. How we’ll cover it, in depth

DS pipeline: collect, clean, explore, model, deploy

  • collect: Discuss only a little; minimal programming involved

  • clean: Cover the main programming techniques; some cleaning requires domain knowledge beyond the scope of the course

  • explore: Cover the main programming techniques; some exploration requires domain knowledge beyond the scope of the course

  • model: Cover the main programming and the basic idea of models; how to use models, not how learning algorithms work

  • deploy: A little bit at the end, but a lot of preparation for decision making around deployment

1.2.2.1. How we’ll cover it, in time

DS pipeline: collect, clean, explore, model, deploy

We’ll cover exploratory data analysis before cleaning because those tools will help us check how we’ve cleaned the data.
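As a rough sketch of why exploring helps check cleaning, the example below runs the same summary before and after a cleaning step. It uses only the Python standard library rather than whatever tools the course will introduce, and the data and the `clean` helper are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical raw survey responses, invented for illustration only
raw_colors = ["Blue", "blue ", "RED", "red", " Blue", "green"]


def clean(values):
    """Normalize each entry by stripping whitespace and lowercasing."""
    return [v.strip().lower() for v in values]


# Exploring before cleaning: the counts reveal inconsistent spellings
counts_before = Counter(raw_colors)

# Exploring after cleaning: the same summary shows whether the fix worked
counts_after = Counter(clean(raw_colors))

print(len(counts_before), "distinct values before cleaning")
print(len(counts_after), "distinct values after cleaning")
```

The exploratory step (counting values) both exposes the problem and verifies the cleaning, which is one way to read the ordering described above.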

1.3. How this class will work

  • today is an exception

  • in general we’ll be live coding

Let’s look at the syllabus

Read it carefully to make sure you understand the grading; it’s not the typical scheme of points and an average.

Class is designed to avoid this:

1.4. gif of man throwing computer monitor

1.5. Learning Cycle

Read more about how I’m designing this course to help you learn on the how to learn page.

1.6. Check your understanding of the syllabus

It’s easy to lose track when reading something long. Your eyes can pass over each word without actually retaining the information, but it’s important to understand the syllabus for this course.

You can find the answers to the following questions in the syllabus. If you’ve already read it, try answering them to check your understanding. If you haven’t read it yet, use these questions as a guide to finding key facts about the course in the syllabus.

  1. What do you need to bring to class each day?

  2. What is the basis of grading for this course?

  3. How do you reference the course text?

  4. What is the penalty for missing an assignment?

More information about the course is available throughout the site. The next few questions will help you self-check that you’ve found the important things. Remember, the goal is not necessarily to memorize all of this, but to be able to find it.

  1. When & what are you expected to read for this class?

  • [ ] read the textbook before class

  • [ ] review notes & documentation after class

  • [ ] preview the notes & documentation before class

  • [ ] read documentation and textbook after class

  2. Your assignment says to find a dataset that has variables of a specific type. Which website can you use?

  3. Your assignment says to find a dataset of any type about something you’re interested in. Which resource would you use?