Assignment 12: Fake News

Due: 2020-12-01

Submission Template

For this exercise you will build a classifier that can distinguish real news from fake news. A training set is available at: https://raw.githubusercontent.com/lutzhamel/fake-news/master/data/fake_or_real_news.csv

The fields of interest are ‘text’ (the article body) and ‘label’ (whether the article is real or fake). The data set contains a large number of articles, which makes training slow, so you can downsample it to approximately 1,000 articles to speed up training and evaluation (hint: use shuffle).
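A minimal sketch of loading and downsampling, assuming the CSV has been downloaded locally and has a header row with ‘text’ and ‘label’ columns. It uses only Python's standard library so it runs without extra packages; `load_articles` and `downsample` are hypothetical helper names. If you use scikit-learn or pandas instead, `sklearn.utils.shuffle(df, n_samples=1000, random_state=0)` or `df.sample(n=1000, random_state=0)` are likely what the shuffle hint has in mind.

```python
import csv
import random

# Some article bodies may be long; raise the csv module's per-field size limit
# (default 131072 characters) so a long 'text' field does not raise csv.Error.
csv.field_size_limit(10**9)


def load_articles(lines):
    """Read (text, label) pairs from CSV lines, e.g. an open file object.

    Assumes a header row containing 'text' and 'label' columns, as the
    assignment describes. load_articles is a hypothetical helper name.
    """
    reader = csv.DictReader(lines)
    return [(row["text"], row["label"]) for row in reader]


def downsample(rows, n=1000, seed=0):
    """Shuffle a copy of rows and keep at most n of them (hypothetical helper)."""
    rows = list(rows)                  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(rows)  # fixed seed makes the sample reproducible
    return rows[:n]
```

With the file saved as `fake_or_real_news.csv`, opening it with `newline=""` and passing the file object to `load_articles`, then the result to `downsample`, gives a reproducible random sample of about 1,000 (text, label) pairs. Shuffling before truncating matters because the first 1,000 rows of the file need not be representative.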

  1. How accurately can you predict real vs fake news from the text?
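The assignment does not prescribe a model, so the sketch below swaps in one common baseline for text classification: a bag-of-words representation (lowercased word counts) fed to a multinomial Naive Bayes classifier with add-one smoothing, evaluated by accuracy on a held-out split. It is stdlib-only and the names `tokenize`, `NaiveBayes`, and `train_test_accuracy` are hypothetical; with scikit-learn, `CountVectorizer`, `MultinomialNB`, `train_test_split`, and `accuracy_score` cover the same steps.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical sketch)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)         # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log prior + sum of smoothed log likelihoods of the words
            score = math.log(self.doc_counts[label] / n_docs)
            counts, total = self.word_counts[label], self.totals[label]
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def train_test_accuracy(texts, labels, test_fraction=0.2, seed=0):
    """Shuffle, hold out a test split, train on the rest, return test accuracy."""
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    model = NaiveBayes().fit([texts[i] for i in train], [labels[i] for i in train])
    preds = model.predict([texts[i] for i in test])
    correct = sum(p == labels[i] for p, i in zip(preds, test))
    return correct / n_test
```

Calling `train_test_accuracy(texts, labels)` on lists built from the downsampled ‘text’ and ‘label’ columns returns the held-out accuracy that answers question 1. A single 80/20 split of about 1,000 articles is noisy, so averaging over several seeds (or using cross-validation) gives a steadier estimate.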

  2. Are the titles of real news or of fake news more similar to one another, as measured by Euclidean distance? For this question, describe what you would need to do to answer it, and then answer it.
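One way to approach question 2: represent each title as a bag-of-words count vector over a vocabulary shared by both groups, compute the Euclidean distance for every pair of titles that share a label, and compare the two group averages; the group with the smaller mean distance has the more similar titles. The sketch below assumes the CSV also has a ‘title’ column, stores each vector sparsely as a `Counter` (missing words count as 0), and uses hypothetical helper names.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())


def euclidean(a, b):
    """Euclidean distance between two sparse bag-of-words vectors (Counters).

    A word missing from a Counter counts as 0, so this equals the distance
    between the dense count vectors over the shared vocabulary.
    """
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in a.keys() | b.keys()))


def mean_pairwise_distance(vectors):
    """Average Euclidean distance over all unordered pairs of vectors."""
    n = len(vectors)
    if n < 2:
        return 0.0
    total = sum(euclidean(a, b) for a, b in combinations(vectors, 2))
    return total / (n * (n - 1) / 2)


def title_distances_by_label(titles, labels):
    """Map each label to the mean within-group distance between its titles.

    A smaller value means the titles in that group are more alike.
    (title_distances_by_label is a hypothetical helper name.)
    """
    groups = defaultdict(list)
    for title, label in zip(titles, labels):
        groups[label].append(Counter(tokenize(title)))
    return {label: mean_pairwise_distance(vecs) for label, vecs in groups.items()}
```

With raw counts, longer titles tend to sit farther from everything else, so differences in typical title length can drive the comparison; normalizing each vector to unit length, or using TF-IDF weights, is a reasonable variant to check. If you are using scikit-learn, `sklearn.metrics.pairwise.euclidean_distances` computes the full distance matrix for a group at once.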

For unstructured and workflow, plan out solutions, determine what tools you’ll need, and answer the two questions. You can earn unstructured by representing the text for analysis; to earn workflow, you’ll need to answer both questions.