12. Assignment 12: Fake News
12.1. Quick Facts
first: 2023-11-29
final: 2023-12-07
Note
There are two deadlines for this assignment instead of an Assignment 13. We will review whatever you have done by the 29th on 11/30 and give you personalized feedback to help you finish the assignment.
12.3. Assessment
Eligible skills: (links to checklists)
12.4. Instructions
Use the dataset in the assignment template repo to answer the following questions.
Is the text or the title of an article more predictive of whether it is real or fake?
Are titles of real or fake news more similar to one another?
The data includes variables:
‘text’: contents of an article
‘label’: whether it is real or fake news
‘title’: title of the article
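One possible way to approach the first question is to train the same kind of classifier twice, once on ‘title’ and once on ‘text’, and compare cross-validated accuracy. The sketch below is only an illustration under several assumptions: it uses a tiny made-up stand-in for the dataset, assumes the labels are the strings "real" and "fake", and picks a bag-of-words Naive Bayes model, none of which is prescribed by the assignment. The helper name `mean_cv_accuracy` is hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up rows standing in for the real dataset (assumption: labels are
# the strings "real" and "fake"; the actual encoding may differ).
titles = [
    "Senate passes budget bill after long debate",
    "City council approves new transit budget",
    "Central bank holds interest rates steady",
    "Shocking secret cure doctors hide from you",
    "You will not believe this shocking celebrity secret",
    "Aliens secretly control the bank, insider says",
]
texts = [
    "The senate voted on the budget bill on Tuesday after a long debate.",
    "The city council voted to approve a budget for new transit lines.",
    "The central bank said it would hold interest rates steady this quarter.",
    "This shocking secret cure is something doctors do not want you to know.",
    "You will not believe the shocking secret this celebrity is hiding.",
    "An insider says aliens secretly control the bank and hide the truth.",
]
labels = ["real", "real", "real", "fake", "fake", "fake"]


def mean_cv_accuracy(documents, labels, folds=3):
    """Mean cross-validated accuracy of a bag-of-words Naive Bayes model."""
    # The pipeline refits the vectorizer inside each fold, so the
    # vocabulary is learned from the training portion only.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    scores = cross_val_score(model, documents, labels, cv=folds)
    return scores.mean()


title_score = mean_cv_accuracy(titles, labels)
text_score = mean_cv_accuracy(texts, labels)
print(f"title accuracy: {title_score:.2f}")
print(f"text accuracy:  {text_score:.2f}")
```

Keeping the model and the cross-validation setup identical for both columns means that any difference in score comes from the input column rather than from the modeling choices. Swapping `CountVectorizer` for `TfidfVectorizer` is one way to check whether the comparison depends on how the text is represented.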
Include narrative around the code you use to answer these questions, and interpret the results to give an actual answer.
Provide context for your answer and consider how strong it is: there are different ways to represent the data, and the choice might affect your model's performance.
Consider whether the analysis you have completed is enough evidence to answer the question, or whether something else could change the answer.
Use summary statistics and visualizations appropriately in order to explain your results.
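For the title-similarity question, one candidate summary statistic is the mean pairwise cosine similarity of the titles within each label. The sketch below computes it with the standard library only, on made-up titles. The word-count representation, the whitespace tokenization, and the function names are assumptions for illustration, not the required method.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count Counters."""
    shared_words = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared_words)
    norm_a = sqrt(sum(value * value for value in counts_a.values()))
    norm_b = sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title is treated as similar to nothing
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(titles):
    """Average cosine similarity over every unordered pair of titles."""
    counts = [Counter(title.lower().split()) for title in titles]
    pairs = list(combinations(counts, 2))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Made-up titles grouped by label (hypothetical data, for illustration).
titles_by_label = {
    "real": [
        "senate passes budget bill",
        "council approves transit budget",
        "bank holds interest rates steady",
    ],
    "fake": [
        "shocking secret cure revealed",
        "shocking celebrity secret revealed",
        "aliens secretly control the bank",
    ],
}

similarity_by_label = {
    label: mean_pairwise_similarity(titles)
    for label, titles in titles_by_label.items()
}
for label, similarity in similarity_by_label.items():
    print(f"{label}: {similarity:.3f}")
```

The number of pairs grows quadratically with the number of titles, so on the full dataset this is best run on a downsample. A histogram of the pairwise values for each label would show whether the means hide very different distributions.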
Hint
The dataset contains a large number of articles, so training takes a long time. You can downsample it to about 1,000 articles to speed up training and evaluation (hint: use shuffle).
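A minimal sketch of the downsampling step, assuming the articles are loaded into a pandas DataFrame and that the "shuffle" in the hint refers to `sklearn.utils.shuffle` (pandas' `DataFrame.sample` would be another option). The helper name `downsample` and the toy DataFrame are hypothetical.

```python
import pandas as pd
from sklearn.utils import shuffle


def downsample(news_df, n_articles=1000, seed=0):
    """Return a random subset of at most n_articles rows."""
    # shuffle samples without replacement, so it cannot return more
    # rows than the DataFrame contains.
    n_keep = min(n_articles, len(news_df))
    return shuffle(news_df, random_state=seed, n_samples=n_keep)


# Toy stand-in for the real dataset (made-up rows).
toy_df = pd.DataFrame(
    {
        "title": [f"title {i}" for i in range(10)],
        "text": [f"body of article {i}" for i in range(10)],
        "label": ["real", "fake"] * 5,
    }
)

small_df = downsample(toy_df, n_articles=4)
print(small_df.shape)  # (4, 3)
```

Fixing `random_state` makes the subset reproducible between runs. A random subset is not guaranteed to keep the original balance of real and fake articles, so it may be worth checking `small_df["label"].value_counts()` before training.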