Assignment 9: Clustering
9. Assignment 9: Clustering
9.1. Quick Facts
Due: 2020-11-10 11:59pm
9.3. Assessment
task | skill |
---|---|
apply and interpret kmeans clustering | clustering (2) |
use multiple metrics to evaluate performance | evaluate (2) |
interpret how decisions impact model performance | evaluate (2) |
interpret the classifier performance in the context of the dataset | process (2) |
analyze the impact of model parameters on model performance | process (2) |
use EDA techniques to interpret the experimental results | summarize (2), visualize (2) |
9.4. Instructions
1. Use the same dataset you used for assignment 7, unless there was a problem with it. If you did not complete assignment 7, pick one of the datasets recommended for that assignment.
2. Describe what question you’d be asking in applying clustering to this dataset.
3. Apply KMeans using the known, correct number of clusters, \(K\).
4. Evaluate how well clustering worked on the data:
   - using a true clustering metric (one that does not use the ground truth labels)
   - using visual inspection
   - using a clustering metric that uses the ground truth labels
5. Include a discussion of your results that addresses the following:
   - what the clustering means
   - what the metrics show
   - whether this clustering works better or worse than expected based on the classification performance (if you didn’t complete assignment 7, also apply a classifier)
6. Repeat your analysis using a different number of clusters:
   - Can you interpret the new clusters?
   - How do they relate to the original clusters? Are they completely different? Did one split? Did some merge?
   - Is there a reasonable explanation for more clusters than there are classes in this dataset?
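
The clustering and metric steps above could be sketched roughly as follows. This is only one possible starting point, not the required solution. It assumes scikit-learn is available, uses the iris data as a stand-in for your own dataset, and picks the silhouette score and the adjusted Rand score as examples of the two kinds of metric. The helper `cluster_and_score` and the choice of \(K + 1\) for the second run are hypothetical, chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.metrics.cluster import contingency_matrix

# Assumption: iris stands in for the dataset you used in assignment 7.
X, y = load_iris(return_X_y=True)
n_classes = len(np.unique(y))  # the known, correct K


def cluster_and_score(X, y, k, seed=0):
    """Hypothetical helper: fit KMeans with k clusters and score the result."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    return {
        "k": k,
        "labels": labels,
        # Uses only the features and the cluster assignments.
        "silhouette": silhouette_score(X, labels),
        # Compares the cluster assignments with the ground truth labels.
        "adjusted_rand": adjusted_rand_score(y, labels),
    }


known_k = cluster_and_score(X, y, n_classes)
more_k = cluster_and_score(X, y, n_classes + 1)

for result in (known_k, more_k):
    print(
        f"K={result['k']}: silhouette={result['silhouette']:.3f}, "
        f"adjusted Rand={result['adjusted_rand']:.3f}"
    )

# Rows: clusters found with the known K; columns: clusters found with K + 1.
# A row with two large entries suggests that the original cluster split.
overlap = contingency_matrix(known_k["labels"], more_k["labels"])
print(overlap)
```

Reading the overlap table row by row is one way to answer whether a cluster split, merged, or changed completely. The sketch leaves out visual inspection and the written interpretation.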
Think Ahead
How can clustering be used to ask many different questions? What can you do with clustering results?