8. Assignment 8: Clustering

Accept the assignment. Due: 2023-03-29

8.1. Evaluation

Eligible skills: (links to checklists)

  • clustering 1 and 2 (first chance)

  • evaluate 1 and 2

  • python 1 and 2

  • summarize 1 and 2

  • visualize 1 and 2

For some of these skills, you will need to add analysis that is not described in the instructions below but is related to the skill.

8.3. Instructions

Use the same dataset you used for assignment 7, unless there was a problem. If you skipped assignment 7, choose a dataset well suited for classification.

  1. Describe what question you would be asking in applying clustering to this dataset. What does it mean if clustering does not work well?

  2. Apply K-means using the known, correct number of clusters, \(K\).

  3. Evaluate how well clustering worked on the data:

    • using a true clustering metric (one that does not use the ground truth labels) and

    • using visualization and

    • using a clustering metric that uses the ground truth labels

  4. Include a discussion of your results that addresses the following:

    • what the clustering means

    • what the metrics show

    • whether this clustering works better or worse than expected based on the classification performance (if you didn’t complete assignment 7, also apply a classifier)

  5. Repeat your analysis using two different numbers of clusters (one higher and one lower than \(K\)):

    • Can you interpret the new clusters?

    • How do they relate to the original clusters? Are they completely different, or did one split?

    • Is there a reasonable explanation for more clusters than there are classes in this dataset?
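
As a starting point, steps 2, 3, and 5 could be sketched roughly as follows. This is a minimal sketch, not a required approach: it assumes scikit-learn, uses the iris data only as a stand-in for your own dataset, and picks the silhouette score and the adjusted Rand score as one possible pair of metrics. The helper name `fit_and_score` is hypothetical. Visualization and the written discussion are left out.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.metrics.cluster import contingency_matrix

# Assumption: iris is only a placeholder; substitute your own features X and labels y.
X, y = load_iris(return_X_y=True)
n_classes = len(set(y))  # the known number of clusters, K


def fit_and_score(X, y, k, seed=0):
    """Hypothetical helper: fit K-means with k clusters and score the result."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    clusters = km.fit_predict(X)
    # Silhouette uses only the features and the cluster assignments.
    sil = silhouette_score(X, clusters)
    # Adjusted Rand compares the cluster assignments with the ground truth labels.
    ari = adjusted_rand_score(y, clusters)
    return clusters, sil, ari


# Steps 2, 3, and 5: the known K, plus one lower and one higher.
results = {k: fit_and_score(X, y, k) for k in (n_classes - 1, n_classes, n_classes + 1)}
for k, (_, sil, ari) in results.items():
    print(f"K={k}: silhouette={sil:.3f}, adjusted Rand={ari:.3f}")

# Rows: clusters found with K; columns: clusters found with K + 1.
# A row whose points spread over two columns suggests that cluster split.
table = contingency_matrix(results[n_classes][0], results[n_classes + 1][0])
print(table)
```

Note that the silhouette score is undefined for a single cluster, so if your dataset has only two classes, the "one lower" case needs a different evaluation than the one shown here.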