CMSC 395 -- Lab 1

Clustering

Introduction

Clustering is an unsupervised method of grouping data. We have discussed several clustering methods, including k-means and hierarchical clustering.

The Data

This assignment uses time-series hurricane data. You will need to find both spatial data (latitude/longitude) and categorical data (wind speed, pressure, damage, category, rainfall) for hurricanes. You do not need data for every attribute listed, but you do need both position data and categorical data.

Clustering

You need to cluster on both position and category information. You will need to choose clustering methods, distance metrics, the number of clusters, and the features to use. You should generate clusters by varying many of these parameters. You may implement your own version of Lloyd's algorithm, use predefined packages, or use other software.
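
If you choose to write your own version, the following is a minimal sketch of Lloyd's algorithm (k-means) using only the Python standard library. The function names (euclidean, mean_point, lloyd_kmeans), the random choice of starting centroids, and the handling of empty clusters are illustrative assumptions, not requirements of this lab.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after running Lloyd's algorithm."""
    rng = random.Random(seed)
    # Assumption: start from k data points drawn without replacement.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = mean_point(members)
    return centroids, labels
```

Calling lloyd_kmeans(points, k=3) on a list of feature tuples returns the centroids and one cluster label per point. If you mix features with different units (for example wind speed and pressure), consider rescaling them first so that one feature does not dominate the distance.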

Analysis

A major part of pattern recognition is understanding what your algorithm does (testing!). A great way to understand your algorithm is to look at the results visually. You should generate graphs of your clusters that show both what was clustered together and the prototype (centroid) of each cluster.

Analysis of your data can reveal problems with your algorithms/metrics. Answer the following questions for each of your clusterings.
  • What is the spread of each cluster? (Are the clusters similar in error?)
  • Are there outliers?
  • Is the distance metric meaningful?
  • What is a plain-English explanation of what the clusters are showing?
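
One possible way to start on the spread and outlier questions is sketched below, using only the Python standard library. The names haversine_km and cluster_report, the use of mean distance to the centroid as "spread", and the rule of flagging points more than z standard deviations beyond that mean are all assumptions made for illustration; you may define spread and outliers differently. The haversine function is included because Euclidean distance on raw latitude/longitude degrees distorts real distances (a degree of longitude covers less ground the farther you are from the equator).

```python
import math
import statistics


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def cluster_report(points, labels, centroids, dist, z=2.0):
    """Per cluster: size, mean distance to centroid, and indices of far points."""
    report = {}
    for j, c in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        d = [dist(points[i], c) for i in idx]
        if not d:
            continue  # skip empty clusters
        spread = statistics.mean(d)
        sd = statistics.pstdev(d)
        # Assumed rule of thumb: flag points more than z std devs past the mean.
        outliers = [i for i, di in zip(idx, d) if di > spread + z * sd]
        report[j] = {"size": len(idx), "spread": spread, "outliers": outliers}
    return report
```

For position-only clusters you could call cluster_report(points, labels, centroids, haversine_km) and compare the "spread" values across clusters. Note that with very small clusters no point can exceed the default threshold, so an empty outlier list does not by itself mean the cluster is clean.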

Presentation

You will need to present your findings to the class and to me. You will present what YOU did on the project. Put these findings in one or more PowerPoint presentations.
  • What data did you use? Where did you find it? How did you combine it with other data?
  • What code did you use? What language/program? What distance metrics can you use?
  • What features did you cluster on? What distance metrics did you use?
  • What graphs/plots did you produce? How did you produce them? What do they mean?
  • Explain your findings.

Handin

Your presentation and code will be submitted on Canvas. In addition, you will submit a readme document describing in detail what you did and rating each of your groupmates (positive/negative/neutral).

You will present on 9/26 at 4pm; your materials are due 9/26 at 11:59pm.