CMMB 461 Lecture Notes - Lecture 9: DNA Microarray, Hierarchical Clustering, Dendrogram
Document Summary
Microarrays generate thousands of data points for each experiment (high-dimensional data). Viewing and analyzing such volumes of data in spreadsheets and graphs is overwhelming. One approach to reducing the number of data points is to cluster objects (e.g. genes) into groups based on their similarity to each other. Genes that tend to be in the same pathway or have the same function should show similar properties. For microarrays, we are clustering genes with similar expression.

Euclidean distance: the absolute distance between two points in space. Correlation distance: the similarity of the directions in which two vectors point, i.e. how close the vectors are in terms of the direction in which they are moving.

Hierarchical clustering: Do not have to specify "k", the number of clusters. Deterministic: cluster together the two closest genes, then the next closest, and so on. If you take the same data and use the same clustering metric, you'll always get the same results. Some other clustering methods, by contrast, require "k", the number of clusters, to be specified in advance.
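The two distance measures and the deterministic "merge the closest pair first" procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the lecture: the gene names and expression values are made up, correlation distance is taken to be 1 minus the Pearson correlation, and average linkage is assumed for measuring the distance between clusters (the notes do not say which linkage is used).

```python
import math

def euclidean_distance(a, b):
    # Absolute (straight-line) distance between two expression profiles.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    # Assumed definition: 1 - Pearson correlation.
    # 0 means the profiles move in the same direction, 2 means opposite.
    # Assumes neither profile is constant (otherwise the denominator is 0).
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1.0 - num / den

def agglomerate(profiles, distance):
    # Naive agglomerative clustering: start with one cluster per gene and
    # repeatedly merge the two closest clusters. No "k" is needed.
    # Returns the merges in order as (cluster_1, cluster_2, distance).
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def linkage(c1, c2):
        # Average linkage (an assumption): mean pairwise distance.
        total = sum(distance(profiles[g], profiles[h]) for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                # Strict "<" keeps the first pair found on ties,
                # so the result is fully deterministic.
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression profiles across four conditions.
profiles = {
    "geneA": [1, 2, 3, 4],      # rising, low magnitude
    "geneB": [10, 20, 30, 40],  # rising, high magnitude
    "geneC": [4, 3, 2, 1],      # falling, low magnitude
}

for name, metric in [("euclidean", euclidean_distance),
                     ("correlation", correlation_distance)]:
    print(name)
    for left, right, d in agglomerate(profiles, metric):
        print("  merge", left, "+", right, "at distance", round(d, 3))
```

With these made-up values, the Euclidean metric groups geneA with geneC first (they are close in absolute terms), while the correlation metric groups geneA with geneB first (they move in the same direction). The list of merges is what a dendrogram draws, and running the function twice on the same data with the same metric gives the same list.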