Data Item _em_classes.clustering_method


Item name
Category name
Attribute name
Required in PDB entries
Used in current PDB entries

Item Description

The clustering method used.

Data Type

Data type code
Data type detail
code item types/single words (case insensitive) ...
Primitive data type code
Regular expression

Controlled Vocabulary

Allowed Value Details
Automatic Clustering and Hierarchical Ascendant Classifications (HAC)
    HAC uses only Ward's criterion. Ward's criterion states that merging
    HAC clusters should focus on minimizing the added intraclass
    (within-cluster) variance. The two clusters that differ least from
    each other are merged to create a new group, one "level" higher.
Correspondence Analysis (CA)
    Uses chi-squared distance. This is superior because it ignores
    differences in exposure between images, eliminating the need to
    rescale between images.
Didays method
    A disadvantage of the K-means method is that the final grouping is
    very dependent on which seeds are initially chosen. Diday overcame
    this by applying the K-means technique multiple times with different
    seeds, then cross-tabulating the results and using only the clusters
    that were repeatedly formed.
K-Means Clustering
    K-Means is a method of clustering that divides the data into a
    user-defined number of groups. Two random images ("seeds") are
    chosen, and their centers of gravity are computed. A partition is
    drawn down the middle between the centers, the new centers of
    gravity are computed, and the process is repeated a given number of
    times. The final result is very dependent on which image seeds are
    chosen first. Because our faces data set is manufactured, we know
    exactly which images are identical apart from random noise, and the
    exact number of groups. The output discussed was obtained with 8
    classes, using factors 1-3 and an even factor weight of 1.0 across
    those three factors.
Principal Component Analysis (PCA)
    Computes the distance between data vectors using Euclidean distance.
Wards method
average linkage
centroid method
complete linkage
single linkage
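
The K-Means Clustering and Didays method entries above describe seed-dependent clustering and the idea of keeping only groups that recur across runs. The following is a minimal sketch of that idea in plain Python, not the implementation used by any particular EM package. The function names (sq_dist, kmeans, stable_groups), the toy 2-D points, and the choice of five seeds are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iterations=20):
    """Basic k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen seed points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Recompute each center of gravity from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def stable_groups(points, k, seeds):
    """Cross-tabulate several runs (hypothetical helper).

    Points that were placed in the same cluster in every run share the
    same tuple of labels and therefore end up in the same group.
    """
    runs = [kmeans(points, k, s) for s in seeds]
    groups = defaultdict(list)
    for index, signature in enumerate(zip(*runs)):
        groups[signature].append(index)
    return sorted(groups.values())


# Toy data: two well-separated clumps of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = stable_groups(points, k=2, seeds=range(5))
print(groups)
```

Cluster labels are arbitrary from run to run, which is why the sketch compares the whole tuple of labels per point rather than the label values themselves.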

Controlled Vocabulary at Deposition

Allowed Value Details
Automatic Clustering and Hierarchical Ascendant Classifications (HAC)
    HAC uses only Ward's criterion. Ward's criterion states that merging
    HAC clusters should focus on minimizing the added intraclass
    (within-cluster) variance. The two clusters that differ least from
    each other are merged to create a new group, one "level" higher.
Correspondence Analysis (CA)
    Uses chi-squared distance. This is superior because it ignores
    differences in exposure between images, eliminating the need to
    rescale between images.
Didays method
    A disadvantage of the K-means method is that the final grouping is
    very dependent on which seeds are initially chosen. Diday overcame
    this by applying the K-means technique multiple times with different
    seeds, then cross-tabulating the results and using only the clusters
    that were repeatedly formed.
K-Means Clustering
    K-Means is a method of clustering that divides the data into a
    user-defined number of groups. Two random images ("seeds") are
    chosen, and their centers of gravity are computed. A partition is
    drawn down the middle between the centers, the new centers of
    gravity are computed, and the process is repeated a given number of
    times. The final result is very dependent on which image seeds are
    chosen first. Because our faces data set is manufactured, we know
    exactly which images are identical apart from random noise, and the
    exact number of groups. The output discussed was obtained with 8
    classes, using factors 1-3 and an even factor weight of 1.0 across
    those three factors.
Principal Component Analysis (PCA)
    Computes the distance between data vectors using Euclidean distance.
Wards method
average linkage
centroid method
complete linkage
single linkage
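
The last five vocabulary entries carry no details in the dictionary. As general background, these hierarchical methods differ mainly in how the distance between two clusters is measured before the closest pair is merged. The sketch below shows the usual textbook definitions in plain Python; the function names and the two toy clusters are assumptions for illustration, and the Ward cost uses the standard formula for the increase in within-cluster sum of squares rather than anything specified by the dictionary.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Center of gravity of a list of equal-length tuples."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))


def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if a and b were merged."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * dist(centroid(a), centroid(b)) ** 2


# Two toy clusters, three units apart along x.
a = [(0, 0), (0, 2)]
b = [(3, 0), (3, 2)]
for measure in (single_linkage, complete_linkage, average_linkage,
                centroid_distance, ward_cost):
    print(measure.__name__, measure(a, b))
```

An agglomerative procedure would evaluate the chosen measure for every pair of current clusters and merge the pair with the smallest value, repeating until the desired number of groups remains.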