TACIT Clustering Overview
Overview of Clustering Tools
Cluster analysis techniques automatically sort texts into groups based on similarities in the texts themselves, allowing researchers to discover groupings that they did not determine in advance.
TACIT includes cluster analysis plugins that interface with two of the most widely used cluster analysis algorithms: k-means clustering and hierarchical clustering. The k-means clustering tool (MacQueen, 1967) sorts texts into a user-specified number of clusters (or groups) such that each text is closer to the centroid of its own cluster (the prototypical document of that cluster) than to the centroid of any other cluster.
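The k-means idea described above can be illustrated with a minimal Python sketch. This is not TACIT's implementation: the helper names (vectorize, distance, kmeans), the word-count representation, the Euclidean distance, and the choice to seed the centroids with the first k documents are all assumptions made to keep the example small and deterministic.

```python
import math
from collections import Counter

def vectorize(texts):
    """Turn each text into a word-count vector over a shared vocabulary."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for c in counts for w in c})
    return [[c[w] for w in vocab] for c in counts]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, max_iterations=20):
    # Assumption: seed the centroids with the first k documents.
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # no document changed cluster, so we have converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

texts = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a cat and a dog sat",
    "markets and stock prices fell",
]
labels, centroids = kmeans(vectorize(texts), k=2)
print(labels)  # [0, 1, 0, 1]
```

Here the two texts about animals end up in one cluster and the two about markets in the other. Real tools typically use random or smarter seeding and weighted term vectors, so results on actual corpora will depend on those choices.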
Hierarchical clustering (Johnson, 1967), by comparison, does not require a prespecified number of clusters. Instead, this technique identifies the optimal number of clusters by starting with the assumption that all data points (in this case, documents) belong to one cluster. The algorithm then splits this root cluster into smaller child clusters based on the degree of similarity between the documents, separating the documents that have the longest distance between them. These child clusters are recursively divided further until only singleton clusters remain.
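The top-down splitting described above can be sketched in a few lines of Python. This is an illustration only, not TACIT's implementation: the splitting rule used here (take the two most distant documents in a cluster as seeds and assign every other document to the nearer seed) is an assumption, as are the helper names, the word-count vectors, and the Euclidean distance.

```python
import math
from collections import Counter

def vectorize(texts):
    """Turn each text into a word-count vector over a shared vocabulary."""
    counts = [Counter(t.lower().split()) for t in texts]
    vocab = sorted({w for c in counts for w in c})
    return [[c[w] for w in vocab] for c in counts]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split(indices, vectors):
    """Split one cluster in two around its two most distant members."""
    pairs = [(i, j) for pos, i in enumerate(indices) for j in indices[pos + 1:]]
    a, b = max(pairs, key=lambda p: distance(vectors[p[0]], vectors[p[1]]))
    # Seed b always goes right, so neither side can be empty.
    left = [
        i for i in indices
        if i != b and distance(vectors[i], vectors[a]) <= distance(vectors[i], vectors[b])
    ]
    right = [i for i in indices if i not in left]
    return left, right

def divisive(indices, vectors):
    """Recursively split until only singletons remain; return a nested tuple."""
    if len(indices) == 1:
        return indices[0]
    left, right = split(indices, vectors)
    return (divisive(left, vectors), divisive(right, vectors))

texts = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "a cat and a dog sat",
    "markets and stock prices fell",
]
vectors = vectorize(texts)
tree = divisive(list(range(len(texts))), vectors)
print(tree)  # ((0, 2), (1, 3))
```

The nested tuple records the full hierarchy: the root cluster first separates the animal texts (0 and 2) from the market texts (1 and 3), and each child is then divided into singletons. Cutting the tree at a chosen depth yields a flat grouping with that many clusters.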