Part I - Concepts Quantitatively Compared and Explored

Subthemes: Introduction to Themes, for Analysis

This series of blog posts will serve as notes to myself on a long-term, evolving project. The project's motivation is to explore whether and how concepts can be compared quantitatively by their relative importance (centrality) or their cognitive density (complexity). For example, can we say anything meaningful and empirical on those two dimensions when comparing the concepts of “Silence” and “Destruction”? Stated differently, can data suggest whether one of those concepts is more central to English speakers (better known, more widely used, broader in its associations), or whether one is more complex (deeper in thought, richer in inter-connections, abundant in associations)?

Part II - Metrics Suggesting Centrality or Complexity of Themes

Subthemes: Metrics

Part I of this series of five posts (working notes for myself, in fact) provides an overview of the Themes analyzed in this introductory investigation of centrality and complexity. This second post lays out the metrics that underlie the analysis. I present all 20 of them, and the brief introduction to each metric covers the same topics in the same order: first, why I gathered the metric; next, its source; and then any calculations I carried out. Finally, a two-by-two layout of plots shows the results for the four preceding metrics, which loosely share a common attribute.

Part III - Find Similar Themes with K-means Clustering

Subthemes: K-means clustering

The first post in this series of five explains the source of the 24 Themes written about on the blog Themes from Art and used in this series. The second post describes the set of 20 metrics collected for each of those Themes. This work-in-progress post starts from those Themes and metrics and seeks to find the most-similar Themes using a method called k-means clustering.
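The post's own code is not reproduced here. As a hedged, language-neutral illustration of what k-means does with a Themes-by-metrics table, here is a minimal Python sketch of Lloyd's algorithm: assign each row to its nearest centroid, move each centroid to the mean of its rows, and repeat until nothing changes. The matrix is random placeholder data, not the blog's 24 x 20 metrics, and z-scoring the metrics first is an assumption of this sketch rather than a step the post states.

```python
import numpy as np

def standardize(X):
    """Z-score each metric (column) so metrics on large scales don't dominate distances."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by zero
    return (X - X.mean(axis=0)) / sd

def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid updates until stable."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids

# Placeholder data: 24 "Themes" x 20 "metrics" of random numbers, NOT the blog's data.
rng = np.random.default_rng(1)
X = standardize(rng.normal(size=(24, 20)))
labels, centroids = kmeans(X, k=4)
print(np.bincount(labels, minlength=4))  # how many Themes fall in each cluster
```

Note that k (here 4) must be chosen up front, and a different random start can give a different partition; that is the difficulty the next post picks up.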

Part IV - Find Similar Themes with Agglomerative Hierarchical Clustering

Subthemes: Hierarchical Clustering

Part I of this working series introduced Themes, and Part II introduced certain metrics pertaining to those Themes. Part III started the analysis of Theme centrality and complexity with an unsupervised clustering algorithm, k-means clustering. That algorithm requires the analyst to specify the number of clusters, and finding the optimal number of clusters can often be hard. Part III side-stepped that challenge because the k-means analysis produces data that allows you to calculate the Euclidean distance between the closest, most-similar Themes.
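Part IV's own code is not reproduced here. As a hedged illustration of the two ideas above, this Python sketch computes the Euclidean distance between every pair of rows in a Themes-by-metrics matrix, reports each row's nearest (most-similar) neighbour, and then runs a naive agglomerative clustering that repeatedly merges the two closest clusters until the requested number remain. The choice of average linkage, the helper names, and the toy one-dimensional data are assumptions for illustration; R's `hclust` or SciPy's `scipy.cluster.hierarchy.linkage` do the same job far more efficiently.

```python
import numpy as np

def pairwise_euclidean(X):
    """Matrix of Euclidean distances between every pair of rows (Themes)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def nearest_neighbours(D):
    """For each row, the index of and distance to its closest other row."""
    D = D.astype(float)  # copy, so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)  # a Theme is not its own neighbour
    return D.argmin(axis=1), D.min(axis=1)

def agglomerative(D, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix.

    Starts with every row in its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance. Returns final labels and
    the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    labels = np.empty(len(D), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels, merges

# Toy one-dimensional "metric" for five hypothetical Themes.
X = np.array([[0.0], [1.0], [10.0], [11.0], [30.0]])
D = pairwise_euclidean(X)
print(nearest_neighbours(D))
print(agglomerative(D, n_clusters=2))
```

Unlike k-means, the full merge history (the dendrogram) is kept, so the number of clusters can be decided afterwards by cutting the tree at a chosen height.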

Part V - Find Similar Themes with Reinert Textual Data Clustering

Subthemes: Reinert Textual Data Clustering

In this fifth and concluding post in a series of notes on Theme centrality and complexity, I turn to the Reinert textual clustering method. To prepare the data, I removed portions of the blog posts that are similar across all posts, as well as R’s blogdown headers and material I didn’t write, and then combined all the posts for each Theme into a single “text.” That yielded 24 texts, one per Theme. I then used R’s rainette package to perform Reinert textual clustering on those 24 texts.
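rainette's internals are not shown in the post. As a rough, hedged sketch of the core idea behind one step of Reinert's descending hierarchical classification, the Python below (1) combines posts per Theme into one text, mirroring the preparation described above; (2) builds a presence/absence document-term matrix, since Reinert's method works on presence/absence; (3) orders the texts along the first axis of a correspondence analysis; and (4) picks the cut along that order that maximizes the chi-square statistic between the two resulting groups. The tokenizer, the toy posts, and every helper name are illustrative assumptions. The full method repeats such splits to build a hierarchy and adds refinements this sketch omits, so it is no substitute for rainette.

```python
import re
from collections import defaultdict
import numpy as np

def combine_by_theme(posts):
    """posts: iterable of (theme, text) pairs -> {theme: all its posts as one text}."""
    combined = defaultdict(list)
    for theme, text in posts:
        combined[theme].append(text)
    return {theme: "\n".join(texts) for theme, texts in combined.items()}

def binary_dtm(texts):
    """Presence/absence document-term matrix (naive lowercase word tokenizer)."""
    token_sets = [set(re.findall(r"[a-z']+", t.lower())) for t in texts]
    vocab = sorted(set().union(*token_sets))
    index = {w: j for j, w in enumerate(vocab)}
    M = np.zeros((len(texts), len(vocab)))
    for i, toks in enumerate(token_sets):
        for w in toks:
            M[i, index[w]] = 1.0
    return M, vocab

def first_ca_axis(M):
    """Row coordinates on the first correspondence-analysis axis (assumes no empty rows)."""
    P = M / M.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] * s[0] / np.sqrt(r)

def chi2_2xJ(a, b):
    """Chi-square statistic of the 2 x J table whose rows are a and b."""
    O = np.vstack([a, b])
    O = O[:, O.sum(axis=0) > 0]  # drop terms absent from both groups
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return float(((O - E) ** 2 / E).sum())

def reinert_split(M):
    """One simplified split: order rows along the first CA axis, then keep
    the cut that maximizes chi-square between the two groups' term totals."""
    order = np.argsort(first_ca_axis(M))
    best_chi2, best_cut = -1.0, None
    for cut in range(1, len(order)):
        chi2 = chi2_2xJ(M[order[:cut]].sum(axis=0), M[order[cut:]].sum(axis=0))
        if chi2 > best_chi2:
            best_chi2, best_cut = chi2, cut
    return set(order[:best_cut].tolist()), set(order[best_cut:].tolist()), best_chi2

# Hypothetical posts tagged with hypothetical Themes.
posts = [("Theme A", "cat dog"), ("Theme A", "cat"), ("Theme B", "dog cat pet"),
         ("Theme C", "stock market trade"), ("Theme D", "market trade price")]
themes = combine_by_theme(posts)
names = list(themes)
M, vocab = binary_dtm(list(themes.values()))
left, right, chi2 = reinert_split(M)
print([names[i] for i in sorted(left)], [names[i] for i in sorted(right)], round(chi2, 3))
```

Using presence/absence rather than raw counts keeps one long post from dominating its Theme's combined text; whether that weighting suits 24 whole-Theme texts, rather than the short text segments Reinert's method usually splits, is a judgment call.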