TACIT Dictionary

From CSSL

Overview

Word count algorithms measure how often words from user-defined topic categories occur in a corpus. The categories, and the words that belong to each one, are specified in a dictionary. Note that TACIT dictionaries do not support phrases: each entry must be a single word.
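To make the idea concrete, here is a minimal sketch of dictionary-based counting in Python. It is not TACIT's own implementation: the function name `count_categories`, the simple tokenizer, and the two-word dictionary are all assumptions chosen for illustration. The category numbers follow the example used later on this page (1 funct, 2 pronoun, 3 ppron, 4 article).

```python
import re
from collections import Counter


def count_categories(text, word_categories):
    """Count how many tokens in `text` fall into each category number."""
    counts = Counter()
    # Lowercase the text and keep runs of letters and apostrophes.
    # This tokenizer is an illustrative assumption, not TACIT's.
    for token in re.findall(r"[a-z']+", text.lower()):
        # A word may belong to several categories; count it once for each.
        for category in word_categories.get(token, ()):
            counts[category] += 1
    return counts


# Hypothetical dictionary: word -> list of category numbers.
word_categories = {"the": [1, 4], "she": [1, 2, 3]}
counts = count_categories("She read the book. The end.", word_categories)
```

Each token is looked up as a single word, which is why a multi-word phrase can never match an entry.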

Dictionary Formatting

Category Creation

At the beginning of your dictionary .txt file, you need to list the categories that your dictionary's words belong to. To do this, put a single % symbol on the first line of the file. On each subsequent line, put the number you will use to identify a category, followed by a space and then the name of the category. Once you have listed all of your categories, put another % on the next line. For example, the beginning of your file should look like this:
%
1 funct
2 pronoun
3 ppron
4 article
%

On the next line, start listing the words that belong to the categories, with one word per line. To indicate which category or categories a word belongs to, press Tab once after the word and then add the category number. If the word belongs to multiple categories, list each category number after the word, with one tab between each number. For example:
"about 1 16 17" (with a tab between each item) indicates that the word "about" should be counted for categories 1, 16, and 17.
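The format described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than TACIT's own parser: the function name `parse_dictionary` is hypothetical, and it splits word lines on any whitespace, so it is more lenient than the tab-only rule given here.

```python
def parse_dictionary(lines):
    """Parse dictionary lines into (categories, words).

    categories maps category number -> category name.
    words maps word -> list of category numbers.
    """
    categories = {}
    words = {}
    percent_seen = 0  # how many "%" delimiter lines have been read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == "%":
            percent_seen += 1
            continue
        if percent_seen == 1:
            # Between the two "%" lines: "<number> <name>"
            number, name = line.split(None, 1)
            categories[int(number)] = name
        elif percent_seen >= 2:
            # After the second "%": "<word><TAB><number><TAB><number>..."
            word, *numbers = line.split()
            words[word.lower()] = [int(n) for n in numbers]
    return categories, words


# Sample content; "\t" stands for the tab between items.
sample = [
    "%",
    "1 funct",
    "2 pronoun",
    "3 ppron",
    "4 article",
    "%",
    "the\t1\t4",
    "she\t1\t2\t3",
]
categories, words = parse_dictionary(sample)
```

To check a file on disk, you could pass an open file object instead of the list, for example `parse_dictionary(open("my_dictionary.txt", encoding="utf-8"))`, where the filename is a placeholder.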

Stemming

Fill in information here about why you might want to stem.
You can stem your dictionary ahead of time (as is the case with LIWC-style dictionaries) or you can request that the program stem your dictionary before running analyses.
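As an illustration of a dictionary that is stemmed ahead of time, LIWC-style dictionaries conventionally mark a stem with a trailing asterisk (for example, abandon*), so that one entry covers every word beginning with that stem. The sketch below shows that matching rule in Python. The function name `matches_entry` is hypothetical, and whether TACIT handles such entries in exactly this way is an assumption.

```python
def matches_entry(token, entry):
    """Return True if `token` matches the dictionary `entry`.

    An entry ending in "*" is treated as a stem and matches any token
    that starts with it; any other entry must match the token exactly.
    """
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


# The stem entry "abandon*" covers "abandon", "abandoned", "abandoning", ...
stem_hit = matches_entry("abandoned", "abandon*")
# Without the asterisk, only the exact word matches.
exact_miss = matches_entry("abandoned", "abandon")
```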

Snowball Stemming

snowball stemming explanation

LIWC-style Stemming

LIWC stemming explanation compared to snowball stemming

Using Multiple Dictionaries

TACIT allows users to analyze text using multiple dictionaries simultaneously. However, category number assignments must be consistent across all included dictionaries. Specifically, if category 2 is pronoun in Dictionary_1, the program will assume that category 2 is also pronoun in Dictionary_2. Not every category needs to appear in every dictionary. You may introduce an additional set of categories in the second dictionary file, as long as their category numbers are different from those used in the first dictionary file.
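Because the program assumes that a category number means the same thing in every dictionary, it can be worth checking your files for conflicts before you run an analysis. The following Python sketch shows one way to do that. It is a check you would run yourself, not a feature of TACIT; the function name `merge_categories` and the category "posemo" are hypothetical.

```python
def merge_categories(first, second):
    """Merge two {category number: category name} tables.

    Raises ValueError if the same number is given different names,
    since the numbering must be consistent across dictionaries.
    """
    merged = dict(first)
    for number, name in second.items():
        if number in merged and merged[number] != name:
            raise ValueError(
                f"category {number} is {merged[number]!r} in one dictionary "
                f"but {name!r} in the other"
            )
        merged[number] = name
    return merged


dictionary_1 = {1: "funct", 2: "pronoun"}
# Reuses category 2 with the same name and adds a new number, 5.
dictionary_2 = {2: "pronoun", 5: "posemo"}
merged = merge_categories(dictionary_1, dictionary_2)
```

If Dictionary_2 instead used 2 for a different category, such as article, the function would raise an error rather than silently counting those words as pronouns.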

External Dictionary Sources

Many researchers have created validated dictionaries that assess a wide variety of topics.

Non-Exhaustive List of Research Topic Dictionary Resources:
Body Image: Wilson, 2006.
Financial Sentiment: Loughran & McDonald, 2011.
Lasswell Dictionary: Namenwirth & Weber, 1987.
LIWC (Linguistic Inquiry and Word Count): Pennebaker, Francis, & Booth, 2001. Full dictionary available for research purposes only, directly from James Pennebaker.
Harvard IV-4 dictionary.
Dolch Sight Words: Dolch.
Regressive Imagery Dictionary (RID): Martindale. A list of topic words is available as Python code.
Whissell Dictionary of Affect in Language: Cynthia Whissell.
Affective Norms for English Words (ANEW): Bradley & Lang. Set of normative emotional ratings for English words (similar to IAPS ratings, but for words instead of images).
Common Nouns: Quintans.
Social Ties: Pressman & Cohen, 2007.
PANAS-X: Watson & Clark, 1994.
Opinion Sentiment Lexicon: Hu & Liu, 2004.

Dictionary Selection

Selecting dictionary location, list of basic dictionary requirements and link to dictionary overview page