TACIT Senate Crawler

Overview

The US Congress Crawler collects speech transcripts from the Library of Congress THOMAS website, covering speeches from the present day back to the 101st Congress. Given the Congress number, senator or representative details, and other filtering options, the crawler automatically scans and collects the transcripts and writes them to text files for analysis. It also saves additional information about each speech (e.g., the name of the speaker, the date of the speech) in a summary CSV file.

Transcripts of speeches given in the US Congress have been used to address a wide variety of research questions in social and political psychology, including the link between congressional prosociality and approval ratings (Frimer, Aquino, Gebauer, Zhu, & Oakes, 2015), differences in partisan emotional states (Wojcik, Hovasapian, Graham, Motyl, & Ditto, 2015), and partisan differences in language use (Yu, Kaufmann, & Diermeier, 2008).

Basic Tutorial: Collecting data using the Congress Crawler

Selecting Congress Members

You will need to determine which members' speeches you would like the crawler to collect. First, select either Senators or Representatives from the Member of Congress panel. Then, under the Input Parameters panel, select the Congress you would like to crawl from the drop-down menu. Next, click the Add button to the right of the Senator box to select the senator(s) or representative(s) in that Congress for whom you wish to collect data. Speeches can be collected for any individual member of Congress, for all members, or for all members with a particular party affiliation (i.e., Republican, Democrat, Independent).

Types of Congressional Records

Within the Limit Records panel, you can specify which types of congressional records you would like to crawl: Senate Records, House Records, Extension of Remarks, and/or Daily Digest. Descriptions of each type of document, taken directly from the THOMAS website, are included here.

House and Senate Records: "The Congressional Record is not an exact record of the proceedings and debate in the House and Senate. As previously stated, it is a substantially verbatim report. In addition to debate, the Record contains communications from the President and the Executive Branch, memorials, petitions, and various information (including amendments and cosponsors) on legislation introduced and/or passed. Committee activities usually are not reported in the body of the Record other than the mentioning of reports made to the House or Senate or notices of meetings.

In addition, Members of both Houses are allowed to edit the transcript of their remarks before publication in the daily Record, permanent Record, or both. Also, by unanimous consent, House Members may be granted leave to revise and extend their remarks. Senators may be given permission to have inserted in the Record, at the point where they stopped speaking, any unfinished remarks. Remarks and extraneous material not necessarily pertaining to legislation may also be inserted, subject to certain limitations."

Extensions of Remarks: "This section is used by Members of the House to include additional legislative statements not delivered on the House floor as well as extraneous materials such as the text of speeches delivered outside Congress, letters from and tributes to constituents, and newspaper or magazine articles. (Remarks not delivered by Senators that are to be inserted in the Record are usually found in the "Additional Statements" section of the Senate proceedings, effective February 10, 1970.)"

Daily Digest: The Daily Digest "was established by the Legislative Reorganization Act of 1946 (Public Law 601, 79th Congress) to provide a concise and convenient account of actions taken by the House, Senate committees, and subcommittees during the previous day and activities scheduled the next day. In practice, the Daily Digest contains a summary of work of the day covered in the body of the Record, organized under Highlights, Senate Chamber Action, Senate Committee Meetings, House Chamber Action, House Committee Meetings, and Joint Committee Meetings and a list of committee meetings scheduled for the next day. Friday issues, or the last issue of the week, contain, in addition, a section entitled Congressional Program Ahead which outlines the plans of each Chamber and its committees for the coming week."

Limiting Number of Files Created

To download only a subset of a selected member's speeches (and so reduce the number of files created), check the box next to Limit Records per Senator in the Limit Records panel. Next, enter the number of speeches you would like in the No.of.Records per Senator box. If Sort Records by Date is set to yes, the crawler returns the requested number of speeches in chronological order, starting from the most recent speech on record. If no is selected, the crawler returns the requested number of speeches as a random subset of that Senator's records.
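The two selection modes described above can be illustrated with a short Python sketch. This is not TACIT's own code: the function name limit_records, the (date, title) record shape, and the rng parameter are all hypothetical and exist only to show the difference between "most recent first" and "random subset".

```python
import random
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical record shape: (speech date, speech title).
Record = Tuple[date, str]


def limit_records(
    records: List[Record],
    limit: int,
    sort_by_date: bool,
    rng: Optional[random.Random] = None,
) -> List[Record]:
    """Return at most `limit` records, mimicking the behavior described above."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if sort_by_date:
        # "yes": newest speech first, then keep the first `limit` entries.
        newest_first = sorted(records, key=lambda rec: rec[0], reverse=True)
        return newest_first[:limit]
    # "no": a random subset without replacement.
    if rng is None:
        rng = random.Random()
    return rng.sample(records, min(limit, len(records)))
```

Passing a seeded random.Random instance as rng makes the random mode repeatable in this sketch; whether the crawler itself offers a comparable option is not stated in this page.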

Specifying Output Folder

To specify the output folder where crawled files will be saved, click the Browse button to the right of the Output Location bar and select a folder. If you create a new folder within this menu and change its name from "New Folder", click on any other folder and then click back on your newly created and renamed folder to select it. After specifying all parameters, click the green and white play button located in the top right corner of the window to run the program. Output information will display in the console panel at the bottom of the tool.

Understanding Congress Crawler Output

While the crawler is running, the Console panel at the bottom of the tool provides information about the Congress member selected, the website URL being crawled, and the names of the files being created. Once the crawl is complete, the console prints the total number of files created. The data gathered for each speech is saved as a .txt file, with a separate file for each speech and a separate folder for each senator. The generated text file name format is: Congress#-Senator last name-party affiliation-state affiliation-date-title (e.g., 114-Ayotte-R-NH-June 10 2015-NATIONAL DEFENS-355.txt).
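For downstream analysis it can help to recover these fields from the file names. The following Python sketch parses the naming pattern given above. The names SpeechFile and parse_speech_filename are hypothetical, and the sketch assumes that the first five fields contain no hyphens and that the date is written as in the example (English month name, day, year). Everything after the date, including the trailing number seen in the example, is kept together as the title because its meaning is not documented here.

```python
from datetime import date, datetime
from pathlib import Path
from typing import NamedTuple


class SpeechFile(NamedTuple):
    congress: int
    last_name: str
    party: str
    state: str
    speech_date: date
    title: str  # everything after the date field, including any trailing suffix


def parse_speech_filename(filename: str) -> SpeechFile:
    """Split a crawler output file name into its documented fields."""
    stem = Path(filename).stem  # drop the .txt extension
    # Split on the first five hyphens only, so hyphens in the title survive.
    parts = stem.split("-", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected file name format: {filename!r}")
    congress, last_name, party, state, raw_date, title = parts
    # Assumed date layout, e.g. "June 10 2015" (depends on an English locale).
    speech_date = datetime.strptime(raw_date, "%B %d %Y").date()
    return SpeechFile(int(congress), last_name, party, state, speech_date, title)


info = parse_speech_filename("114-Ayotte-R-NH-June 10 2015-NATIONAL DEFENS-355.txt")
print(info.last_name, info.party, info.state, info.speech_date.isoformat())
```

A hyphenated last name would break the first-five-fields assumption, so in practice the summary CSV is likely the safer source for speaker metadata.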

The program will also create an overview file called senate-summary-[date-time].csv. This .csv file summarizes all files created and includes additional documentation about each file (when available): Congress number, Speech Date, Senator Name, Political Affiliation, US State Affiliation, Title of Speech, and the downloaded file name.
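As a starting point for working with the summary file, the Python sketch below tallies rows by the values in one column. The exact header names written by the crawler are not given on this page, so the column name is passed in by the caller; the function name count_by_column and the path and column used in the usage lines are placeholders, not values confirmed by TACIT.

```python
import csv
from collections import Counter


def count_by_column(csv_path: str, column: str) -> Counter:
    """Count summary rows per distinct value of `column`."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early with a helpful message if the header is not what we expect.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found; header is {reader.fieldnames}")
        return Counter(row[column] for row in reader)


# Example usage (placeholder path and column name; check the header of your
# own summary file before relying on either):
# counts = count_by_column("senate-summary-[date-time].csv", "Political Affiliation")
# print(counts.most_common())
```

Opening the file with newline="" follows the csv module's recommendation, so quoted fields that contain line breaks (for example, long speech titles) are read correctly.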