TACIT Supreme Court Case Crawler

From CSSL
Overview

The US Supreme Court Crawler is a program that automatically scans and collects court case transcription data from the IIT Chicago-Kent College of Law Supreme Court Case Website and writes that data to text files that automated text analysis programs can read. The crawler also saves metadata about each file (e.g., majority author, date decided; see the output section below) and includes an option to download case audio .mp3 files.

Basic Tutorial: Collecting data using the US Supreme Court Crawler

Selecting Supreme Court Cases

First, you will need to determine which Supreme Court cases you would like the crawler to collect by selecting a Filter Type.

Selecting the Term filter type will allow you to select cases by calendar year. Cases are grouped by single year and also by decade.

Selecting the Issues filter type will allow you to select cases by topic or social issue of interest.

Specifying Output Folder

To specify an output folder where crawled files will be saved, click the Browse button to the right of the Output Location bar and select a folder. If you create a new folder within this menu and rename it from "New Folder", click on any other folder and then click back on your newly created and renamed folder to select it. After specifying all parameters, click the green and white play button (Image 1) in the top right corner of the window to run the program. Output information will be displayed in the console panel at the bottom of the tool.

Audio Files

To download the .mp3 audio recordings of the court cases along with the transcriptions, check the box next to "Download Audio". Because these files tend to be large, you also have the option to check the "Truncate" box, which downloads only the first MB of each audio file. Note that this option does not compress the full audio file; it provides only the beginning of the recording.
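To make the difference between truncation and compression concrete, here is a minimal Python sketch of what keeping "the first MB" of a file means. It is an illustration only, not the crawler's own code: the function name truncate_copy is hypothetical, and it assumes that one MB means 1,048,576 bytes.

```python
from pathlib import Path

# Assumption: "the first MB" means the first 1,048,576 bytes of the file.
ONE_MB = 1024 * 1024


def truncate_copy(source: Path, destination: Path, limit: int = ONE_MB) -> int:
    """Copy at most `limit` leading bytes of `source` to `destination`.

    The audio is not re-encoded or compressed; everything after the
    first `limit` bytes is simply left out. Returns the bytes written.
    """
    with source.open("rb") as src:
        head = src.read(limit)  # stops early if the file is shorter
    destination.write_bytes(head)
    return len(head)
```

A file shorter than the limit is copied whole, so the result is never larger than the original.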

Understanding Supreme Court Crawler Output

The Console panel at the bottom of the tool provides information about the Supreme Court term/issue selected, the website URL being crawled, and the names of the files being created while the crawler is running. Once the crawl is complete, the court case transcription output will be saved in .txt text file format, with a separate .txt file for each document/case. File Names xxxx. Documents will be organized by xxx.
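Because each case is written to its own .txt file, the output folder can be read directly by a text analysis script. The following Python sketch shows one way to do that. It is not part of the crawler: the function names are hypothetical, the files are assumed to be UTF-8 encoded, and subfolders are searched as well because the folder layout is not specified here.

```python
from pathlib import Path


def load_transcripts(output_folder):
    """Return {relative file path: text} for every .txt file under output_folder."""
    root = Path(output_folder)
    texts = {}
    # rglob also searches subfolders; the actual layout is an assumption.
    for path in sorted(root.rglob("*.txt")):
        key = path.relative_to(root).as_posix()
        # Assumption: UTF-8 text; undecodable bytes are replaced, not fatal.
        texts[key] = path.read_text(encoding="utf-8", errors="replace")
    return texts


def word_counts(texts):
    """Count whitespace-separated tokens in each loaded transcript."""
    return {name: len(text.split()) for name, text in texts.items()}
```

Calling load_transcripts on the folder chosen under Output Location and passing the result to word_counts gives a quick check that every case produced a non-empty file.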

The program will also output a file called supremecourt-summary-[date-time].csv. This .csv file provides a summary of all files created, with additional documentation about each file including (when available): case name, docket number, date argued, date decided, majority author, vote, file type (transcript or audio), and the downloaded file name.
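The summary file can be loaded with any CSV reader. The Python sketch below locates summary files by the name pattern given above and tallies rows by one column. The function names are hypothetical, and the exact column header strings in the file are not documented here, so the column name passed to count_by_column is an assumption; inspect rows[0].keys() to see the actual headers first.

```python
import csv
from collections import Counter
from pathlib import Path


def find_summary_files(output_folder):
    """List summary CSVs matching supremecourt-summary-[date-time].csv."""
    return sorted(Path(output_folder).glob("supremecourt-summary-*.csv"))


def read_summary(csv_path):
    """Read one summary CSV into a list of dicts keyed by its header row."""
    # Assumption: the file is UTF-8 encoded and its first row holds headers.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by_column(rows, column):
    """Count rows per value of `column`; rows lacking the column count as ''."""
    return Counter(row.get(column, "") for row in rows)
```

For example, passing the header that holds the majority author to count_by_column gives the number of downloaded files per author, which is a quick way to check that a Term or Issues selection returned the cases you expected.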