TACIT Latin Crawler

From CSSL

Overview

The Latin Crawler tool collects text from The Latin Library website and writes it to plain text files that automated text analysis programs can read.

Basic Tutorial: Collecting data using Latin Crawler

Author Selection

First, specify the author(s) whose documents you would like to request by clicking the "Add" button to the right of the Author Details box. Authors can be selected one at a time, or you can select multiple authors by clicking their names while holding down the "ctrl" key. You can select a range of authors by holding down "ctrl" and "shift". You can also remove authors by selecting their names in the Author Details box and clicking the "Remove" button.

Specifying Output Folder

To specify an output folder where crawled files will be saved, click the Browse button to the right of the Output Location bar and select a folder. If you create a new folder within this menu and rename it from "New Folder", click on any other folder and then click back on your newly created and renamed folder to select it.

After specifying all parameters, click the green and white play button (Image 1) in the top right corner of the window to run the program. Output information will be displayed in the console panel at the bottom of the tool.

Note: The program creates a new folder for each requested author and saves all files by that author within that folder. Sub-folders are also generated for documents that belong to collected works.

Understanding Latin Crawler Output

The output is written as plain text (.txt) files, with a separate file for each document. Documents are organized by author, with sub-folders for collected works. Each text file is named after the document crawled (e.g., ad Amicum Suum Consolatoria.txt).
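
The folder layout described above (one folder per author, optional sub-folders for collected works, one .txt file per document) can be read back into an analysis script along these lines. This is only a minimal sketch and not part of TACIT: the function name load_crawler_output is hypothetical, and the UTF-8 encoding is an assumption, since this page does not state which encoding the crawler writes.

```python
from collections import defaultdict
from pathlib import Path


def load_crawler_output(output_dir):
    """Return {author folder name: {relative document path: text}}.

    Hypothetical helper, not part of TACIT. It assumes the layout
    described on this page: one folder per author directly under the
    output folder, with optional sub-folders for collected works.
    """
    root = Path(output_dir)
    corpus = defaultdict(dict)
    for txt_path in sorted(root.rglob("*.txt")):
        relative = txt_path.relative_to(root)
        if len(relative.parts) < 2:
            # A .txt file sitting directly in the output folder does not
            # belong to any author folder, so it is skipped.
            continue
        author = relative.parts[0]
        # Keep any collected-works sub-folder as part of the document key.
        doc_key = Path(*relative.parts[1:]).as_posix()
        # Assumption: files are UTF-8; undecodable bytes are replaced.
        corpus[author][doc_key] = txt_path.read_text(
            encoding="utf-8", errors="replace"
        )
    return dict(corpus)
```

Calling load_crawler_output on the folder chosen in the Output Location bar returns one dictionary entry per author folder. Documents from collected works keep their sub-folder in the key, so two works that contain documents with the same file name do not overwrite each other.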