The New York Times Crawler

This is an application I developed for my Information Retrieval (EECS 498) class at the University of Michigan.

Description: The New York Times Crawler will output a list of the most popular events, phrases, and people given a specific time period and category. Try it out!

Instructions: Select a time interval and a category. The application then queries the New York Times article database through the New York Times developer APIs. It takes the article abstracts that fall within the time interval and extracts key terms and phrases (using proper nouns and other extraction techniques). It then queries the New York Times TimesTags API to determine whether each extracted term refers to a known person or organization, and enqueues that term and all related terms found by TimesTags to be searched. This application was built with PHP, AJAX/JavaScript, and Perl.
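The extract-then-enqueue loop described above can be sketched roughly as follows. This is an illustration only, written in Python rather than the PHP and Perl the application actually uses. It assumes a simple capitalized-word heuristic for proper nouns, and the names extract_candidates, crawl, LEADING_WORDS, and lookup_related are hypothetical. lookup_related stands in for the TimesTags lookup and is passed in by the caller, so no real API call is made here.

```python
import re
from collections import Counter, deque

# Runs of capitalized words, e.g. "Michael Phelps" or "Beijing" (a rough heuristic).
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

# Common sentence-initial words that are capitalized only because of position.
LEADING_WORDS = {"The", "A", "An", "In", "On", "At", "It", "This"}


def extract_candidates(abstract):
    """Return capitalized phrases found in one abstract, in order of appearance."""
    phrases = []
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        for match in PROPER_NOUN.finditer(sentence):
            words = match.group(0).split()
            # Drop a sentence-initial filler word such as "The".
            if match.start() == 0 and words[0] in LEADING_WORDS:
                words = words[1:]
            if words:
                phrases.append(" ".join(words))
    return phrases


def crawl(abstracts, lookup_related, top_n=5, max_terms=50):
    """Seed a queue with the most frequent phrases, then expand via related terms.

    lookup_related is a hypothetical callable (term -> list of related terms)
    standing in for the tag lookup; it is an assumption of this sketch.
    """
    counts = Counter()
    for abstract in abstracts:
        counts.update(extract_candidates(abstract))

    queue = deque(term for term, _ in counts.most_common(top_n))
    seen = set(queue)
    ordered = []
    while queue and len(ordered) < max_terms:
        term = queue.popleft()
        ordered.append(term)
        for related in lookup_related(term):
            if related not in seen:  # enqueue each term at most once
                seen.add(related)
                queue.append(related)
    return ordered
```

For example, calling crawl with a few abstracts and a dictionary-backed lookup_related returns the seed phrases followed by any related terms the lookup supplies, each visited once. The max_terms cap is a design choice of this sketch to keep the queue from growing without bound.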

Examples for proof of concept:
8/8/2008 - 8/14/2008 Sports: Beijing Olympics
1/19/2009 - 1/21/2009 U.S.: Obama's Inauguration
1/27/1986 - 1/29/1986 all: Challenger Crash

 


Courtesy The New York Times