HarvestMan - The HarvestMan Web Crawler

Welcome

Welcome to the project page of the HarvestMan web crawler.

Companion Website (new)

HarvestMan has a new companion website, thanks to Tom Smith. The new site has more current information, including a frequently updated wiki.

News (Updated May 08 2008)

Read the latest news about HarvestMan.

Development Code

Browse or download the bleeding edge source code.

About HarvestMan

HarvestMan is a web crawler application written in the Python programming language. It can be used to download files from websites according to a number of user-specified rules. The latest version supports more than 60 customization options. HarvestMan is a console (command-line) application.

HarvestMan is the only open source, multithreaded web-crawler program written in the Python language. HarvestMan is released under the GNU General Public License.

Current Release

The latest release of HarvestMan is 1.4.6.

  • Read the Changelog for this release
  • Download the files for this release

More information is available on the releases page.

Architecture

See the architecture of HarvestMan.

HarvestMan Configuration

HarvestMan is typically run by reading options from a configuration file. The configuration file is in XML format and is named config.xml by default. This format replaces an older text format, in which configuration options were represented as name/value pairs in a text file. A separate page describes the older format in detail.

Here is a sample config file of HarvestMan.
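To illustrate the difference between the two configuration styles described above, here is a minimal Python sketch that reads name/value pairs from text and writes them out as XML. It is not HarvestMan's own code or schema. The option names (project.name, fetch.depth), the whitespace separator, the '#' comment convention, and the config/option element names are all assumptions made for this example. Consult the sample config file for the real layout.

```python
import xml.etree.ElementTree as ET


def parse_name_value(text):
    """Parse 'name value' lines into a dict.

    Assumed layout (hypothetical): one pair per line, separated by
    whitespace; blank lines and lines starting with '#' are ignored.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        options[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return options


def to_xml(options):
    """Serialize options as XML.

    The <config> and <option> element names are hypothetical and are
    NOT the real HarvestMan schema.
    """
    root = ET.Element("config")
    for name, value in sorted(options.items()):
        ET.SubElement(root, "option", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Read the hypothetical XML layout back into a dict."""
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.get("value") for opt in root.iter("option")}


if __name__ == "__main__":
    legacy = "# illustrative options only\nproject.name demo\nfetch.depth 3\n"
    print(to_xml(parse_name_value(legacy)))
```

The sketch keeps every value as a string. A real configuration loader would likely also validate option names and convert values to their proper types.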

HarvestMan command-line options

HarvestMan also accepts command-line options. The Command line FAQ describes the most important command-line options for HarvestMan.

Developers

The original developer of HarvestMan is Anand B Pillai. Anand is a software professional based in Bangalore, India.

History

For an interesting article on the history of HarvestMan, read this interview.

Downloads

Check the download page for HarvestMan downloads.

Contacts

Email address.