Welcome to crawlib Documentation

crawlib provides building blocks for crawler projects. It simplifies:

  1. URL encoding.
  2. HTML parsing.
  3. error handling.
  4. downloading HTML and files.
  5. request caching.
  6. duplicate filtering.
  7. the breadth-first crawl strategy.

In addition, it is a web crawling framework for breadth-first crawling.

For example, suppose the target data is organized in a tree structure, such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of crawl: it visits all states first, then all cities, and so on, one level at a time.
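
To make the breadth-first idea concrete, here is a minimal sketch in plain Python. It does not use crawlib's API; the `CHILDREN` mapping and the `crawl_breadth_first` function are hypothetical names introduced only for illustration, and the in-memory data stands in for pages that a real crawler would download and parse.

```python
from collections import deque

# Hypothetical in-memory stand-in for a site. A real crawler would
# download and parse each page to discover its child pages.
CHILDREN = {
    "USA": ["California", "Virginia"],
    "California": ["Los Angeles", "San Diego"],
    "Virginia": ["Arlington"],
    "Los Angeles": ["90001"],
    "San Diego": ["92101"],
    "Arlington": ["22201"],
}


def crawl_breadth_first(root):
    """Visit every node level by level and return the visit order."""
    seen = {root}          # duplicate filter: never queue a node twice
    queue = deque([root])  # FIFO queue yields breadth-first order
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in CHILDREN.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


visit_order = crawl_breadth_first("USA")
print(visit_order)
```

The queue guarantees that every state is visited before any city, and every city before any zipcode. The `seen` set plays the role of a duplicate filter, so a page reachable through several links is processed only once.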

Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.

Install

crawlib is released on PyPI, so all you need is:

$ pip install crawlib

To upgrade to the latest version:

$ pip install --upgrade crawlib

About the Author

(\ (\
( -.-)o    I am a lovely Rabbit!
o_(")(")

Sanhe Hu has been a very active Python developer since 2010. Research areas include Machine Learning, Big Data Infrastructure, Blockchain, Business Intelligence, Open Cloud, and Distributed Systems. Interests include photography, singing, the outdoors, arts, games, and, best of all, Python.