Welcome to `crawlib` Documentation ¶

crawlib provides building blocks for crawler projects. It simplifies:

URL encoding.
HTML parsing.
Error handling.
Downloading HTML pages and files.
Request caching.
Duplicate filtering.
Breadth-first crawl strategy.
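
To make two of the items above concrete (URL encoding and duplicate filtering), here is a minimal sketch that uses only the Python standard library. It is not crawlib's own API: the names `build_url`, `normalize`, and `DuplicateFilter` are hypothetical and exist only for illustration.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return f"{base}?{urlencode(params)}"


def normalize(url):
    """Lower-case the scheme and host and drop the fragment,
    so that equivalent URLs compare equal (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


class DuplicateFilter:
    """Remember normalized URLs that were already seen (hypothetical class)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


url = build_url("https://example.com/search", {"q": "new york", "page": 2})
dup_filter = DuplicateFilter()
first_time = dup_filter.is_new(url)
second_time = dup_filter.is_new(url.replace("example.com", "EXAMPLE.com") + "#top")
```

In this sketch the second call is treated as a duplicate, because the two URLs differ only in host case and fragment. A real crawler would likely need stricter or looser normalization depending on the target site.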

In addition, it is a web crawling framework for breadth-first crawling.

For example, suppose the target data is organized as a tree, such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of structure.
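
The breadth-first idea can be sketched in plain Python, independent of crawlib's actual classes. The `SITE` dictionary below is a made-up, in-memory stand-in for real pages, and `fetch_children` and `crawl_breadth_first` are hypothetical names. The point is only that every page at one level (all states) is visited before any page at the next level (cities).

```python
from collections import deque

# Hypothetical stand-in for a website: each page maps to its child pages.
SITE = {
    "/states": ["/states/ca", "/states/ny"],
    "/states/ca": ["/states/ca/los-angeles", "/states/ca/san-diego"],
    "/states/ny": ["/states/ny/albany"],
    "/states/ca/los-angeles": [],
    "/states/ca/san-diego": [],
    "/states/ny/albany": [],
}


def fetch_children(url):
    """Pretend to download a page and return the links to its children."""
    return SITE.get(url, [])


def crawl_breadth_first(root):
    """Visit pages level by level, skipping URLs that were already queued."""
    visited = {root}            # duplicate filter
    queue = deque([(root, 0)])  # FIFO queue of (url, depth)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append((depth, url))
        for child in fetch_children(url):
            if child not in visited:
                visited.add(child)
                queue.append((child, depth + 1))
    return order


order = crawl_breadth_first("/states")
for depth, url in order:
    print("  " * depth + url)
```

Using a FIFO queue is what makes the traversal breadth-first; swapping `popleft()` for `pop()` would turn the same loop into a depth-first crawl.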

Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.

Install ¶

crawlib is released on PyPI, so all you need is:

$ pip install crawlib

To upgrade to the latest version:

$ pip install --upgrade crawlib

About the Author ¶

(\ (\
( -.-)o    I am a lovely Rabbit!
o_(")(")

Sanhe Hu has been a very active Python developer since 2010. His research areas include Machine Learning, Big Data Infrastructure, Blockchain, Business Intelligence, Open Cloud, and Distributed Systems. He loves photography, vocal music, the outdoors, arts, games, and, best of all, Python.

My GitHub: https://github.com/MacHu-GWU
My HomePage: http://www.sanhehu.org/
