Welcome to crawlib Documentation
crawlib provides building blocks for crawler projects. It simplifies:
- URL encoding.
- HTML parsing.
- error handling.
- downloading HTML pages and files.
- request caching.
- duplicate filtering.
- breadth-first crawl strategy.
In addition, it is a web crawling framework for breadth-first style crawling.
For example, suppose the target data is organized in a tree structure, such as State -> City -> Zipcode -> Street -> Address. crawlib is designed for exactly this kind of crawl.
Here is an Example Project for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.
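To make the breadth-first idea concrete, the following is a minimal sketch in plain Python. It does not use crawlib's API. The `SITE` dictionary, the page names, and the `crawl_breadth_first` helper are all hypothetical, standing in for real pages and real HTTP requests. It shows pages being visited level by level (states, then cities, then zipcodes), with a duplicate filter so that a page linked from two parents is visited only once.

```python
from collections import deque

# Hypothetical in-memory stand-in for a website: each key is a page,
# each value lists the child pages that page links to.
SITE = {
    "state/CA": ["city/LA", "city/SF"],
    "state/NY": ["city/NYC"],
    "city/LA": ["zip/90001"],
    "city/SF": ["zip/94102"],
    "city/NYC": ["zip/10001", "zip/90001"],  # zip/90001 is linked twice on purpose
}


def crawl_breadth_first(roots, get_children):
    """Visit pages level by level, skipping any page already seen."""
    seen = set(roots)      # duplicate filter
    queue = deque(roots)   # FIFO queue gives breadth-first order
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for child in get_children(page):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


visited = crawl_breadth_first(
    ["state/CA", "state/NY"],
    lambda page: SITE.get(page, []),  # a real crawler would download and parse here
)
print(visited)
```

In a real project, the `get_children` callback would be replaced by code that downloads a page and extracts its child links; the queue and the `seen` set would typically be persisted so that a crawl can resume after interruption.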
Install
crawlib is released on PyPI, so all you need is:
$ pip install crawlib
To upgrade to the latest version:
$ pip install --upgrade crawlib
About the Author
(\ (\
( -.-)o I am a lovely Rabbit!
o_(")(")
Sanhe Hu has been a very active Python developer since 2010. Research areas include Machine Learning, Big Data Infrastructure, Blockchain, Business Intelligence, Open Cloud, and Distributed Systems. Loves photography, singing, the outdoors, arts, games, and, best of all, Python.
- My GitHub: https://github.com/MacHu-GWU
- My HomePage: http://www.sanhehu.org/