decorator

There are three popular libraries widely used for making HTTP requests:

And there are two popular libraries widely used for extracting data from HTML:

This module bridges the gap.

crawlib.html_parser.decorator.auto_decode_and_soupify(encoding=None, errors='strict')

This decorator assumes that the decorated function has three arguments, passed in keyword form:

  • response: requests.Response or scrapy.http.Response
  • html: html string
  • soup: bs4.BeautifulSoup
  1. if soup is not given, it is automatically generated from html.
  2. if html is not given, it is automatically generated from response (see the sketch below).
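
Conceptually, the fallback chain looks like the sketch below. The helper name, the decoding of a requests.Response body, and the default encoding are assumptions for illustration, not the decorator's literal code:

import bs4

def _fill_in_html_and_soup(response=None, html=None, soup=None,
                           encoding="utf-8", errors="strict"):
    # response -> html: decode the raw body (assuming a requests.Response)
    if html is None and response is not None:
        html = response.content.decode(encoding, errors=errors)
    # html -> soup: parse the decoded markup
    if soup is None and html is not None:
        soup = bs4.BeautifulSoup(html, "html.parser")
    return html, soup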

Usage:

@auto_decode_and_soupify()
def parse(response, html, soup):
    ...
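
For example, assuming the three parameters default to None (the URL below is only a placeholder), a decorated parser can be called with keyword arguments like this:

import requests

from crawlib.html_parser.decorator import auto_decode_and_soupify

@auto_decode_and_soupify()
def parse(response=None, html=None, soup=None):
    # soup is filled in by the decorator when only ``response`` is given
    return soup.title.text

res = requests.get("https://example.com")  # placeholder URL
print(parse(response=res))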

This decorator automatically detects the parameters named response, html, and soup in the function, and automatically generates the expected values when html or soup is not given. A function decorated by this decorator must have the three parameters mentioned above, and the arguments must be passed in keyword form when the function is called.

crawlib.html_parser.decorator.soupify(html)

Convert HTML to a bs4.BeautifulSoup object. It handles the API change introduced in bs4 4.3.
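
A minimal sketch of such a helper, not necessarily the library's exact implementation: newer bs4 releases warn when no parser is specified, so the parser is passed explicitly:

import bs4

def soupify(html):
    # Passing an explicit parser keeps behavior consistent across bs4 versions.
    return bs4.BeautifulSoup(html, features="html.parser")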

crawlib.html_parser.decorator.validate_implementation_for_auto_decode_and_soupify(func)

Validate that auto_decode_and_soupify() is applicable to this function. If it is not applicable, a NotImplementedError is raised.
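
A sketch of how such a check could be written with inspect, under the assumption that it only verifies the parameter names (the actual check may be stricter):

import inspect

def validate_implementation_for_auto_decode_and_soupify(func):
    # The decorated function must expose all three expected parameters.
    required = {"response", "html", "soup"}
    params = set(inspect.signature(func).parameters)
    missing = required - params
    if missing:
        raise NotImplementedError(
            "%s() must accept keyword arguments: %s"
            % (func.__name__, ", ".join(sorted(missing)))
        )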