An In-Depth Analysis of Python Source Code for Extracting a Web Page's Main Text

2019-10-06 14:15:42 于丽

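        group = remove_image(group)  # strip image tags from this block
        group = remove_video(group)  # strip video tags from this block
        text_a, text_b = remove_any_tag_but_a(group)
        # block score: (text_b - text_a) minus a fixed penalty of 8
        temp = (text_b - text_a) - 8
        group_value.append(temp)
    # pick the contiguous run of blocks whose scores add up to the maximum
    left, right = sum_max(group_value)
    return left, right, len('\n'.join(tmp[:left])), len('\n'.join(tmp[:right]))

def extract(content):
    # remove JS/CSS and blank lines before scoring the remaining lines
    content = remove_empty_line(remove_js_css(content))
    left, right, x, y = method_1(content)
    # return the original lines that fall inside the best-scoring run
    return '\n'.join(content.split('\n')[left:right])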

Execution starts from the last function, extract.
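The excerpt begins partway through method_1 and calls several helpers (remove_js_css, remove_empty_line, remove_image, remove_video, remove_any_tag_but_a, sum_max) whose bodies are not shown. The sketch below is a self-contained approximation, not the original author's code: every helper body is a hypothetical stand-in inferred only from its name and from how its return value is used. In particular, it assumes that each block is a single line, that remove_any_tag_but_a returns (link-text length, total-text length), and that sum_max returns half-open bounds [left, right) of the maximum-sum run (Kadane's algorithm), which matches the slicing tmp[:left] and [left:right] in the original.

```python
import re

# All helper bodies below are HYPOTHETICAL stand-ins for functions the
# original article calls but does not show.

def remove_js_css(html: str) -> str:
    """Drop <script>, <style> and comment blocks (assumed behavior)."""
    html = re.sub(r'(?is)<script.*?</script>', '', html)
    html = re.sub(r'(?is)<style.*?</style>', '', html)
    return re.sub(r'(?s)<!--.*?-->', '', html)


def remove_empty_line(html: str) -> str:
    """Keep only lines that contain non-whitespace characters."""
    return '\n'.join(line for line in html.split('\n') if line.strip())


def remove_image(html: str) -> str:
    return re.sub(r'(?i)<img[^>]*>', '', html)


def remove_video(html: str) -> str:
    return re.sub(r'(?is)<video.*?</video>', '', html)


def remove_any_tag_but_a(html: str):
    """Assumed to return (length of link text, length of all visible text)."""
    link_text = ''.join(re.findall(r'(?is)<a\b[^>]*>(.*?)</a>', html))
    link_text = re.sub(r'<[^>]+>', '', link_text)
    all_text = re.sub(r'<[^>]+>', '', html)
    return len(link_text.strip()), len(all_text.strip())


def sum_max(values):
    """Kadane's algorithm: (left, right) such that sum(values[left:right]) is maximal."""
    best_sum, best_left, best_right = float('-inf'), 0, 0
    cur_sum, cur_left = 0, 0
    for i, v in enumerate(values):
        if cur_sum <= 0:          # a non-positive prefix never helps; restart here
            cur_sum, cur_left = v, i
        else:
            cur_sum += v
        if cur_sum > best_sum:
            best_sum, best_left, best_right = cur_sum, cur_left, i + 1
    return best_left, best_right


def method_1(content: str):
    tmp = content.split('\n')
    group_value = []
    for group in tmp:             # assumption: one line per block
        group = remove_image(group)
        group = remove_video(group)
        text_a, text_b = remove_any_tag_but_a(group)
        group_value.append((text_b - text_a) - 8)
    left, right = sum_max(group_value)
    return left, right, len('\n'.join(tmp[:left])), len('\n'.join(tmp[:right]))


def extract(content: str) -> str:
    content = remove_empty_line(remove_js_css(content))
    left, right, x, y = method_1(content)
    return '\n'.join(content.split('\n')[left:right])


SAMPLE = """<html><head><style>body {color: red}</style></head>
<body>
<div><a href="/">Home</a> | <a href="/news">News</a></div>

<p>This paragraph is the real article body and contains plenty of plain text.</p>
<p>Here is a second paragraph with more sentences about the same topic.</p>
<div><a href="/about">About us</a></div>
<script>var x = 1;</script>
</body></html>"""

if __name__ == "__main__":
    print(extract(SAMPLE))
```

On this sample the sketch returns the two <p> lines. In this scoring scheme, any line with fewer than 8 characters of non-link text contributes a negative score, so navigation and footer lines made mostly of links pull a run's total down, and the maximum-sum run settles on the stretch of dense plain text.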