python网络编程学习笔记(七)：HTML和XHTML解析(HTMLParser、BeautifulSo

from htmlentitydefs import entitydefs
import HTMLParser
class TitleParser(HTMLParser.HTMLParser):
    def __init__(self):
        self.taglevels=[]
        self.handledtags=['title','body']
        self.processing=None
        HTMLParser.HTMLParser.__init__(self)
    def handle_starttag(self,tag,attrs):
        if tag in self.handledtags:
            self.data=''
            self.processing=tag
        if tag =='a':
            for name,value in attrs:
                if name=='href':
                    print '连接地址：'+value
    def handle_data(self,data):
        if self.processing:
            self.data +=data
    def handle_endtag(self,tag):
        if tag==self.processing:
            print str(tag)+':'+str(tp.gettitle())
            self.processing=None
    def handle_entityref(self,name):
        if entitydefs.has_key(name):
            self.handle_data(entitydefs[name])
        else:
            self.handle_data('&'+name+';')

    def handle_charref(self,name):
        try:
            charnum=int(name)
        except ValueError:
            return
        if charnum<1 or charnum>255:
            return
        self.handle_data(chr(charnum))

def gettitle(self):
return self.data
fd=open('test1.html')
tp=TitleParser()
tp.feed(fd.read())

运行结果为：
title: XHTML 与" HTML 4.01 "标准没有太多的不同
连接地址：http://pypi.python.org/pypi
body:

i love÷ you×

4/11 首页上一页 2 3 4 5 6 7 下一页尾页

python网络编程学习笔记(七)：HTML和XHTML解析(HTMLParser、BeautifulSo

Python ArcPy实现批量拼接长时间序列栅格图像

Python 中OS module的使用详解

Python Matplotlib基本用法详解

Python range() 函数用法详解

Python分割单词和转换命名法的实现

Python 中OS module的使用详解

使用Pytorch构建第一个神经网络模型附案例实战

Python实现关键路径和七格图计算详解

python3中SQLMap安装教程

kali最新国内更新源sources

Python ArcPy实现批量拼接长时间序列栅格图像

Python 中OS module的使用详解

Python Matplotlib基本用法详解

Python range() 函数用法详解

Python分割单词和转换命名法的实现

Python 中OS module的使用详解

使用Pytorch构建第一个神经网络模型附案例实战

Python实现关键路径和七格图计算详解

python3中SQLMap安装教程

kali最新国内更新源sources

python网络编程学习笔记(七)：HTML和XHTML解析(HTMLParser、BeautifulSo

Python ArcPy实现批量拼接长时间序列栅格图像

Python 中OS module的使用详解

Python Matplotlib基本用法详解

Python range() 函数用法详解

Python分割单词和转换命名法的实现

Python 中OS module的使用详解

使用Pytorch构建第一个神经网络模型 附案例实战

Python实现关键路径和七格图计算详解

python3中SQLMap安装教程

kali最新国内更新源sources

Python ArcPy实现批量拼接长时间序列栅格图像

Python 中OS module的使用详解

Python Matplotlib基本用法详解

Python range() 函数用法详解

Python分割单词和转换命名法的实现

Python 中OS module的使用详解

使用Pytorch构建第一个神经网络模型 附案例实战

Python实现关键路径和七格图计算详解

python3中SQLMap安装教程

kali最新国内更新源sources

使用Pytorch构建第一个神经网络模型附案例实战

使用Pytorch构建第一个神经网络模型附案例实战