Installing and Using the Python Crawling Framework Scrapy: Step by Step

2019-10-06 18:55:45 于海丽

<1> Import the GPG key:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7
<2> Create the file /etc/apt/sources.list.d/scrapy.list:
echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list
<3> Update the package lists and install Scrapy, replacing VERSION with the actual version, e.g. scrapy-0.22:
sudo apt-get update && sudo apt-get install scrapy-VERSION
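After step <3>, one way to confirm that Python can actually see the installed package is to ask for its version. The sketch below assumes Python 3.8+ for importlib.metadata (newer than the Python 2 setup of Ubuntu 12.04 described here), and installed_version is a hypothetical helper, not a Scrapy API.

```python
# Sketch: report the installed version of a distribution, or None if it is absent.
# Assumes Python 3.8+ (importlib.metadata); installed_version is a hypothetical helper.
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("Scrapy")
    if version is None:
        print("Scrapy is not installed for this interpreter")
    else:
        print(f"Scrapy {version} is installed")
```

From the shell, the scrapy version command reports the same information.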

3. Installing Scrapy's dependencies
On Ubuntu 12.04, install each missing dependency as its ImportError appears:
ImportError: No module named w3lib.http
pip install w3lib
ImportError: No module named twisted
pip install twisted
ImportError: No module named lxml.html
pip install lxml
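Fix for "error: libxml/xmlversion.h: No such file or directory", which appears when pip compiles lxml without the libxml2/libxslt development headers:

sudo apt-get install libxml2-dev libxslt-dev
sudo apt-get install python-lxml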
Fix for "ImportError: No module named cssselect":

pip install cssselect
ImportError: No module named OpenSSL
pip install pyOpenSSL
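The error-to-package pairs above can also be checked in one pass instead of waiting for each ImportError. This is a minimal sketch, assuming Python 3.4+ for importlib.util.find_spec; the DEPENDENCIES mapping only restates this section (top-level module name to pip package), and missing_packages is a hypothetical helper. It prints a single pip command for whatever is still missing.

```python
# Sketch: find which of the dependencies listed above cannot be located.
# DEPENDENCIES restates this section; missing_packages is a hypothetical helper.
import importlib.util
from typing import List, Mapping

# Top-level module named in each ImportError -> package passed to pip install.
DEPENDENCIES: Mapping[str, str] = {
    "w3lib": "w3lib",
    "twisted": "twisted",
    "lxml": "lxml",
    "cssselect": "cssselect",
    "OpenSSL": "pyOpenSSL",
}


def missing_packages(deps: Mapping[str, str] = DEPENDENCIES) -> List[str]:
    """Return the pip package names whose top-level module cannot be found."""
    return [pkg for module, pkg in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies can be located")
```

Because it uses find_spec rather than importing each module, it only checks that a module can be located, not that it loads cleanly; an lxml build that fails for lack of libxml2 headers still needs the apt-get fix above.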

4. Developing your own spider
Change to your working directory and create a new project:
scrapy startproject test
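Before running startproject, it may help to know which names it accepts. The sketch below approximates the check under an assumption: recent Scrapy releases appear to require a name that starts with a letter or underscore and contains only letters, digits and underscores, and to refuse a name that is already an importable module. The exact rules vary by version, so whether a name like test is accepted may depend on your Scrapy release and on whether Python's own test package is installed. is_valid_project_name is a hypothetical helper, not Scrapy's API.

```python
# Sketch of the kind of name check `scrapy startproject` performs.
# Assumption: identifier-style names only, and no clash with an importable
# module; exact rules vary by Scrapy version. is_valid_project_name is hypothetical.
import importlib.util
import re

_NAME_RE = re.compile(r"[_a-zA-Z]\w*")


def is_valid_project_name(name: str) -> bool:
    """Return True if name looks acceptable as a new Scrapy project name."""
    if not _NAME_RE.fullmatch(name):
        return False  # must start with a letter or underscore, then word characters
    # A name that is already importable (e.g. "os") would shadow an existing module.
    return importlib.util.find_spec(name) is None


if __name__ == "__main__":
    for candidate in ("tutorial", "my-spider", "os"):
        print(candidate, is_valid_project_name(candidate))
```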