Scraping Website Images Through Proxies with Python (Multithreaded)

                f.close()
            except:
                continue
    def run(self):
        self.downloadimg()

if __name__ == "__main__":
    getThreads = []
    checkThreads = []
    imgurlList('http://www.ivsky.com')
    getPicThreads = []

    # Start one thread per target site to fetch proxies
    for target in targets:
        getThreads.append(ProxyGet(target))

    for t in getThreads:
        t.start()

    for t in getThreads:
        t.join()

    print('.' * 10 + "Fetched %s proxies in total" % len(rawProxyList) + '.' * 10)

    # Start 20 threads to validate the proxies: split the fetched list
    # into 20 slices and give each thread one slice
    chunk = (len(rawProxyList) + 19) // 20  # slice size, rounded up
    for i in range(20):
        t = ProxyCheck(rawProxyList[chunk * i:chunk * (i + 1)])
        checkThreads.append(t)

    for t in checkThreads:
        t.start()

    for t in checkThreads:
        t.join()

    print('.' * 10 + "%s proxies passed validation in total" % len(checkedProxyList) + '.' * 10)

    # Start 20 threads, each downloading images through a randomly chosen proxy
    chunk = (len(imgurl_list) + 19) // 20  # slice size, rounded up
    for i in range(20):
        t = getPic(imgurl_list[chunk * i:chunk * (i + 1)])
        getPicThreads.append(t)

    for t in getPicThreads:
        t.start()

    for t in getPicThreads:
        t.join()

    # Note: this counts the image URLs that were queued, not successful downloads
    print('.' * 10 + "%s images downloaded in total" % len(imgurl_list) + '.' * 10)

    # Sort the validated proxies by their fourth field and persist them to disk
    with open("proxy_list.txt", 'w+') as f:
        for proxy in sorted(checkedProxyList, key=lambda p: p[3]):
            # print("checked proxy is: %s:%s\t%s\t%s" % (proxy[0], proxy[1], proxy[2], proxy[3]))
            f.write("%s:%s\t%s\t%s\n" % (proxy[0], proxy[1], proxy[2], proxy[3]))
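
The slice bounds used above to give each of the 20 threads its share of a list rely on ceiling division: adding 19 before dividing by 20 rounds the slice size up, so no element is left over. The following is a minimal sketch of that idea in isolation. The helper name split_into_chunks is hypothetical and does not appear in the original script; it only illustrates the arithmetic.

```python
def split_into_chunks(items, n):
    """Split items into n consecutive slices of near-equal size.

    Hypothetical helper for illustration only. Trailing slices may be
    empty when the list is short, just like the slices in the script.
    """
    size = (len(items) + n - 1) // n  # ceiling division: round the slice size up
    return [items[size * i:size * (i + 1)] for i in range(n)]


if __name__ == "__main__":
    parts = split_into_chunks(list(range(45)), 20)
    print(len(parts))               # 20 slices are always returned
    print([len(p) for p in parts])  # fifteen slices of 3, then five empty ones
```

With 800 items and 20 workers the slice size is (800 + 19) // 20 = 40, so every worker receives exactly 40 items. With 45 items the size rounds up to 3, the first 15 workers receive 3 items each, and the remaining 5 receive empty lists, which means some threads may start with nothing to do.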

2. Test results:

# ls
proxy_getpic.py
# python proxy_getpic.py
Proxy source site: http://www.cnproxy.com/proxy1.html
Proxy source site: http://www.cnproxy.com/proxy2.html
Proxy source site: http://www.cnproxy.com/proxy3.html
Proxy source site: http://www.cnproxy.com/proxy4.html
Proxy source site: http://www.cnproxy.com/proxy5.html
Proxy source site: http://www.cnproxy.com/proxy6.html
Proxy source site: http://www.cnproxy.com/proxy7.html
Proxy source site: http://www.cnproxy.com/proxy8.html
..........Fetched 800 proxies in total..........
