snoopy 强大的PHP采集类使用实例代码

2019-04-09 13:13:49于海丽

$accept http 接受类型 (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
$error 哪里报错, 如果有的话
$response_code 从服务器返回的响应代码
$headers 从服务器返回的头信息
$maxlength 最长返回数据长度
$read_timeout 读取操作超时 (requires PHP 4 Beta 4+)
设置为0为没有超时
$timed_out 如果一次读取操作超时了,本属性返回 true (requires PHP 4 Beta 4+)
$maxframes 允许追踪的框架最大数量
$status 抓取的http的状态
$temp_dir 网页服务器能够写入的临时文件目录 (/tmp)
$curl_path cURL binary 的目录, 如果没有cURL binary就设置为 false

以下是demo

include "Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->proxy_host = "//www.jb51.net";
$snoopy->proxy_port = "80";
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
$snoopy->referer = "//www.jb51.net";
$snoopy->cookies["SessionID"] = 238472834723489l;
$snoopy->cookies["favoriteColor"] = "RED";
$snoopy->rawheaders["Pragma"] = "no-cache";
$snoopy->maxredirs = 2;
$snoopy->offsiteok = false;
$snoopy->expandlinks = false;
$snoopy->user = "joe";
$snoopy->pass = "bloe";
if($snoopy->fetchtext("//www.jb51.net"))
{
echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>n";
}
else
echo "error fetching document: ".$snoopy->error."n";

以下是一些代码片段:
1、获取指定url内容

<?
$url = "//www.jb51.net";
include("snoopy.php");
$snoopy = new Snoopy;
$snoopy->fetch($url); //获取所有内容
echo $snoopy->results; //显示结果
//可选以下
$snoopy->fetchtext //获取文本内容(去掉html代码)
$snoopy->fetchlinks //获取链接
$snoopy->fetchform //获取表单
?>

2 表单提交

<?php
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "//www.jb51.net";//</a>表单提交地址
$snoopy->submit($action,$formvars);//$formvars为提交的数组
echo $snoopy->results; //获取表单提交后的 返回的结果
//可选以下
$snoopy->submittext; //提交后只返回 去除html的 文本
$snoopy->submitlinks;//提交后只返回 链接
?>

既然已经提交的表单 那就可以做很多事情 接下来我们来伪装ip,伪装浏览器
3 伪装

<?php
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "//www.jb51.net";
include "snoopy.php";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7'; //伪装sessionid
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; //伪装浏览器
$snoopy->referer = //www.jb51.net; //伪装来源页地址 http_referer
$snoopy->rawheaders["Pragma"] = "no-cache"; //cache 的http头信息
相关文章 大家在看