匹配csdn用户数据库与官方用户的重合度并将重叠部分的用户筛选

2019-04-08 18:46:03于丽

根据以上情况总结如下:
1、先导入数据后加索引的效率要比先加索引后导入数据的高一倍
2、InnoDB 在单进程数据插入上的效率要比MYISAM低很多
3、当perpage=50的情况下数据丢失率在1%以下

因为通过浏览器执行会有超时的问题,而且效率地下,故通过命令行方式运行,此过程中遇到一点小麻烦耽搁了不少时间
起初我执行如下代码:
php.exe E:usrwwwimportcsdndb.php
但是一直报错:call to undefined function mysql_connect
折腾发现没有载入php.ini
正确代码为:
php.exe -c E:/usr/local/apache2/php.ini importcsdndb.php

2、导入需要匹配的用户数据数据至本地
命令行进入msyql(不会的自己百度)
然后执行:mysql>source C:/Users/zhudong/Desktop/userdb.sql
3、对比筛选用户
对比程序写好了,切记在命令行下运行:

<?php
$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
//$handle_email = fopen("E:/records_email.txt","a");
$username_num = $email_num = $uid = 0;
while ($uid<2181106) {
$nextuid=$uid+10000;
$query = mysql_query("SELECT * FROM pw_members WHERE uid>'$uid' AND uid<'$nextuid'");
while ($rt = mysql_fetch_array($query,MYSQL_ASSOC)) {
$username = $rt['username'];
$email = $rt['email'];
$query2 = mysql_query("SELECT * FROM scdn_userdb WHERE username='$username' OR email='$email'");
while ($rt2 = mysql_fetch_array($query2,MYSQL_ASSOC)) {
if ($rt['password'] = md5($rt2['password'])) {
if ($rt2['username'] == $username) {
$username_num++;
fwrite($handle_username,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'username_num='.$username_num."rn";
continue;
}
/*
if ($rt2['email'] == $email) {
$email_num++;
fwrite($handle_email,'OWN:'.$rt['uid'].'|'.$rt['username'].'|'.$rt['password'].'|'.$rt['email'].' CSDN:'.$rt2['username'].'|'.$rt2['password'].'|'.$rt2['email']."rn");
echo 'email_num='.$email_num."rn";
}
*/
}
}
mysql_free_result($query2);
}
$uid = $nextuid;
}
?>

您看到的以上的代码是非常蹩脚的,因为其效率特别低 ,几百万的数据,要跑10多个小时,怎么能忘记连表查询这么基本的东西呢,以下为修正后的方法

$link = mysql_connect('localhost', 'root', 'admin', true);
mysql_select_db('csdn',$link);
$handle_username = fopen("E:/records_username.txt","a");
while($uid<2181106) {//此处的数字为要对比用户库的最大ID
$nextuid= $uid+10000;
$query = mysql_query("SELECT m.uid,m.username,m.password,m.email,u.password as csdn_password,u.email as csdn_email FROM own_members m LEFT JOIN csdn_userdb u USING(username) WHERE m.uid>'$uid' AND m.uid<='$nextuid' AND u.username!=''");
相关文章 大家在看