MySQL中distinct语句的基本原理及其与group by的比较

2019-01-04 16:34:51丽君
Query OK, 0 rows affected (0.00 sec)

 

mysql> call p_test(100000)||
Query OK, 1 row affected (5.66 sec)

 

mysql> delimiter ;//改变mysql命令结束符为; mysql> select count(id) from test_test; //数据都进去了
+-----------+ | count(id) | +-----------+ | 100000 | +-----------+ 1 row in set (0.00 sec)

 

mysql> show variables like "%pro%"; //查看一下,记录执行的profiling是不是开启动了,默认是不开启的
+---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | profiling | OFF | | profiling_history_size | 15 | | protocol_version | 10 | | slave_compressed_protocol | OFF | +---------------------------+-------+ 4 rows in set (0.00 sec)

 

mysql> set profiling=1; //开启
Query OK, 0 rows affected (0.00 sec)

2,测试

//做了4组测试 mysql> select distinct(num) from test_test; mysql> select num from test_test group by num; mysql> show profiles; //查看结果
+----------+------------+-------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-------------------------------------------+ | 1 | 0.07298225 | select distinct(num) from test_test | | 2 | 0.07319975 | select num from test_test group by num | | 3 | 0.07313525 | select num from test_test group by num | | 4 | 0.07317725 | select distinct(num) from test_test | | 5 | 0.07275200 | select distinct(num) from test_test | | 6 | 0.07298600 | select num from test_test group by num | | 7 | 0.07500700 | select num from test_test group by num | | 8 | 0.07331325 | select distinct(num) from test_test | | 9 | 0.57831575 | create index num_index on test_test (num) | //在这儿的时候,我加了索引 | 10 | 0.00243550 | select distinct(num) from test_test | | 11 | 0.00121975 | select num from test_test group by num | | 12 | 0.00116550 | select distinct(num) from test_test | | 13 | 0.00107650 | select num from test_test group by num | +----------+------------+-------------------------------------------+ 13 rows in set (0.00 sec)

上面的1-8是4组数据,并且是没有加索引的,从中我们可以看出,distinct比group by 会好一点点
10-13是2组数据,是加了索引以后的,从中我们可以看出,group by 比distinct 会好一点点
一般情况,数据量比较大的表,关联字段都会加索引的,,并且加索引后检索时间只有以前的六分之一左右。