关于mysql count(*)优化

在工作中,往往遇到分页查询、统计之类,免不了使用count(*)。可是count(*)该如何优化?

测试数据库4核4g内存:表tb1, id自增,c2字符串uuid,c3值为1,2,3。结果均为5次中的最小时间。

表结构,如下:

mysql> desc tb1;
+-------+-----------------+------+-----+---------+----------------+
| Field | Type            | Null | Key | Default | Extra          |
+-------+-----------------+------+-----+---------+----------------+
| id    | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| c2    | varchar(255)    | YES  |     | NULL    |                |
| c3    | int             | YES  |     | NULL    |                |
+-------+-----------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

表中数据共2086094行

mysql> SELECT count(*) FROM tb1 WHERE 1 = 1;
+----------+
| count(*) |
+----------+
|  2086094 |
+----------+
1 row in set (0.06 sec)

count 测试sql如下:

mysql> SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----------+
| count(*) |
+----------+
|   126306 |
+----------+
1 row in set (0.47 sec)

可以看到用时高达0.47s。

第一步,给c2加索引

mysql> alter table tb1 add index idx_tb1_c2(c2);
Query OK, 0 rows affected (22.92 sec)
Records: 0  Duplicates: 0  Warnings: 0

效果如下:

mysql> SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----------+
| count(*) |
+----------+
|   126306 |
+----------+
1 row in set (0.48 sec)

可以看到,并没有什么变化,explain一把,如图:

mysql> explain SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | tb1   | NULL       | ALL  | idx_tb1_c2    | NULL | NULL    | NULL | 2079297 |     3.61 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

结果显示没有用到索引,走的全表扫描,众所周知,当like最左固定的时候,是可以用到索引的

第二步,给c3加索引

mysql> alter table tb1 add index idx_tb1_c3(c3);
Query OK, 0 rows affected (9.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

效果如下:

mysql> SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----------+
| count(*) |
+----------+
|   126306 |
+----------+
1 row in set (1.86 sec)

可以看到,查询反而更慢了,explain一把,如图:

mysql> explain SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 =
 3;
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys         | key        | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+---------+----------+-------------+
|  1 | SIMPLE      | tb1   | NULL       | ref  | idx_tb1_c2,idx_tb1_c3 | idx_tb1_c3 | 5       | const | 1039648 |    36.06 | Using where |
+----+-------------+-------+------------+------+-----------------------+------------+---------+-------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

结果显示用到了索引 idx_tb1_c3

第三步,建立联合索引c2在前,c3在后

mysql> alter table tb1 add index idx_tb1_c2_c3(c2,c3);
Query OK, 0 rows affected (28.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

效果如下:

mysql> SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----------+
| count(*) |
+----------+
|   126306 |
+----------+
1 row in set (10.58 sec)

可以看到,查询效率更慢了,explain一把,如图:

mysql> explain SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 
 3;
+----+-------------+-------+------------+------+-------------------------------------+------------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                       | key        | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+-------------------------------------+------------+---------+-------+---------+----------+-------------+
|  1 | SIMPLE      | tb1   | NULL       | ref  | idx_tb1_c2,idx_tb1_c3,idx_tb1_c2_c3 | idx_tb1_c3 | 5       | const | 1039648 |    36.06 | Using where |
+----+-------------+-------+------------+------+-------------------------------------+------------+---------+-------+---------+----------+-------------+
1 row in set, 1 warning (0.03 sec)

结果显示用到索引idx_tb1_c3,新建的idx_tb1_c2_c3并没生效

第四步,建立联合索引,c3在前,c2在后

mysql> alter table tb1 add index idx_tb1_c3_c2(c3,c2);
Query OK, 0 rows affected (25.46 sec)
Records: 0  Duplicates: 0  Warnings: 0

效果如下:

mysql> SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 3;
+----------+
| count(*) |
+----------+
|   126306 |
+----------+
1 row in set (0.06 sec)

可以看到,查询效率非常的快,仅0.06秒,explain一把,如下:

mysql> explain SELECT count(*)  FROM tb1  WHERE 1 = 1  AND (c2 LIKE 'a%' or c2 like 'b%' or c2 like 'c%') AND c3 = 
 3;
+----+-------------+-------+------------+-------+---------------------------------------------------+---------------+---------+------+--------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys                                     | key           | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------------------------------------------+---------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | tb1   | NULL       | range | idx_tb1_c2,idx_tb1_c3,idx_tb1_c2_c3,idx_tb1_c3_c2 | idx_tb1_c3_c2 | 773     | NULL | 250480 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------------------------------------------+---------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

总结:

第三步,为什么用不到索引,众所周知,当为联合索引时,b+树遵循最左匹配原则,c2叶子节点上会维护一个链表或b+树,存放c3的数据,但c3的数据比较少,所以仅会用到链表,即使用到索引,查询效果也不理想。第四步,为什么会用到索引,同理,c3的数据比较少,所以b+树的高度非常低,然后在c3的叶子节点维护一个b+树,为什么是b+树呢,因为c2的数据非常多。所以当我们在建立联合索引时,尽可能把穷举少的列放在前面。

以上就是全部内容,各位大佬们有好的建议,欢迎在评论区留言。


版权声明:本文为wyanyi原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。