在执行查询之前进行数据筛选 (MySQL)

2 年 ago

清, 扬

3 minutes

为什么你想要做那样的事情呢？

查询这个大约有120万行的表时，花了大约30分钟的时间。
在指定的搜索条件列上没有创建索引。
因为只要能够提取出一定数量最近的数据就可以了，所以现在会先使用已经创建索引的列来筛选数据，然后再执行查询，以便能够更快地获取结果。

2. 检验条件

・mysql版本为15.1分发5.5.60-MariaDB
・创建测试数据请参考以下链接（约400万行）
https://hacknote.jp/archives/29823/

MariaDB [testdata]> select * from item limit 10;
+----+----------+--------------------------------+-------+---------------------+
| id | name     | description                    | price | created_at          |
+----+----------+--------------------------------+-------+---------------------+
|  1 | 商品1    | 114dcf1df424f20df52d9c3519f0eb |  2060 | 2014-04-19 09:10:26 |
|  2 | 商品2    | 8e8e0723e233352a433b3bd19056be |  4092 | 2014-05-09 08:00:23 |
|  3 | 商品3    | 2d6757249469b20c21e69029a62037 |  6562 | 2014-05-05 11:08:28 |
|  4 | 商品4    | 86471dd71d806267f90795e0b03ff5 |  9148 | 2014-05-23 04:51:02 |
|  6 | 商品6    | ffd69fd3852808f148c818c8bf4f08 |  6075 | 2014-01-21 17:21:25 |
|  7 | 商品7    | ceb66f2d0c8757af11ec02aac16ca5 |  2736 | 2014-04-01 16:32:49 |
|  8 | 商品8    | 2d8f9d8ffb7668c122c2deeabbb5d4 |  6374 | 2014-03-07 21:41:21 |
|  9 | 商品9    | 6ad922ec4c5ceb931b03adc7ed6a6a |  4347 | 2014-06-09 03:00:25 |
| 13 | 商品13   | b2da4cefd264e6da237865e6432917 |  4686 | 2014-03-20 18:34:54 |
| 14 | 商品14   | 64bfba2c8450aca731b513fd1b8ef8 |  4938 | 2014-05-21 09:55:24 |
+----+----------+--------------------------------+-------+---------------------+
10 rows in set (0.00 sec)

MariaDB [testdata]>

MariaDB [testdata]> select count(*) from item;
+----------+
| count(*) |
+----------+
|  4194304 |
+----------+
1 row in set (14.00 sec)

MariaDB [testdata]>

MariaDB [testdata]> desc item;
+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(11)          | NO   | PRI | NULL    | auto_increment |
| name        | varchar(10)      | YES  |     | NULL    |                |
| description | varchar(30)      | YES  |     | NULL    |                |
| price       | int(10) unsigned | YES  |     | NULL    |                |
| created_at  | datetime         | YES  |     | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

MariaDB [testdata]>

MariaDB [testdata]> show index from item\G
*************************** 1. row ***************************
        Table: item
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 3861004
     Sub_part: NULL
       Packed: NULL
         Null:
   Index_type: BTREE
      Comment:
Index_comment:
1 row in set (0.00 sec)

MariaDB [testdata]>

由于id是主键，因此它被创建了索引。

3. 进行全文搜索

提取价格为1000的数据。

MariaDB [testdata]> explain select id,name,price from item where price = 1000;
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | item  | ALL  | NULL          | NULL | NULL    | NULL | 3861004 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.01 sec)

MariaDB [testdata]>

从讲解的结果来看，可以确认这是一个没有利用索引的查询。

MariaDB [testdata]> select id,name,price from item where price = 1000;
+---------+---------------+-------+
| id      | name          | price |
+---------+---------------+-------+
|    5642 | 商品5642      |  1000 |
|   17140 | 商品17140     |  1000 |
~ 省略 ~
| 4584289 | 商品4584289   |  1000 |
| 4587464 | 商品4587464   |  1000 |
+---------+---------------+-------+
448 rows in set (11.90 sec)

MariaDB [testdata]>

结果在11.90秒内返回了。 11.90 le.)

4. 先根据id筛选出10万条数据，然后进行搜索。

与3相似，提取price为1000的数据。

MariaDB [testdata]> explain select id,name,price from ( select id,name,price from item order by id desc limit 100000 ) as IDDESHIBORU where price = 1000;
+------+-------------+------------+-------+---------------+---------+---------+------+--------+-------------+
| id   | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+------+-------------+------------+-------+---------------+---------+---------+------+--------+-------------+
|    1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL    | NULL    | NULL | 100000 | Using where |
|    2 | DERIVED     | item       | index | NULL          | PRIMARY | 4       | NULL | 100000 |             |
+------+-------------+------------+-------+---------------+---------+---------+------+--------+-------------+
2 rows in set (0.00 sec)

MariaDB [testdata]>

可以确认行数已经达到了10万行。

MariaDB [testdata]> select id,name,price from ( select id,name,price from item order by id desc limit 100000 ) as IDDESHIBORU where price = 1000;
+---------+---------------+-------+
| id      | name          | price |
+---------+---------------+-------+
| 4587464 | 商品4587464   |  1000 |
| 4584289 | 商品4584289   |  1000 |
| 4582423 | 商品4582423   |  1000 |
| 4574711 | 商品4574711   |  1000 |
| 4573402 | 商品4573402   |  1000 |
| 4569544 | 商品4569544   |  1000 |
| 4568591 | 商品4568591   |  1000 |
| 4564896 | 商品4564896   |  1000 |
| 4541937 | 商品4541937   |  1000 |
| 4541451 | 商品4541451   |  1000 |
| 4540470 | 商品4540470   |  1000 |
| 4531149 | 商品4531149   |  1000 |
| 4530580 | 商品4530580   |  1000 |
| 4513801 | 商品4513801   |  1000 |
| 4512960 | 商品4512960   |  1000 |
| 4499883 | 商品4499883   |  1000 |
+---------+---------------+-------+
16 rows in set (0.12 sec)

MariaDB [testdata]>

0.12秒内得到了结果。

5. 概括

当数据库的数据变得相当庞大时，查询的执行时间就会变长。
在这种情况下，通过使用子查询来缩小数据范围再进行执行可以加快速度。
此外，虽然现在没有使用，但我认为最好能够尽量使索引对查询条件起到作用。