表分区教程

  首先要知道,表分区对于提高查询性能并不是绝对的,提高查询性能的前提是查询条件使用了分区字段,道理很简单,知道要查询的数据在哪个区就可以直接去该区查,而不需要把所有分区的记录都扫描一遍。

  表分区SQL很简单,示例如下:

-- 表分区SQL示例
-- 按uid字段进行分区,其中1~10为一个分区,11~20为一个分区,以此类推,当uid≥41时,记录保存到“partition_41_to_n”分区。
-- 需要注意的是username字段将会无法建立唯一索引,只能通过程序来防止账号重复注册(可以使用Redis锁在注册时把账号锁定)。
CREATE TABLE `user`
(
  `uid`      int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT '用户UID(自增主键)',
  `username` varchar(32)  NOT NULL DEFAULT '' COMMENT '登录账号',
  `password` varchar(128) NOT NULL DEFAULT '' COMMENT '登录密码',
  PRIMARY KEY (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='用户表'
/*!50100 PARTITION BY RANGE (uid)
(PARTITION partition_1_to_10 VALUES LESS THAN (11) ENGINE = InnoDB,
 PARTITION partition_11_to_20 VALUES LESS THAN (21) ENGINE = InnoDB,
 PARTITION partition_21_to_30 VALUES LESS THAN (31) ENGINE = InnoDB,
 PARTITION partition_31_to_40 VALUES LESS THAN (41) ENGINE = InnoDB,
 PARTITION partition_41_to_n VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

  表分区和不分区的查询性能对比实验

  先创建测试表以及生成测试数据的存储过程

-- 创建测试表(分区)
CREATE TABLE `part_tab`
(
  `c1` int(11) DEFAULT NULL,
  `c2` varchar(30) DEFAULT NULL,
  `c3` date        DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (year(c3))
(PARTITION p0 VALUES LESS THAN (1995) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (1996) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (1997) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (1998) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (1999) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (2000) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (2001) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (2002) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (2003) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (2004) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (2010) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;


-- 创建测试表(不分区)
CREATE TABLE `no_part_tab`
(
  `c1` int(11) DEFAULT NULL,
  `c2` varchar(30) DEFAULT NULL,
  `c3` date        DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


-- 创建存储过程,用于生成测试数据(800万条记录)
CREATE PROCEDURE `generate_test_data` ()
BEGIN
DECLARE
v INT DEFAULT 0;
WHILE v < 8000000 DO
INSERT INTO `part_tab`
VALUES
(
v,
'表分区和不分区的查询性能对比实验',
ADDDATE(
'1995-01-01',
(RAND(v) * 36520) MOD 3652
)
);
SET v = v + 1;
END
WHILE;
END

-- 调用存储过程(生成800万条记录耗时非常长,请耐心等待)
CALL generate_test_data ();

-- 存储过程只在分区测试表生成测试数据,需要复制到不分区测试表
INSERT INTO `no_part_tab` SELECT * FROM `part_tab`;

  执行SQL对比查询效率

-- 分区,耗时0.545秒
SELECT COUNT(*) FROM `part_tab`    WHERE DATE('1995-01-01') <= `c3` AND `c3` <= DATE('1995-12-31');


-- 不分区,耗时4.909秒
SELECT COUNT(*) FROM `no_part_tab` WHERE DATE('1995-01-01') <= `c3` AND `c3` <= DATE('1995-12-31');


-- 使用EXPLAIN分析两条SQL查看rows列(预估要扫描的记录数)可以发现分区表为79万条,而不分区表为775万条
EXPLAIN SELECT COUNT(*) FROM `part_tab`    WHERE DATE('1995-01-01') <= `c3` AND `c3` <= DATE('1995-12-31');
EXPLAIN SELECT COUNT(*) FROM `no_part_tab` WHERE DATE('1995-01-01') <= `c3` AND `c3` <= DATE('1995-12-31');

  从上面的性能对比实验可以看出,分区和不分区的性能差距是相当大的。

Copyright © 2024 码农人生. All Rights Reserved