引言
MySQL 8.0作为MySQL数据库的一个重要版本,在性能、功能和用户体验方面都带来了显著的改进。从2018年发布至今,该版本已经成为了众多企业应用的核心数据库解决方案。本文将深入剖析MySQL 8.0的核心新特性,包括窗口函数、公用表表达式(CTE)、性能模式等关键功能,并结合实际业务场景演示查询优化技巧,帮助DBA和开发人员充分发挥数据库性能潜力。
MySQL 8.0核心新特性概览
1. 窗口函数的引入
MySQL 8.0最大的突破之一就是支持了窗口函数(Window Functions),这使得复杂的分析查询变得更加简洁高效。窗口函数允许在结果集上执行计算,而不需要使用复杂的子查询或连接操作。
-- 示例:使用窗口函数计算每个部门的平均工资,并显示员工工资与部门平均工资的差异
SELECT
employee_id,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
salary - AVG(salary) OVER (PARTITION BY department) as salary_diff
FROM employees
ORDER BY department, salary DESC;
2. 公用表表达式(CTE)的支持
公用表表达式(Common Table Expressions)的引入让复杂的递归查询变得简单明了。CTE允许定义临时结果集,可以在查询中多次引用。
-- 示例:使用CTE计算员工的层级结构
WITH RECURSIVE employee_hierarchy AS (
-- 基础查询:查找所有顶级管理者
SELECT
employee_id,
manager_id,
employee_name,
0 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归查询:查找下属员工
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, employee_name;
3. 性能模式(Performance Schema)的增强
MySQL 8.0对性能模式进行了重大改进,提供了更详细的监控和分析能力。新的性能模式增强了对锁等待、文件IO、语句执行等的监控。
窗口函数详解与实战应用
窗口函数的基本语法
窗口函数的基本语法如下:
function_name() OVER (
[PARTITION BY column_list]
[ORDER BY column_list [ASC|DESC]]
[ROWS|RANGE clause]
)
常用窗口函数类型
1. 排名函数
排名函数包括ROW_NUMBER()、RANK()、DENSE_RANK()等:
-- 按销售额排序,显示排名
SELECT
product_name,
sales_amount,
ROW_NUMBER() OVER (ORDER BY sales_amount DESC) as row_num,
RANK() OVER (ORDER BY sales_amount DESC) as rank_num,
DENSE_RANK() OVER (ORDER BY sales_amount DESC) as dense_rank_num
FROM sales_data
ORDER BY sales_amount DESC;
2. 聚合函数
窗口聚合函数可以对窗口内的数据进行计算:
-- 计算每个产品的累计销售总额
SELECT
product_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS UNBOUNDED PRECEDING
) as cumulative_sales
FROM product_sales
ORDER BY product_id, sale_date;
3. 偏移函数
LAG()和LEAD()函数用于访问当前行前后行的数据:
-- 比较本月与上月的销售额变化
SELECT
month,
sales_amount,
LAG(sales_amount, 1) OVER (ORDER BY month) as prev_month_sales,
sales_amount - LAG(sales_amount, 1) OVER (ORDER BY month) as sales_change
FROM monthly_sales
ORDER BY month;
实际业务场景应用
场景一:电商销售分析
-- 分析商品销售趋势和排名
WITH monthly_product_stats AS (
SELECT
product_id,
YEAR(sale_date) as sale_year,
MONTH(sale_date) as sale_month,
SUM(amount) as monthly_sales,
COUNT(*) as transaction_count
FROM sales
WHERE sale_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
GROUP BY product_id, YEAR(sale_date), MONTH(sale_date)
),
product_performance AS (
SELECT
product_id,
sale_year,
sale_month,
monthly_sales,
transaction_count,
ROW_NUMBER() OVER (PARTITION BY sale_year, sale_month ORDER BY monthly_sales DESC) as monthly_rank,
AVG(monthly_sales) OVER (PARTITION BY product_id ORDER BY sale_year, sale_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as three_month_avg
FROM monthly_product_stats
)
SELECT
p.product_name,
pp.sale_year,
pp.sale_month,
pp.monthly_sales,
pp.monthly_rank,
pp.three_month_avg,
ROUND(
(pp.monthly_sales - LAG(pp.monthly_sales, 1) OVER (PARTITION BY pp.product_id ORDER BY pp.sale_year, pp.sale_month))
/ LAG(pp.monthly_sales, 1) OVER (PARTITION BY pp.product_id ORDER BY pp.sale_year, pp.sale_month) * 100, 2
) as growth_rate
FROM product_performance pp
JOIN products p ON pp.product_id = p.product_id
ORDER BY pp.sale_year DESC, pp.sale_month DESC, pp.monthly_rank;
场景二:用户行为分析
-- 分析用户活跃度和留存率
WITH user_activity AS (
SELECT
user_id,
DATE(activity_date) as activity_day,
COUNT(*) as daily_activities,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date) as user_day
FROM user_activities
WHERE activity_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY user_id, DATE(activity_date)
),
user_retention AS (
SELECT
ua.user_id,
ua.activity_day,
ua.daily_activities,
ua.user_day,
LAG(ua.user_day, 1) OVER (PARTITION BY ua.user_id ORDER BY ua.activity_day) as prev_day,
CASE
WHEN LAG(ua.user_day, 1) OVER (PARTITION BY ua.user_id ORDER BY ua.activity_day) = ua.user_day - 1
THEN 1 ELSE 0
END as consecutive_day
FROM user_activity ua
)
SELECT
ur.user_id,
ur.activity_day,
ur.daily_activities,
ur.user_day,
ur.consecutive_day,
AVG(ur.daily_activities) OVER (PARTITION BY ur.user_id ORDER BY ur.activity_day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as seven_day_avg,
PERCENT_RANK() OVER (ORDER BY ur.daily_activities DESC) as activity_percentile
FROM user_retention ur
ORDER BY ur.user_id, ur.activity_day;
公用表表达式(CTE)深度解析
CTE的基本概念和语法
公用表表达式是SQL中一种临时的结果集,可以在查询中多次引用。MySQL 8.0支持递归CTE,这使得处理层次结构数据变得更加简单。
-- 非递归CTE示例
WITH department_stats AS (
SELECT
d.department_name,
COUNT(e.employee_id) as employee_count,
AVG(e.salary) as avg_salary
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_id, d.department_name
),
company_summary AS (
SELECT
'Total' as summary_type,
SUM(employee_count) as total_employees,
AVG(avg_salary) as company_avg_salary
FROM department_stats
)
SELECT
ds.department_name,
ds.employee_count,
ROUND(ds.avg_salary, 2) as avg_salary,
cs.total_employees,
ROUND(cs.company_avg_salary, 2) as company_avg_salary
FROM department_stats ds
CROSS JOIN company_summary cs
ORDER BY ds.employee_count DESC;
递归CTE的实际应用
组织架构查询
-- 查询员工及其所有下属(包括多层结构)
WITH RECURSIVE employee_tree AS (
-- 基础查询:顶级管理者
SELECT
employee_id,
manager_id,
employee_name,
0 as level,
CAST(employee_id AS CHAR(1000)) as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归查询:下级员工
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
et.level + 1,
CONCAT(et.path, '->', e.employee_id)
FROM employees e
INNER JOIN employee_tree et ON e.manager_id = et.employee_id
)
SELECT
employee_id,
manager_id,
employee_name,
level,
path
FROM employee_tree
ORDER BY level, employee_name;
产品分类层次查询
-- 查询产品及其所有子分类
WITH RECURSIVE category_hierarchy AS (
-- 基础查询:顶级分类
SELECT
category_id,
parent_category_id,
category_name,
0 as level,
CAST(category_id AS CHAR(1000)) as path
FROM categories
WHERE parent_category_id IS NULL
UNION ALL
-- 递归查询:子分类
SELECT
c.category_id,
c.parent_category_id,
c.category_name,
ch.level + 1,
CONCAT(ch.path, '->', c.category_id)
FROM categories c
INNER JOIN category_hierarchy ch ON c.parent_category_id = ch.category_id
)
SELECT
ch.category_id,
ch.parent_category_id,
ch.category_name,
ch.level,
ch.path,
COUNT(p.product_id) as product_count
FROM category_hierarchy ch
LEFT JOIN products p ON ch.category_id = p.category_id
GROUP BY ch.category_id, ch.parent_category_id, ch.category_name, ch.level, ch.path
ORDER BY ch.level, ch.category_name;
CTE性能优化技巧
1. 合理使用WITH子句
-- 优化前:重复计算
SELECT
e.employee_id,
e.employee_name,
(SELECT COUNT(*) FROM departments d WHERE d.department_id = e.department_id) as dept_count,
(SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) as dept_avg_salary
FROM employees e;
-- 优化后:使用CTE避免重复计算
WITH dept_stats AS (
SELECT
department_id,
COUNT(*) as employee_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
)
SELECT
e.employee_id,
e.employee_name,
ds.employee_count as dept_count,
ROUND(ds.avg_salary, 2) as dept_avg_salary
FROM employees e
JOIN dept_stats ds ON e.department_id = ds.department_id;
2. 递归CTE的性能考虑
-- 优化递归CTE:添加深度限制和条件过滤
WITH RECURSIVE employee_tree AS (
SELECT
employee_id,
manager_id,
employee_name,
0 as level,
CAST(employee_id AS CHAR(1000)) as path
FROM employees
WHERE manager_id IS NULL AND active = 1
UNION ALL
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
et.level + 1,
CONCAT(et.path, '->', e.employee_id)
FROM employees e
INNER JOIN employee_tree et ON e.manager_id = et.employee_id
WHERE et.level < 10 AND e.active = 1 -- 限制递归深度和状态
)
SELECT
employee_id,
manager_id,
employee_name,
level
FROM employee_tree
ORDER BY level, employee_name;
性能模式(Performance Schema)详解
Performance Schema核心功能
Performance Schema是MySQL 8.0中最重要的性能监控工具,它提供了实时的数据库活动监控能力。
1. 系统表结构分析
-- 查看Performance Schema中的关键表
SELECT
TABLE_NAME,
TABLE_TYPE,
ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'performance_schema'
AND TABLE_NAME LIKE '%events%'
ORDER BY TABLE_NAME;
-- 查看当前连接状态
SELECT
CON_ID,
USER,
HOST,
DB,
COMMAND,
TIME,
STATE,
INFO
FROM performance_schema.events_waits_current
WHERE EVENT_NAME LIKE 'wait/io/file/%'
ORDER BY TIME DESC;
2. 锁等待监控
-- 监控锁等待情况
SELECT
r.REQUESTING_ENGINE_LOCK_ID,
r.REQUESTED_ENGINE_LOCK_TYPE,
r.REQUESTING_THREAD_ID,
t.PROCESSLIST_ID,
t.PROCESSLIST_USER,
t.PROCESSLIST_HOST,
t.PROCESSLIST_DB,
t.PROCESSLIST_COMMAND,
t.PROCESSLIST_TIME,
t.PROCESSLIST_STATE,
t.PROCESSLIST_INFO
FROM performance_schema.data_lock_waits r
JOIN performance_schema.threads t ON r.REQUESTING_THREAD_ID = t.THREAD_ID
ORDER BY t.PROCESSLIST_TIME DESC;
3. SQL语句执行监控
-- 查看慢查询和高消耗的SQL语句
SELECT
DIGEST_TEXT,
COUNT_STAR,
AVG_TIMER_WAIT/1000000000000 as avg_time_ms,
SUM_TIMER_WAIT/1000000000000 as total_time_ms,
MAX_TIMER_WAIT/1000000000000 as max_time_ms,
AVG_ROWS_EXAMINED,
AVG_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
WHERE COUNT_STAR > 100
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;
实际性能调优案例
案例一:解决查询性能瓶颈
-- 分析慢查询的具体执行计划
SELECT
DIGEST_TEXT,
SCHEMA_NAME,
COUNT_STAR,
AVG_TIMER_WAIT/1000000000000 as avg_time_ms,
SUM_TIMER_WAIT/1000000000000 as total_time_ms,
FIRST_SEEN,
LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%SELECT%FROM orders%WHERE%'
ORDER BY AVG_TIMER_WAIT DESC;
-- 优化前的查询
EXPLAIN SELECT
o.order_id,
o.customer_id,
o.order_date,
SUM(oi.quantity * oi.unit_price) as total_amount
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2023-01-01'
GROUP BY o.order_id, o.customer_id, o.order_date;
-- 优化后的查询:添加合适的索引
CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);
-- 优化后的查询执行计划
EXPLAIN SELECT
o.order_id,
o.customer_id,
o.order_date,
SUM(oi.quantity * oi.unit_price) as total_amount
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2023-01-01'
GROUP BY o.order_id, o.customer_id, o.order_date;
案例二:监控并发连接问题
-- 监控连接和线程状态
SELECT
THREAD_ID,
PROCESSLIST_ID,
PROCESSLIST_USER,
PROCESSLIST_HOST,
PROCESSLIST_DB,
PROCESSLIST_COMMAND,
PROCESSLIST_TIME,
PROCESSLIST_STATE,
PROCESSLIST_INFO
FROM performance_schema.threads
WHERE PROCESSLIST_COMMAND != 'Sleep'
AND PROCESSLIST_TIME > 30
ORDER BY PROCESSLIST_TIME DESC;
-- 分析等待事件
SELECT
EVENT_NAME,
COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 as total_time_ms,
AVG_TIMER_WAIT/1000000000000 as avg_time_ms
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
查询优化实战技巧
1. 索引优化策略
复合索引设计
-- 创建复合索引优化查询性能
-- 原始查询
SELECT * FROM orders
WHERE customer_id = 123 AND order_date >= '2023-01-01';
-- 优化后的索引
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- 更进一步的复合索引
CREATE INDEX idx_orders_customer_date_status ON orders(customer_id, order_date, status);
覆盖索引的应用
-- 创建覆盖索引,避免回表查询
-- 优化前:需要回表获取数据
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';
-- 优化后:创建覆盖索引
CREATE INDEX idx_orders_cover ON orders(order_date, customer_id, total_amount);
-- 使用覆盖索引的查询
SELECT customer_id, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';
2. 查询重写优化
子查询优化
-- 优化前:使用子查询
SELECT * FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
-- 优化后:使用窗口函数
SELECT employee_id, employee_name, salary, dept_avg_salary
FROM (
SELECT
employee_id,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department_id) as dept_avg_salary
FROM employees
) emp_with_avg
WHERE salary > dept_avg_salary;
连接优化
-- 优化前:使用NOT EXISTS
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
-- 优化后:使用LEFT JOIN
SELECT c.*
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
3. 分区表优化
-- 创建分区表优化大表查询
CREATE TABLE sales_data (
sale_id BIGINT PRIMARY KEY,
sale_date DATE NOT NULL,
product_id INT,
quantity INT,
amount DECIMAL(10,2),
customer_id INT
)
PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- 分区表查询优化
SELECT SUM(amount) as total_sales, COUNT(*) as transaction_count
FROM sales_data
WHERE sale_date >= '2023-01-01' AND sale_date < '2023-12-31'
AND product_id IN (1001, 1002, 1003);
性能监控和调优工具
1. Performance Schema监控脚本
-- 创建性能监控视图
CREATE OR REPLACE VIEW performance_monitor AS
SELECT
CONCAT(t.PROCESSLIST_USER, '@', t.PROCESSLIST_HOST) as user_host,
t.PROCESSLIST_ID,
t.PROCESSLIST_COMMAND,
t.PROCESSLIST_TIME,
t.PROCESSLIST_STATE,
LEFT(t.PROCESSLIST_INFO, 200) as sql_text,
es.COUNT_STAR as statement_count,
es.AVG_TIMER_WAIT/1000000000000 as avg_time_ms,
es.SUM_TIMER_WAIT/1000000000000 as total_time_ms
FROM performance_schema.threads t
LEFT JOIN performance_schema.events_statements_summary_by_thread_by_event_name es
ON t.THREAD_ID = es.THREAD_ID
WHERE t.PROCESSLIST_COMMAND != 'Sleep'
AND t.PROCESSLIST_TIME > 10
ORDER BY es.SUM_TIMER_WAIT DESC;
-- 查询当前活跃的慢查询
SELECT * FROM performance_monitor
WHERE total_time_ms > 1000;
2. 自定义监控脚本
-- 监控锁等待情况的脚本
DELIMITER //
CREATE PROCEDURE monitor_lock_waits()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE wait_time BIGINT;
DECLARE thread_id BIGINT;
DECLARE user_host VARCHAR(512);
DECLARE cur CURSOR FOR
SELECT
r.REQUESTING_THREAD_ID,
t.PROCESSLIST_USER,
t.PROCESSLIST_HOST,
t.PROCESSLIST_TIME
FROM performance_schema.data_lock_waits r
JOIN performance_schema.threads t ON r.REQUESTING_THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_TIME > 60;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN cur;
read_loop: LOOP
FETCH cur INTO thread_id, user_host, wait_time;
IF done THEN
LEAVE read_loop;
END IF;
-- 记录锁等待信息到监控表
INSERT INTO lock_wait_monitor (thread_id, user_host, wait_time, created_at)
VALUES (thread_id, CONCAT(user_host, ':', wait_time), NOW());
END LOOP;
CLOSE cur;
END//
DELIMITER ;
最佳实践总结
1. 窗口函数使用最佳实践
- 合理选择窗口函数类型:根据业务需求选择适当的排名函数
- 注意分区和排序的组合:正确使用PARTITION BY和ORDER BY子句
- 避免重复计算:利用CTE或临时表减少重复的窗口函数计算
2. CTE优化技巧
- 控制递归深度:设置合理的递归层数限制
- 添加过滤条件:在递归查询中添加必要的WHERE条件
- 合理使用WITH子句:避免过度使用CTE导致查询复杂化
3. 性能监控建议
- 定期检查Performance Schema:建立日常性能监控机制
- 关注慢查询:及时发现和优化执行时间长的SQL语句
- 监控资源使用:关注CPU、内存、磁盘IO等系统资源使用情况
结论
MySQL 8.0通过引入窗口函数、CTE、增强的Performance Schema等功能,为数据库性能优化提供了更强大的工具。这些新特性不仅简化了复杂查询的编写,还大大提升了数据库的监控和调优能力。
在实际应用中,建议:
- 充分利用窗口函数处理分析型查询
- 合理使用CTE简化递归查询逻辑
- 建立完善的Performance Schema监控体系
- 结合业务场景选择合适的优化策略
通过深入理解和有效运用这些新特性,DBA和开发人员能够显著提升数据库性能,为企业应用提供更稳定、高效的数据服务。随着MySQL 8.0的持续发展,我们期待看到更多创新特性的出现,进一步推动数据库技术的发展和应用。
本文详细介绍了MySQL 8.0的核心新特性及其在实际业务中的应用方法,涵盖了窗口函数、CTE、Performance Schema等关键技术,并提供了丰富的实战案例和优化技巧。建议读者结合具体业务场景,灵活运用这些技术来提升数据库性能。

评论 (0)