Performance Problems Caused by Skewed Data Distribution

2023-10-10 18:48


Tonight (2016/04/14), on database version 11.2.0.4, I ran into a truly bizarre case. I have seen plenty of bizarre cases before, but back then I had no way to collect them, because banks, securities firms and telecoms do not allow anything to leak. Luckily, tonight's case can be shared. The story goes like this: the following SQL takes tens of minutes to run.

select count(distinct a.user_name), count(distinct a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

Plan hash value: 2367445948

-------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                      |     1 |   130 |   754   (2)| 00:00:10 |        |      |
|   1 |  SORT GROUP BY       |                      |     1 |   130 |            |          |        |      |
|*  2 |   HASH JOIN          |                      |  4067K|   504M|   754   (2)| 00:00:10 |        |      |
|*  3 |    HASH JOIN         |                      | 11535 |   360K|   258   (1)| 00:00:04 |        |      |
|*  4 |     TABLE ACCESS FULL| TB_USER_CHANNEL      | 11535 |   157K|    19   (0)| 00:00:01 |        |      |
|   5 |     TABLE ACCESS FULL| TB_CHANNEL_INFO      | 11767 |   206K|   238   (0)| 00:00:03 |        |      |
|   6 |    REMOTE            | BASE_DATA_LOGIN_INFO |   190K|    17M|   486   (1)| 00:00:06 |  AGENT | R->S |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."CHANNEL_ID"="CHANNEL_RLAT")
   3 - access("A"."CHANNEL_ID"="B"."CHANNEL_ID")
   4 - filter("A"."USER_ID"=5002)

Remote SQL Information (identified by operation id):
----------------------------------------------------

   6 - SELECT "USER_NAME","INVEST_ID","STR_DAY","CHANNEL_ID","PLATFORM" FROM "BASE_DATA_LOGIN_INFO" "A" WHERE "STR_DAY"<='20160304' AND "STR_DAY">='20160301' AND "PLATFORM" IS NOT NULL (accessing 'AGENT' )

I glanced at the execution plan, and at first sight it looked normal. So I quickly asked how big the table behind the dblink was, and how big tables a and b inside the IN subquery were:
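tb_user_channel        about 10,000 rows
tb_channel_info        about 10,000 rows
base_data_login_info   about 190,000 rows, roughly 40,000 left after filtering

None of these tables is large. The biggest has only 190,000 rows, so there is no way this should run for tens of minutes. I then began to suspect that the table behind the dblink was causing the performance problem.
To rule the dblink out, I had my buddy create an identical table locally. It was still slow, with no improvement at all. Normally I glance at a SQL statement and the tuning is done. This time I stumbled.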
With tens of thousands of tuned SQL statements under my belt, I figured this one should take me a minute. So I had my buddy run the following SQL:

select count(*)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

Instant. You read that right: instant. Strange. Then I had him run this one:

select count(a.user_name)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

Instant again. Next:

select count(a.user_name), count(a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

Instant. Then this one:

select count(distinct a.user_name), count(a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

Instant yet again. I felt I was closing in, so I added one more distinct to see whether it would still be instant:

select count(distinct a.user_name), count(distinct a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform;

This time it died. The SQL simply would not finish. Readers, is that not bizarre?

So how do you deal with a bizarre problem like this? First, fix it: the SQL must not stay slow. If I could not fix it, I might as well shut down 道森 (Daosen). Second, find the root cause, so that the same kind of problem does not bite next time, and so that readers get an entertaining case study out of it. To fix the problem first, I gave my buddy the following SQL:

with t1 as
 (select /*+ materialize */ a.user_name, a.invest_id
    from base_data_login_info@agent a
   where a.str_day <= '20160304'
     and a.str_day >= '20160301'
     and a.channel_id in (select channel_rlat
                            from tb_user_channel a, tb_channel_info b
                           where a.channel_id = b.channel_id
                             and a.user_id = 5002)
     and a.platform = a.platform)
select count(distinct user_name), count(distinct invest_id) from t1;

Plan hash value: 901326807

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                          |     1 |    54 |  1621   (1)| 00:00:20 |        |      |
|   1 |  TEMP TABLE TRANSFORMATION |                          |       |       |            |          |        |      |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6720_EB8EA |       |       |            |          |        |      |
|*  3 |    HASH JOIN RIGHT SEMI    |                          |   190K|    22M|   744   (1)| 00:00:09 |        |      |
|   4 |     VIEW                   | VW_NSO_1                 | 11535 |   304K|   258   (1)| 00:00:04 |        |      |
|*  5 |      HASH JOIN             |                          | 11535 |   360K|   258   (1)| 00:00:04 |        |      |
|*  6 |       TABLE ACCESS FULL    | TB_USER_CHANNEL          | 11535 |   157K|    19   (0)| 00:00:01 |        |      |
|   7 |       TABLE ACCESS FULL    | TB_CHANNEL_INFO          | 11767 |   206K|   238   (0)| 00:00:03 |        |      |
|   8 |     REMOTE                 | BASE_DATA_LOGIN_INFO     |   190K|    17M|   486   (1)| 00:00:06 |  AGENT | R->S |
|   9 |   SORT GROUP BY            |                          |     1 |    54 |            |          |        |      |
|  10 |    VIEW                    |                          |   190K|     9M|   878   (1)| 00:00:11 |        |      |
|  11 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6720_EB8EA |   190K|     9M|   878   (1)| 00:00:11 |        |      |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("A"."CHANNEL_ID"="CHANNEL_RLAT")
   5 - access("A"."CHANNEL_ID"="B"."CHANNEL_ID")
   6 - filter("A"."USER_ID"=5002)

Remote SQL Information (identified by operation id):
----------------------------------------------------

   8 - SELECT "USER_NAME","INVEST_ID","STR_DAY","CHANNEL_ID","PLATFORM" FROM "BASE_DATA_LOGIN_INFO" "A" WHERE "STR_DAY"<='20160304' AND "STR_DAY">='20160301' AND "PLATFORM" IS NOT NULL (accessing 'AGENT' )

The SQL now returns instantly. The with ... as /*+ materialize */ trick is known to everyone at 道森. If you do not believe me, go read my blog (search Baidu for "csdn 落落的专栏"). I expect the whole database community will know it before long.

Fixing the problem is not enough, though. We have to find the root cause, otherwise how would I get to show off? Let's start with the execution plans.

The fast SQL and its execution plan:

select count(a.user_name), count(distinct a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform

Plan hash value: 4282421321

------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Inst   |IN-OUT|
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                      |     1 |    40 |       |  2982   (1)| 00:00:36 |        |      |
|   1 |  SORT AGGREGATE         |                      |     1 |    40 |       |            |          |        |      |
|   2 |   VIEW                  | VW_DAG_0             | 41456 |  1619K|       |  2982   (1)| 00:00:36 |        |      |
|   3 |    HASH GROUP BY        |                      | 41456 |  4250K|    20M|  2982   (1)| 00:00:36 |        |      |
|*  4 |     HASH JOIN RIGHT SEMI|                      |   190K|    19M|       |   744   (1)| 00:00:09 |        |      |
|   5 |      VIEW               | VW_NSO_1             | 11535 | 80745 |       |   258   (1)| 00:00:04 |        |      |
|*  6 |       HASH JOIN         |                      | 11535 |   360K|       |   258   (1)| 00:00:04 |        |      |
|*  7 |        TABLE ACCESS FULL| TB_USER_CHANNEL      | 11535 |   157K|       |    19   (0)| 00:00:01 |        |      |
|   8 |        TABLE ACCESS FULL| TB_CHANNEL_INFO      | 11767 |   206K|       |   238   (0)| 00:00:03 |        |      |
|   9 |      REMOTE             | BASE_DATA_LOGIN_INFO |   190K|    17M|       |   486   (1)| 00:00:06 |  AGENT | R->S |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A"."CHANNEL_ID"="CHANNEL_RLAT")
   6 - access("A"."CHANNEL_ID"="B"."CHANNEL_ID")
   7 - filter("A"."USER_ID"=5002)

Remote SQL Information (identified by operation id):
----------------------------------------------------

   9 - SELECT "USER_NAME","INVEST_ID","STR_DAY","CHANNEL_ID","PLATFORM" FROM "BASE_DATA_LOGIN_INFO" "A" WHERE "STR_DAY"<='20160304' AND "STR_DAY">='20160301' AND "PLATFORM" IS NOT NULL (accessing 'AGENT' )

The slow SQL and its execution plan:

select count(distinct a.user_name), count(distinct a.invest_id)
  from base_data_login_info@agent a
 where a.str_day <= '20160304'
   and a.str_day >= '20160301'
   and a.channel_id in (select channel_rlat
                          from tb_user_channel a, tb_channel_info b
                         where a.channel_id = b.channel_id
                           and a.user_id = 5002)
   and a.platform = a.platform

Plan hash value: 2367445948

-------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                      |     1 |   130 |   754   (2)| 00:00:10 |        |      |
|   1 |  SORT GROUP BY       |                      |     1 |   130 |            |          |        |      |
|*  2 |   HASH JOIN          |                      |  4067K|   504M|   754   (2)| 00:00:10 |        |      |
|*  3 |    HASH JOIN         |                      | 11535 |   360K|   258   (1)| 00:00:04 |        |      |
|*  4 |     TABLE ACCESS FULL| TB_USER_CHANNEL      | 11535 |   157K|    19   (0)| 00:00:01 |        |      |
|   5 |     TABLE ACCESS FULL| TB_CHANNEL_INFO      | 11767 |   206K|   238   (0)| 00:00:03 |        |      |
|   6 |    REMOTE            | BASE_DATA_LOGIN_INFO |   190K|    17M|   486   (1)| 00:00:06 |  AGENT | R->S |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."CHANNEL_ID"="CHANNEL_RLAT")
   3 - access("A"."CHANNEL_ID"="B"."CHANNEL_ID")
   4 - filter("A"."USER_ID"=5002)

Remote SQL Information (identified by operation id):
----------------------------------------------------

   6 - SELECT "USER_NAME","INVEST_ID","STR_DAY","CHANNEL_ID","PLATFORM" FROM "BASE_DATA_LOGIN_INFO" "A" WHERE "STR_DAY"<='20160304' AND "STR_DAY">='20160301' AND "PLATFORM" IS NOT NULL (accessing 'AGENT' )

Unless you have tuned thousands of SQL statements, you will not have the trained eye for this. Look closely: the slow SQL uses HASH JOIN, while the fast SQL uses HASH JOIN RIGHT SEMI. In plain words, the slow SQL was turned into an inner join, while the fast SQL is still a semi join (in/exists). The SQL is clearly written as a semi join, so why did it become an inner join? That involves optimizer internals and the relational algebra taught in university courses, which I will not go into here.

The next question: a hash join between a row source of a little over ten thousand rows and one of a hundred-odd thousand rows should be fast. If it is slow, there is only one explanation: the join column is badly skewed on both sides.

Join column distribution in the 190,000-row table:

SQL> select channel_id,count(*) from base_data_login_info group by channel_id order by 2;

CHANNEL_ID               COUNT(*)
-------------------------------------------------- ----------
011a1                 2
003a1                 3
021a1                 3
006a1                12
024h2                16
013a1                19
007a1                24
012a1                25
005a1                27
EPT01                36
028h2               109
008a1               139
029a1               841
009a1               921
014a1              1583
000a1              1975
a0001              2724
004a1              5482
001a1             16329
026h2             160162

Join column distribution inside the IN subquery:

select channel_rlat, count(*)
  from tb_user_channel a, tb_channel_info b
 where a.channel_id = b.channel_id
   and a.user_id = 5002
 group by channel_rlat
 order by 2 desc

channel_rlat  count(*)
026h2         10984
024h2         7
002h2         6
023a2         2
007s001022001 1
007s001022002 1
007s001024007 1
007s001024009 1
007s001022009 1
001s001006    1
001s001008    1
001s001001001 1
001s001001003 1
001s001001007 1
001s001001014 1
007s001018003 1
007s001018007 1
007s001019005 1
007s001019008 1
001s001002011 1
007s001011003 1
007s001034    1
007s001023005 1

Just as I expected. With an inner HASH JOIN, this was bound to be painfully slow.
Joining 026h2 (160162 rows) with 026h2 (10984 rows) is effectively a Cartesian product. The 10046 trace file already gives the answer: the HASH JOIN returns 410996039 rows, which is a small Cartesian product in its own right.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT GROUP BY (cr=3643 pr=0 pw=0 time=1236559678 us)
 410996039  410996039  410996039   HASH JOIN  (cr=3643 pr=0 pw=0 time=406365130 us cost=1006 size=66968010 card=458685)
     11535      11535      11535    HASH JOIN  (cr=945 pr=0 pw=0 time=199182 us cost=258 size=369120 card=11535)
     11535      11535      11535     TABLE ACCESS FULL TB_USER_CHANNEL (cr=67 pr=0 pw=0 time=21452 us cost=19 size=161490 card=11535)
     11771      11771      11771     TABLE ACCESS FULL TB_CHANNEL_INFO (cr=878 pr=0 pw=0 time=30291 us cost=238 size=211806 card=11767)
     45122      45122      45122    TABLE ACCESS FULL BASE_DATA_LOGIN_INFO (cr=2698 pr=0 pw=0 time=218144 us cost=747 size=2447922 card=21473)

If this is not clear, try an experiment:

create table a as select * from dba_objects;
create table b as select * from dba_objects;

Then run the following SQL and wait (and wait) for the result:

select count(distinct owner), count(distinct object_name)
  from a
 where owner in (select owner from b);

The following statements, however, both return instantly:

select count(owner), count(distinct object_name)
  from a
 where owner in (select owner from b);

select count(distinct owner), count(distinct object_name)
  from a
 where object_id in (select object_id from b);

So how do we rewrite the slow SQL into an equivalent form?

select count(distinct owner), count(distinct object_name)
  from a
 where owner in (select owner from b);

The answer:

select count(distinct owner), count(distinct object_name)
  from (select owner, object_name
          from a
         where owner in (select owner from b)
           and rownum > 0);

The rownum > 0 predicate keeps the inline view from being merged into the outer query, so the IN is evaluated as a semi join before the distinct aggregation runs.
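The experiment above can be shrunk to something that runs anywhere. What follows is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module instead of Oracle, and the tiny tables, the row counts and the helper names (build_demo, rows_fed_to_aggregate, distinct_counts) are made up for illustration. SQLite does not perform the 11g transformation, so the inner-join form is written by hand. The point is only to show how many rows each form pushes into the aggregation while the distinct counts stay identical.

```python
import sqlite3

# Hand-written versions of the two shapes discussed in the post.
SEMI_SQL = """
    select count(distinct owner), count(distinct object_name)
      from a
     where owner in (select owner from b)
"""
INNER_SQL = """
    select count(distinct a.owner), count(distinct a.object_name)
      from a, b
     where a.owner = b.owner
"""


def build_demo(conn, hot_rows_a=300, hot_rows_b=200):
    """Create tables a and b whose join column 'owner' is heavily skewed.

    The sizes are hypothetical; they only mimic one hot key plus a few rare ones.
    """
    conn.execute("create table a (owner text, object_name text)")
    conn.execute("create table b (owner text)")
    rows_a = [("SYS", f"OBJ_{i}") for i in range(hot_rows_a)]
    rows_a += [("SCOTT", "EMP"), ("SCOTT", "DEPT"), ("HR", "JOBS")]
    rows_b = [("SYS",)] * hot_rows_b + [("SCOTT",), ("OE",)]
    conn.executemany("insert into a values (?, ?)", rows_a)
    conn.executemany("insert into b values (?)", rows_b)
    conn.commit()


def rows_fed_to_aggregate(conn):
    """Return (semi_join_rows, inner_join_rows) produced before aggregation."""
    semi = conn.execute(
        "select count(*) from a where owner in (select owner from b)"
    ).fetchone()[0]
    inner = conn.execute(
        "select count(*) from a, b where a.owner = b.owner"
    ).fetchone()[0]
    return semi, inner


def distinct_counts(conn, sql):
    """Run one of the two count(distinct ...) statements and return its row."""
    return tuple(conn.execute(sql).fetchone())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    semi, inner = rows_fed_to_aggregate(conn)
    print("rows through semi join :", semi)
    print("rows through inner join:", inner)
    print("distinct counts (semi) :", distinct_counts(conn, SEMI_SQL))
    print("distinct counts (inner):", distinct_counts(conn, INNER_SQL))
```

With the default sizes, the semi join passes each matching row of a exactly once, while the inner join repeats every hot-key row once per matching row in b. Both statements still report the same distinct counts, which is why the transformation is legal and why it can still be so expensive.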
Something to think about: why does the 11g CBO rewrite this as an inner join?
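select xxx from t where owner in (select owner from the "1" side of a 1:n relationship, where owner is unique): this can be rewritten as an inner join with no distinct, because each outer row matches at most once.
select xxx from t where owner in (select owner from the "n" side, where owner repeats): rewriting this as an inner join needs a distinct, otherwise each outer row comes back once per matching subquery row.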
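Our SQL is select count(distinct ...), count(distinct ...), so the distinct is already there.
The CBO therefore rewrites it directly as select count(distinct a.owner), count(distinct object_name) from a, b where a.owner = b.owner;
That produces the mini Cartesian product, and that is why it is slow. This behavior was corrected in 12c; if you are interested, try it on 12c yourself.
No matter how smart the optimizer is, it is never as smart as a person. If this article went over your head, keep working at it.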
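The size of that mini Cartesian product can be estimated from the two key histograms alone: an equi-join returns, for every shared key, the product of the two row counts. The sketch below is a back-of-the-envelope illustration, not anything Oracle exposes. The function names are hypothetical, and the histograms are a hand-picked subset of the whole-table counts from the two group-by listings earlier in the post. They are not filtered by str_day, so the total is not expected to equal the 410996039 rows seen in the trace.

```python
def inner_join_rows(left_hist, right_hist):
    """Rows returned by an equi-join: sum over shared keys of n_left * n_right."""
    return sum(n * right_hist[key] for key, n in left_hist.items() if key in right_hist)


def semi_join_rows(left_hist, right_hist):
    """Rows returned by IN/EXISTS: each matching left row is returned once."""
    return sum(n for key, n in left_hist.items() if key in right_hist)


def deduplicate(hist):
    """Histogram of the subquery after a distinct: every key occurs once."""
    return {key: 1 for key in hist}


# Subset of the channel_id counts in base_data_login_info (unfiltered).
LOGIN_INFO = {"026h2": 160162, "001a1": 16329, "004a1": 5482, "024h2": 16}
# Subset of the channel_rlat counts returned by the IN subquery.
IN_SUBQUERY = {"026h2": 10984, "024h2": 7, "002h2": 6, "023a2": 2}

if __name__ == "__main__":
    print("inner join rows          :", inner_join_rows(LOGIN_INFO, IN_SUBQUERY))
    print("semi join rows           :", semi_join_rows(LOGIN_INFO, IN_SUBQUERY))
    print("inner join after distinct:",
          inner_join_rows(LOGIN_INFO, deduplicate(IN_SUBQUERY)))
```

The hot key 026h2 alone accounts for 160162 * 10984 rows in the inner-join form. Deduplicating the subquery side first brings the inner join back down to the semi-join row count, which is the effect the with ... materialize and rownum > 0 rewrites achieve by keeping the semi join in place.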

 
