在一个风和日丽阳光明媚的早上,刚到公司屁股还没有坐热,就接到报告:XX系统一条SQL执行效率很低,赶紧上去经过一番犀利操作,啪啪啪敲下熟悉的命令,获取SQL信息如下:
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dkusf44y9g1yv, child number 0
-------------------------------------
SELECT A.NO_ID,A.BILL_ID1 FROM MM_USER_INFO A, MM_ATOBSTESTSTATE_INFO B
WHERE A.NO_ID=B.NO_ID AND B.DONETIME < ADD_MONTHS(SYSDATE,-6) AND
B.RUN_IDNO=W AND A.KKK_SERVICE_ID=1111
Plan hash value: 3801554394 ------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | 64561 (100)| | | |
|* 1 | HASH JOIN | | 255K| 11M| 9744K| 64561 (2)| 00:12:55 | | |
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| MM_USER_INFO | 255K| 6741K| | 26342 (1)| 00:05:17 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_MMUSER_SERVICEID | 255K| | | 649 (1)| 00:00:08 | | |
| 4 | PARTITION RANGE ALL | | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
|* 5 | TABLE ACCESS FULL | MM_ATOBSTESTSTATE_INFO | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NO_ID"="B"."NO_ID")
3 - access("A"."KKK_SERVICE_ID"=1111)
5 - filter(("B"."RUN_IDNO"=W AND "B"."DONETIME" 26 rows selected.
|
从执行计划上大体可以看出,这条语句走HASHJOIN,主要慢在ID=2和ID=5。值得注意一点:这里的执行计划是指标都是估算的,估算的东西就是可能不准确,特别是执行计划有问题的时候,正是因为各种不准的指标才导致执行计划走的不好,那么我们要分析问题的根源,最好可以通过PredicateInformation里的条件去计算真实的Rows,或用gather_plan_statistics或altersession setstatistics_level=all去看A-Rows,A-Time等信息,从而准确判断慢在哪一步,最终找出慢的根源。
通过上述理论指导,下面赶紧看下真实执行计划(这里有真实的Starts,A-Rows等):
Planhash value: 3801554394
------------------------------------------------------------------------------------------------------------------------------------------------------------
|Id | Operation | Name |Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 842 |00:01:18.55 | 670K| 570K| | | |
|* 1 | HASH JOIN | | 1 | 255K| 842 |00:01:18.55 | 670K| 570K| 130M| 13M| 126M (0)|
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| UR_USER_INFO | 1 | 255K| 2113K|00:00:41.52 | 299K| 200K| | | |
|* 3 | INDEX RANGE SCAN | IDX_USERINFO_SERV_ID | 1 | 255K| 2113K|00:00:07.14 | 12844 | 12843 | | | |
| 4 | PARTITION RANGE ALL | | 1 | 475K| 842 |00:00:34.24 | 370K| 370K| | | |
|* 5 | TABLE ACCESS FULL | UR_CRMTOBOSSSTATE_INFO | 17 | 475K| 842 |00:00:34.23 | 370K| 370K| | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------
PredicateInformation (identified by operation id):
---------------------------------------------------
1- access("A"."ID_NO"="B"."ID_NO")
3- access("A"."MASTER_SERV_ID"=2063)
5- filter(("B"."RUN_CODE"=W AND"B"."OP_TIME"
通过查看真实执行计划,一眼定位出的确慢在ID=2和ID=5两个步骤,其中ID=2回表慢,从索引查找7s回表变为41s(也就是回表操作需要34s),而ID=5是全表慢。其中ID=5的估算行数是47.5w行,实际上只有842行(这就是估算的行和真实行差别超大,一般统计信息不准),因此适合建立索引,通过下面分析适合建立组合索引。
SQL> select count(*) from MM_ATOBSTESTSTATE_INFO
2 where DONETIME < ADD_MONTHS(SYSDATE, -6);
COUNT(*)
----------
14539090
SQL> select/*+parallel(16)*/ count(*) from MM_ATOBSTESTSTATE_INFO
2 where RUN_IDNO = W;
COUNT(*)
----------
59675
SQL> select/*+parallel(16)*/ count(*) from MM_ATOBSTESTSTATE_INFO
2 where RUN_IDNO = W and DONETIME < ADD_MONTHS(SYSDATE,-6);
COUNT(*)
----------
842
|
索引创建如下:
CREATE INDEX "OPERTIADM"."IDX1_MM_ATOBSTESTSTATE_INFO" ON "OPERTIADM"."MM_ATOBSTESTSTATE_INFO" ("RUN_IDNO","DONETIME")
LOCAL TABLESPACE "TBS_IDX_TKO" parallel 16;
alter index "OPERTIADM"."IDX1_MM_ATOBSTESTSTATE_INFO" noparallel;
|
然后令人郁闷的事情发生了,执行计划,它竟然没有变:
Execution Plan
----------------------------------------------------------
Plan hash value: 3801554394
--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 255K| 11M| | 64561 (2)| 00:12:55 | | |
|* 1 | HASH JOIN | | 255K| 11M| 9744K| 64561 (2)| 00:12:55 | | |
| 2 | TABLE ACCESS BY GLOBAL INDEX ROWID| MM_USER_INFO | 255K| 6741K| | 26342 (1)| 00:05:17 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_MMUSER_SERVICEID | 255K| | | 649 (1)| 00:00:08 | | |
| 4 | PARTITION RANGE ALL | | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
|* 5 | TABLE ACCESS FULL | MM_ATOBSTESTSTATE_INFO | 475K| 8821K| | 37038 (2)| 00:07:25 | 1 | 17 |
--------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."NO_ID"="B"."NO_ID")
3 - access("A"."KKK_SERVICE_ID"=1111)
5 - filter("B"."RUN_IDNO"=W AND "B"."DONETIME"<="" add_months(sysdate@!,-6))
|
回头一想也正常,执行计划未变,主要是ID=5的cardinality估算不准确。既然cardinality不准确,那么就收集统计信息,收集完统计信息的执行计划如下:
Plan hash value: 1403561594 ---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | 0 |
|* 1 | TABLE ACCESS BY LOCAL INDEX ROWID | MM_USER_INFO | 1 | 1 | 842 |00:00:10.81 | 302K| 5 |
| 2 | NESTED LOOPS | | 1 | 1349K| 1685 |00:00:10.81 | 301K| 5 |
| 3 | PARTITION RANGE SUBQUERY | | 1 | 1349K| 842 |00:00:10.80 | 300K| 5 |
| 4 | TABLE ACCESS BY LOCAL INDEX ROWID| MM_ATOBSTESTSTATE_INFO | 6 | 1349K| 842 |00:00:00.01 | 842 | 0 |
|* 5 | INDEX RANGE SCAN | IDX1_MM_ATOBSTESTSTATE_INFO | 6 | 1349K| 842 |00:00:00.01 | 77 | 0 |
| 6 | PARTITION RANGE ITERATOR | | 842 | 1 | 842 |00:00:00.01 | 1741 | 0 |
|* 7 | INDEX UNIQUE SCAN | PK_MM_USER_INFO | 842 | 1 | 842 |00:00:00.01 | 1741 | 0 |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."KKK_SERVICE_ID"=1111)
5 - access("B"."RUN_IDNO"=W AND "B"."DONETIME" 7 - access("A"."NO_ID"="B"."NO_ID")
|
现在走NESTEDLOOPS了,而且使用了刚建立的索引,并且原来慢的ID=2变成被驱动了,走UNIQUESCAN,但是还是需要10s多,效率没有啥提升,通过执行计划可以看出主要慢在 PARTITIONRANGE SUBQUERY上,这个查询转换(QueryTransformation)的玩意,没有啥好办法了,只能关闭这个功能对应的参数,当然,最好的是使用SQLPROFILE绑定到这条语句,在语句级进行修改,类似加了下面的HINTS:
SELECT/*+leading(b) use_nl(a) index(b IDX1_MM_ATOBSTESTSTATE_INFO)
opt_param(_subquery_pruning_enabled false)*/ A.NO_ID, A.BILL_ID1
FROM MM_USER_INFO A, MM_ATOBSTESTSTATE_INFO B WHERE A.NO_ID = B.NO_ID
AND B.DONETIME < ADD_MONTHS(SYSDATE, -6) AND B.RUN_IDNO = W
AND A.KKK_SERVICE_ID = 1111
|
然后用上述加了HINTS的SQL执行计划,绑定到原SQL,最终SQL执行计划如下:
Plan hash value: 4290111086 ------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | |* 1 | TABLE ACCESS BY LOCAL INDEX ROWID | MM_USER_INFO | 1 | 1 | 842 |00:00:00.02 | 3436 | | 2 | NESTED LOOPS | | 1 | 1349K| 1685 |00:00:00.02 | 2594 | | 3 | PARTITION RANGE ALL | | 1 | 1349K| 842 |00:00:00.01 | 853 | | 4 | TABLE ACCESS BY LOCAL INDEX ROWID| MM_ATOBSTESTSTATE_INFO | 17 | 1349K| 842 |00:00:00.01 | 853 | |* 5 | INDEX RANGE SCAN | IDX1_MM_ATOBSTESTSTATE_INFO | 17 | 1349K| 842 |00:00:00.01 | 88 | | 6 | PARTITION RANGE ITERATOR | | 842 | 1 | 842 |00:00:00.01 | 1741 | |* 7 | INDEX UNIQUE SCAN | PK_MM_USER_INFO | 842 | 1 | 842 |00:00:00.01 | 1741 | ------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("A"."KKK_SERVICE_ID"=1111) 5 - access("B"."RUN_IDNO"=W AND "B"."DONETIME" 7 - access("A"."NO_ID"="B"."NO_ID")
|
非常完美,使用到我们建立的索引,并且消除了PARTITIONRANGESUBQUERY,执行效率由原来的10s多降低为0.01s,效率提升上千倍。SQL优化的方式N种,唯有准确找到问题根源才能快速解决,这里我通过分析真实的执行计划快速找到问题的ROOTCAUSE,从而解决之。
总结下这个案例:先通过分析得知需要建立索引,但是建立索引后执行计划未变,发现是cardinality估算不准,那么收集统计信息,收集完毕后,走索引和NL,但是却出现了PARTITIONRANGE SUBQUERY影响效率,通过收集统计信息等方式已经解决不了,那么只能在语句级先关闭这个参数,通过SQLPROFILE绑定,从而达到解决问题的目的。
附:关于PARITITION的详细内容可以参考
VLDBand Partitioning Guide :
https://docs.oracle.com/en/database/oracle/oracle-database/19/vldbg/index.html
里面的AdvancedPartition Pruning Techniques有关于PARTITIONRANGE SUBQUERY的内容。