Mining the AWR to Identify Performance Trends
This blog post was originally released on Pythian blog
Sometimes it’s useful to check how performance of a SQL statement changes over time. The diagnostic pack features provide some really useful information to answer these questions. The data is there, but it not always easy to retrieve it, especially if you want to see how the performance changes over time. I’ve been using three really simple scripts to retrieve this information from the AWR. These scripts help me answer the following questions:
- How does the performance of a particular SQL change over time?
- How do wait times of a particular wait event change over time?
- How does a particular statistic change over time?
_ Please note, the scripts provided here require diagnostic pack licenses and it’s your task to make sure you have them before running the scripts._
SQL performance
I use script <a awr_sqlid_perf_trend.sql to check how performance of the SQL changes over time. The script summarizes the data from DBA_HIST_SQLSTAT and reports the average statistics for a single execution of the query during the reporting interval. It requires 3 input parameters:
- SQL ID
- Days to report. It will summarize all AWR snapshots starting with “trunc(sysdate)-{days to report}+1”, so if you pass “1”, it will summarize all snapshots from today, if “2” - than it’s yesterday and today are included.
- Interval in hours. “24” will provide one row for each day, “6” will give 4 rows a day.
Nothing shows it better than an example. Below you see how I’m checking execution statistics for sql_id fd7rrqkn1k2xb by summarizing the AWR information captured in last 2 weeks and reporting the average values for 2-day intervals. Then I’m taking a little closer look at the last 4 days for the same SQL by summarizing data over 6hour intervals. Note, the time column shows the beginning of the interval.
SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 14 48
TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
16.10.2013 00:00:00 351 195.571 74.995 .097 .000 .000 .000 134417.570 21319182.291 293731.556 304434.305
18.10.2013 00:00:00 364 91.225 47.474 1.687 .000 .000 .002 141140.228 20364053.544 270107.745 273343.709
20.10.2013 00:00:00 542 20.686 9.378 .004 .000 .000 .000 146436.875 4597922.220 3.168 .000
22.10.2013 00:00:00 531 25.060 12.086 .161 .000 .000 .000 146476.605 6026729.224 23999.684 23998.859
24.10.2013 00:00:00 542 51.611 40.675 1.880 .000 .000 .000 146814.220 21620264.039 287994.862 287994.701
26.10.2013 00:00:00 534 39.949 26.688 1.050 .000 .000 .000 147099.275 14081016.607 159704.463 159704.418
28.10.2013 00:00:00 245 37.837 29.384 1.150 .000 .000 .000 147135.216 15505533.959 179244.437 179244.367
7 rows selected.
SQL> @awr_sqlid_perf_trend.sql fd7rrqkn1k2xb 4 6
TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
26.10.2013 00:00:00 72 19.209 9.439 .000 .000 .000 .000 147076.000 4623816.597 .111 .000
26.10.2013 06:00:00 72 15.391 9.401 .000 .000 .000 .000 147086.403 4624153.819 .000 .000
26.10.2013 12:00:00 72 14.022 9.351 .000 .000 .000 .000 147099.000 4624579.639 .000 .000
26.10.2013 18:00:00 55 48.174 35.723 1.575 .000 .000 .000 147099.000 19192781.582 243584.055 243584.055
27.10.2013 00:00:00 72 76.723 43.350 2.116 .000 .000 .000 147099.000 23258326.875 314445.111 314445.111
27.10.2013 06:00:00 72 64.921 43.914 2.084 .000 .000 .000 147107.542 23258506.028 315673.000 315673.000
27.10.2013 12:00:00 72 52.567 43.383 2.041 .000 .000 .000 147116.000 23258739.403 315673.000 315673.000
27.10.2013 18:00:00 47 25.522 18.095 .523 .000 .000 .000 147117.532 9382873.851 80597.702 80597.362
28.10.2013 00:00:00 65 17.645 9.384 .000 .000 .000 .000 147120.000 4625354.262 .000 .000
28.10.2013 06:00:00 19 17.571 9.451 .000 .000 .000 .000 147122.421 4625411.263 .000 .000
28.10.2013 12:00:00 6 14.083 9.645 .000 .000 .000 .000 147208.167 4629315.167 .000 .000
28.10.2013 18:00:00 48 42.173 35.208 1.509 .000 .000 .000 147236.375 18606643.833 229433.750 229433.750
29.10.2013 00:00:00 72 53.015 43.517 2.022 .000 .000 .000 147245.125 23265547.847 314507.319 314507.083
29.10.2013 06:00:00 30 52.181 43.638 1.932 .000 .000 .000 147250.300 23265839.767 303949.000 303949.000
29.10.2013 12:00:00 5 59.576 43.836 1.177 .000 .000 .000 144049.800 23267109.200 227814.000 227814.000
15 rows selected.
I’ve checked this SQL because the users reported inconsistent performance. It can also be observed in the outputs above. Take a look! The number of rows processed during each execution of the SQL doesn’t change - it’s always around 147K, but look at the disk reads and the direct writes! These values can be around zero, but then they suddenly jump up to 300K, and when they do, the buffer gets increase too and the CPU time goes up from 9 seconds to 43. Based on the information above it looks like there could be two different execution plans involved and bind variable peeking could be causing one or the other plan to become the active plan. Additionally you can use the same script to check how execution statistics for the same SQL change over time. Does the elapsed time increase? Do the number of processed rows or number of buffer gets per execution change?
Wait event performance
Script awr_wait_trend.sql can be used to show the changes in wait counts and wait durations for a particular event over time. Similarly to the previous script it also requires 3 parameters, only instead of SQL ID you pass the name of the wait event. This time the data comes from DBA_HIST_SYSTEM_EVENT. I typically use this script in two situations:
- To check if a particular wait event performs worse when an overall performance problem is reported (usually I’m looking at IO events)
- Illustrate how the implemented change improved the situation.
The example below shows how the performance of log file parallel write event changed over 3 weeks. On october 19th we moved the redo logs to dedicated high performance LUNs. Before that the 2 members of each redo log group were located on a saturated LUN together with all the data files.
SQL> @awr_wait_trend.sql 'log file parallel write' 21 24
TIME EVENT_NAME TOTAL_WAITS TOTAL_TIME_S AVG_TIME_MS
------------------- ---------------------------- --------------- -------------- --------------
09.10.2013 00:00:00 log file parallel write 4006177 31667.591 7.905
10.10.2013 00:00:00 log file parallel write 3625342 28296.640 7.805
11.10.2013 00:00:00 log file parallel write 3483249 31032.324 8.909
12.10.2013 00:00:00 log file parallel write 3293462 33351.490 10.127
13.10.2013 00:00:00 log file parallel write 2871091 36413.925 12.683
14.10.2013 00:00:00 log file parallel write 3763916 30262.718 8.040
15.10.2013 00:00:00 log file parallel write 3018760 28262.172 9.362
16.10.2013 00:00:00 log file parallel write 3303205 31062.276 9.404
17.10.2013 00:00:00 log file parallel write 3012105 31831.491 10.568
18.10.2013 00:00:00 log file parallel write 2692697 26981.966 10.020
19.10.2013 00:00:00 log file parallel write 1038399 512.950 .494
20.10.2013 00:00:00 log file parallel write 959443 427.554 .446
21.10.2013 00:00:00 log file parallel write 1520444 606.580 .399
22.10.2013 00:00:00 log file parallel write 1618490 655.873 .405
23.10.2013 00:00:00 log file parallel write 1889845 751.216 .398
24.10.2013 00:00:00 log file parallel write 1957384 760.656 .389
25.10.2013 00:00:00 log file parallel write 2204260 853.691 .387
26.10.2013 00:00:00 log file parallel write 2205783 856.731 .388
27.10.2013 00:00:00 log file parallel write 2033199 785.785 .386
28.10.2013 00:00:00 log file parallel write 2439092 923.368 .379
29.10.2013 00:00:00 log file parallel write 2233614 840.628 .376
21 rows selected.
Visualizing the data from output like that is easy too!
System Statistics
The last script from this set is awr_stat_trend.sql. It does the same thing with the system statistics collected in DBA_HIST_SYSSTAT as previous scripts did to the performance of SQLs and wait events. The parameters are similar again - the name of the system statistic, days to report and the interval. I usually use the query to check how the redo size or the number of physical reads change over time, but there’s huge number of statistics available (638 different statistics in 11.2.0.3) and that’s why I’m sure you’ll find your own reasons to use this script.
SQL> @awr_stat_trend.sql 'redo size' 3 4
TIME STAT_NAME VALUE
------------------- ------------------------- -----------------
27.10.2013 00:00:00 redo size 1739466208
27.10.2013 04:00:00 redo size 2809857936
27.10.2013 08:00:00 redo size 648511376
27.10.2013 12:00:00 redo size 533287888
27.10.2013 16:00:00 redo size 704832684
27.10.2013 20:00:00 redo size 819854908
28.10.2013 00:00:00 redo size 2226799060
28.10.2013 04:00:00 redo size 3875182764
28.10.2013 08:00:00 redo size 1968024072
28.10.2013 12:00:00 redo size 1125339352
28.10.2013 16:00:00 redo size 1067175300
28.10.2013 20:00:00 redo size 936404908
29.10.2013 00:00:00 redo size 1758952428
29.10.2013 04:00:00 redo size 3949193948
29.10.2013 08:00:00 redo size 1715444632
29.10.2013 12:00:00 redo size 1008385144
29.10.2013 16:00:00 redo size 544946804
17 rows selected.
AWR is a gold mine, but you need the right tools for digging. I hope the scripts will be useful for you too! P.S. You might have noticed the scripts are published on GitHub, let me know if you find any issues using them and perhaps one day you’ll find new versions for the script.
Update (4-Nov-2013)
I’ve added the instance numbers to the outputs in all three scripts. This is how it looks now:
SQL> @awr_sqlid_perf_trend.sql 0fxqsqwhkn03b 14 48
INST TIME EXECUTIONS ELAPSED_TIME_S_1EXEC CPU_TIME_S_1EXEC IOWAIT_S_1EXEC CLWAIT_S_1EXEC APWAIT_S_1EXEC CCWAIT_S_1EXEC ROWS_PROCESSED_1EXEC BUFFER_GETS_1EXEC DISK_READS_1EXEC DIRECT_WRITES_1EXEC
----- ------------------- ----------- -------------------- ---------------- -------------- -------------- -------------- -------------- -------------------- ----------------- ----------------- -------------------
1 28.10.2013 00:00:00 840 .611 .014 .595 .007 .000 .000 1.000 1085.583 128.724 .000
30.10.2013 00:00:00 1466 .491 .011 .479 .005 .000 .000 1.000 976.001 88.744 .000
01.11.2013 00:00:00 542 .798 .023 .760 .025 .000 .000 1.000 896.978 114.196 .000
03.11.2013 00:00:00 544 .750 .021 .719 .017 .000 .000 1.000 1098.213 134.941 .000
2 28.10.2013 00:00:00 1638 .498 .017 .474 .013 .000 .000 1.001 953.514 96.287 .000
30.10.2013 00:00:00 1014 .745 .022 .712 .019 .000 .000 1.000 1034.249 131.057 .000
01.11.2013 00:00:00 1904 .633 .011 .624 .002 .000 .000 1.000 1045.668 104.568 .000
03.11.2013 00:00:00 810 .602 .017 .581 .010 .000 .000 1.000 929.778 108.998 .000
8 rows selected.