  • Meaning of "Disk Reads" Values in DBA_HIST_SQLSTAT

    This blog post was originally published on the Pythian blog.

    This post relates to my previous writing on mining the AWR. I noticed that it’s very easy to misinterpret the DISK_READS_TOTAL and DISK_READS_DELTA columns in DBA_HIST_SQLSTAT. Let’s see what the documentation says:

    • DISK_READS_TOTAL - Cumulative number of disk reads for this child cursor
    • DISK_READS_DELTA - Delta number of disk reads for this child cursor

    You might think it’s clear enough, and that’s exactly what I thought too: the number of disk reads is the number of I/O requests issued to the storage. But is that really true?
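
    For reference, this is the kind of query I use to pull these columns when mining the AWR. It is only a minimal sketch; &sql_id is a SQL*Plus substitution variable standing in for the statement you are interested in:

    -- Per-snapshot disk read figures for one statement, joined to
    -- DBA_HIST_SNAPSHOT to show when each interval began.
    select sn.snap_id,
           sn.begin_interval_time,
           st.disk_reads_total,
           st.disk_reads_delta
      from dba_hist_sqlstat st,
           dba_hist_snapshot sn
     where st.snap_id = sn.snap_id
       and st.dbid = sn.dbid
       and st.instance_number = sn.instance_number
       and st.sql_id = '&sql_id'
     order by sn.begin_interval_time;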

    Read on →

  • There's Always Another Bug Hiding Just Around the Corner

    This blog post was originally published on the Pythian blog.

    We were using a 10.2.0.3 database, and it had been running without any issues for several years. What could possibly go wrong? Anything! Suddenly, we started getting ORA-07445: exception encountered: core dump [qercoStart()+156] [SIGSEGV] [Address not mapped to object] a few times a minute in the alert log. A closer investigation revealed that one of the popular SQL statements in the application couldn’t complete anymore. It looked like a bug, since only that one statement was failing.

    Read on →

  • Be Warned: cmclean.sql is Dangerous!

    This blog post was originally published on the Pythian blog.

    I’m sure one of the most popular scripts for Apps DBAs on My Oracle Support is cmclean.sql from MOS Article ID 134007.1, “Concurrent Processing - CMCLEAN.SQL - Non Destructive Script to Clean Concurrent Manager Tables”. DBAs usually use the script to clean up stale data from the concurrent processing tables (FND_CONCURRENT_%) after incidents such as a crash of the database or of the concurrent processing node. The script sets the correct completion phase and status codes for terminated concurrent requests, and the correct control codes for terminated concurrent manager processes. Despite the reassuring “Non Destructive” claim in the title of the MOS article, it is possible to lose concurrent request schedules when cmclean.sql is executed.
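
    If you do decide to run a cleanup of this kind, it is worth saving a copy of the pending requests first, so schedules can be compared before and after. A minimal sketch, assuming a throwaway backup table (the table name is just an example; PHASE_CODE = 'P' marks pending requests):

    -- Keep a full copy of all pending requests, including their schedules,
    -- so the pre-cleanup state can be checked later.
    create table xx_pending_requests_before as
    select *
      from fnd_concurrent_requests
     where phase_code = 'P';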

    Read on →

  • Performance Settings of Concurrent Managers

    This blog post was originally published on the Pythian blog.

    This is the second article in a series about the internals and performance of concurrent managers. In this post, we’ll take a look at three important settings that affect the performance of the concurrent managers: the number of processes, the “sleep seconds”, and the “cache size”. This article might be a bit on the theoretical side, but it should provide a good understanding of how these settings actually affect the behavior and performance of concurrent managers. Most of the statements in this article build on information from my previous post, The Internal Workflow of e-Business Suite Concurrent Manager Process, so it may be helpful to read it before continuing with this one.

    Life cycle of a Concurrent Request

    The interesting thing about tuning concurrent managers is that we don’t tune a particular query or a running process; we tune the pending time of concurrent requests. The goal of the tuning is to make sure concurrent requests start executing soon enough after the time for which they were scheduled. Let’s take a look at the life cycle of a concurrent request:

    The Life cycle of a Concurrent Request (CR)

    Based on the diagram above, the pending time of a request is the interval between the time the request was scheduled to start and the time it actually started. This time can be split into two parts:

    1. Pending for the Conflict Resolution Manager (CRM) - Here the CRM checks the incompatibility rules in effect for the pending concurrent request against other running requests. The CRM allows the request to execute only once all incompatible requests have completed.
    2. Pending for a Concurrent Manager (CM) - This is the time spent waiting for an available concurrent manager process. It also includes the time the CM process takes to fetch the request from the FND_CONCURRENT_REQUESTS table and to start executing it. “Pending for CM” is the interval that can be tuned by altering the number of manager processes and the “sleep seconds” and “cache size” settings; a sketch for measuring the overall pending time follows this list.
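
    Before touching any of these settings, it helps to measure the pending time itself. The sketch below adds both parts of the interval together (CRM and CM) for requests that started during the last day; it is only a rough illustration using the REQUESTED_START_DATE and ACTUAL_START_DATE columns of FND_CONCURRENT_REQUESTS:

    -- Average and maximum pending time (scheduled start to actual start),
    -- in seconds, for requests that started during the last 24 hours.
    select count(*) requests,
           round(avg((actual_start_date - requested_start_date) * 86400)) avg_pending_sec,
           round(max((actual_start_date - requested_start_date) * 86400)) max_pending_sec
      from fnd_concurrent_requests
     where actual_start_date >= sysdate - 1;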

    Read on →

  • Which Partitions Does The SQL Statement Access?

    This blog post was originally published on the Pythian blog.

    This is a short blog post on how one can prove that a particular partition of a table or index is accessed by a specific query.

    The solution to this question seemed interesting enough to share with you all. Here’s a short description of what I had:

    • A SQL statement with sql_id = 9kuvj1g38hja2 is the top statement in the “User I/O” section of the AWR report.
    • The statement queries data from multiple tables. One of them is a fairly large table named “ALERT”. It is range partitioned by date, and each year has its own partition (Y2013, Y2012, …).
    • The SQL takes 5 – 10 seconds to complete and is executed 500 – 5000 times each hour (sometimes even ~10 of them are running simultaneously).
    • ~95% of the elapsed time is spent on db file sequential reads.
    • The Diagnostics Pack is licensed (it is required for this method, as ASH is used).

    We had also been told that none of the queries should be reading data older than 1 year, but looking at the statistics I suspected something was wrong. So here’s how I found out that older data from the ALERT table was being accessed:

    1. As the query is executed very often, it has many appearances in ASH.
    2. As the statement does lots of db file sequential reads, there are many p1 (file_id) and p2 (block_id) combinations available from ASH (a quick count of the available samples is sketched below).
    3. file_id and block_id can be joined to DBA_EXTENTS to find the segment each block belongs to. That is how the partition names accessed by the SQL were found.
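
    To get a feel for how much ASH data there is to work with, a simple count of the samples is enough. A minimal sketch, using the sql_id from this case:

    -- Count the ASH samples of the statement waiting on single-block reads,
    -- and the number of distinct file#/block# (p1/p2) combinations they cover.
    select count(*) samples,
           count(distinct p1 || '.' || p2) file_block_combinations
      from dba_hist_active_sess_history
     where sql_id = '9kuvj1g38hja2'
       and event = 'db file sequential read';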

    The challenge with this approach was that I had more than 100K samples for 9kuvj1g38hja2 waiting on db file sequential reads in DBA_HIST_ACTIVE_SESS_HISTORY, and there were millions of extents in the database. How do I join them efficiently? I can fetch only the segments we’re interested in from DBA_EXTENTS before joining it all to DBA_HIST_ACTIVE_SESS_HISTORY (thanks, Yury, for this great hint)! So this is the query that solved the mystery:

    with
    -- extents of the ALERT table only, so the expensive DBA_EXTENTS scan stays small
    exts as (select /*+ materialize */ partition_name, file_id, block_id, blocks
             from dba_extents where owner='SECRET' and segment_name='ALERT'),
    -- ASH samples of the statement on single-block reads, aggregated by file#/block#
    stats as (select /*+ materialize */ p1, p2, count(*) cnt
              from DBA_HIST_ACTIVE_SESS_HISTORY where sql_id='9kuvj1g38hja2'
               and event='db file sequential read' group by p1, p2)
    select partition_name, sum(s.cnt) read_count
      from exts e, stats s
     where e.file_id=s.p1                                      -- p1 = file#
       and s.p2 between e.block_id and e.block_id+e.blocks-1   -- p2 = block# inside the extent
     group by partition_name
     order by 2;
    

    It’s short and simple, and it also answered the question I asked. Is old data accessed by the SQL statement? Yes.

    PARTITION_NAME                 READ_COUNT
    ------------------------------ ----------
    Y2006                                 111
    Y2007                                5645
    Y2008                                5698
    Y2013                               10388
    Y2009                               13473
    Y2010                               21600
    Y2011                               37186
    Y2012                               56184
    
     8 rows selected
    

    Be careful when interpreting the results! If the query does not report a partition as being accessed, it doesn’t mean the partition was not accessed. ASH is based on sampling: maybe we got “lucky” and the samples didn’t cover reads from that partition? Maybe all blocks belonging to the partition were in the buffer cache and no reads were needed? But if the query did report a partition as being accessed, we can be 100% sure it really was, and for me that was enough.