Monitoring Workflow Agent Listeners

Hey, the blog is new and I haven’t yet gotten used to it, therefore I absolutely don’t know what you would like to read about :) But it’s also good because I can choose the topics freely. This one will be about monitoring the Workflow Agent Listeners or how to make sure Agent listeners are really doing their job.

The question is simple: How do we know if Agent listeners are really working as expected?

We can take a look of the status icons is Oracle Application Manager

Status icon of Agent Listeners

… the icon is green so the started Agent listeners are working. You can also click on the icon and see the status of each Agent listener separately …

Status of each Agent listener

But is that enough to say that it’s all right with the Agent listeners? if we want to be sure about it, we have to check if all the job that has to be done by the Agent listeners is really done.

As Agent listeners are based on Oracle queues, we’ll check if the queue serving particular Agent listener contains many pending messages. If it is - it’s clear that something is not right there.

We’ll need to query 3 tables/views to get the information you need for checking how much work is left to be done for the listeners:

applsys.fnd_svc_components - table which stores information about service components whic also the Agent listeners are;
applsys.wf_agents - Agent definitions for Business Event System (ETRM)
dba_queues - Describes the operational characteristics of every queue in a database (Oracle 10g R2 documentation)

Here’s the select that shows which queue corresponds to each Agent listener:

select t.component_name,
       p.owner,
       p.queue_table,
       t.correlation_id
  from applsys.fnd_svc_components t,
       applsys.wf_agents o,
       dba_queues p
 where t.inbound_agent_name || t.outbound_agent_name = o.name
   and p.owner || '.' || p.name = o.queue_name
   and t.component_type like 'WF_%AGENT%';

There are some interesting parts, like t.inbound_agent_name || t.outbound_agent_name = o.name in the query, that’s because the Agent listener will be either inbound or outbound, but not both at the same time, so we can concatenate both fields (as one of them is NULL) to make it shorter.

The output of the query will be like this:

Agent listener related queue tables

I’ll use this information to construct queries for checking pending messages in the queues dynamically:

select 'select ''' || t.component_name || ' (queue_table: ' ||
       p.QUEUE_TABLE || ')''||'' Count: ''||count(*) c from ' || p.owner || '.' ||
       p.QUEUE_TABLE ||
       ' where deq_time is null and nvl(delay,enq_time)<sysdate-1/24 ' ||
       nvl2(t.correlation_id,
            'and corrid like ''' || t.correlation_id || ''' ',
            null) || 'having count(*)>0;'
  from applsys.fnd_svc_components t, applsys.wf_agents o, dba_queues p
 where t.inbound_agent_name || t.outbound_agent_name = o.name
   and p.OWNER || '.' || p.name = o.queue_name
   and t.component_type like 'WF_%AGENT%';

This will give you queries for checking how many messages that should have been processed an hour ago there are for each Agent listener in it’s queue. Like these:

select 'ECX Inbound Agent Listener (queue_table: ECX_INQUEUE)' ||
       ' Count: ' || count(*) c
  from APPS.ECX_INQUEUE
 where deq_time is null
   and nvl(delay, enq_time) < sysdate - 1 / 24 having count(*) > 0;

… or …

select 'Workflow Deferred Notification Agent Listener (queue_table: WF_DEFERRED)' ||
       ' Count: ' || count(*) c
  from APPLSYS.WF_DEFERRED
 where deq_time is null
   and nvl(delay, enq_time) < sysdate - 1 / 24
   and corrid like 'oracle.apps.wf.notification.%'
having count(*) > 0;

After executing all the dynamically generated select statements, the output will turn your attention only to potentially problematic ones.

Workflow Java Deferred Agent Listener (queue_table: WF_JAVA_DEFERRED) Count: 3493

I have integrated this kind of check into monitoring system we have so that I get a message to my mailbox when my attention is needed.

Of course, this just makes sure Agent Listeners get their “tasks”, but does not mean they complete them successfuly… On the other hand, if an Agent listener works with errors - it is terminated after a number of errors that is definable through AOM, and if that happens, new messages will pile up in the queues and this monitoring will notice it.