From: shorti on
DB2 V9.5 on AIX,

We received an error on one of our machines that the DB2 active log
filesystem was full. I am assuming it was due to some uncommitted
transaction so I am trying to hunt down who it was. To do this I was
trying to determine what active log file was hung out in order to get
a time period of how long the transaction has been open so I can find
it among the thousands of transactions. I did confirm we filled up
the active log directory and not the archive log directory.

I see some potential lock holds in the application snapshot but even
with that it will be difficult to determine which one was held for a
long period of time since I dont see anything in the snapshot that
tells me when the first uncommitted transaction started.

Do any of these entries in the diag log or the entries I received from
the database snapshot give any information about the longest
uncommitted transaction or the active log that could not archive? I
have researched the logs stated in some of the entries below and I
dont see any indication that they truly are the log that had the
uncommitted transaction:

2010-07-15-04.24.58.172043+000 I942782A418 LEVEL: Error
PID : 2203654 TID : 9254 PROC : db2sysc 0
INSTANCE: afuser NODE : 000 DB : ALPHADB
EDUID : 9254 EDUNAME: db2loggr (ALPHADB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgadf, probe:1970
RETCODE : ZRC=0x850F000C=-2062614516=SQLO_DISK "Disk full."
DIA8312C Disk was full.

2010-07-15-04.24.58.172374+000 I943201A390 LEVEL: Info
PID : 2203654 TID : 9254 PROC : db2sysc 0
INSTANCE: afuser NODE : 000 DB : ALPHADB
EDUID : 9254 EDUNAME: db2loggr (ALPHADB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgadf, probe:1970
DATA #1 : <preformatted>
Last log file not required: S0016020.LOG.

2010-07-15-04.24.58.172554+000 I943592A397 LEVEL: Info
PID : 2203654 TID : 9254 PROC : db2sysc 0
INSTANCE: afuser NODE : 000 DB : ALPHADB
EDUID : 9254 EDUNAME: db2loggr (ALPHADB) 0
FUNCTION: DB2 UDB, data protection services, sqlpgasn, probe:190
DATA #1 : <preformatted>
Error -2062614516 from add active log is ignored.

---------------------------------------------------------------------------

From Snapshot:

Node number = 0
File number of first active log = 16034
File number of last active log = 16048
File number of current active log = 16035
File number of log being archived = 15838

First active log file = S0016034.LOG

-----------------------------------------------------------------------------------------------------------

The db2diag.log shows S0015718.LOG was the last file archived before
hitting the first "Disk was full" error and an sql -968. But I see
that we continued to archive until the machine was rebooted about 30
minutes later. I also see the archive logging before hand seems
steady and reasonable as well so nothing to indicate a log held open.
I want to also mention a online database table/index reorg was going
on and had started about 20 minutes prior to seeing the first error.
I know the online reorg will accumulate a lot more logging. I dont
suspect the reorg being a problem but it would have escalated the
number of logs being accumulated so probably irritated the problem of
an open transaction. We monitor the archive logging space during
online reorg to ensure we do not run out of space...that is why I know
we didnt run out of archive space also.

Any help on tracking this down with the logging I have would be
helpful. Thanks!
From: Lennart Jonsson on
On 2010-07-20 00:16, shorti wrote:
> DB2 V9.5 on AIX,
>
> We received an error on one of our machines that the DB2 active log
> filesystem was full. I am assuming it was due to some uncommitted
> transaction so I am trying to hunt down who it was. To do this I was
> trying to determine what active log file was hung out in order to get
> a time period of how long the transaction has been open so I can find
> it among the thousands of transactions. I did confirm we filled up
> the active log directory and not the archive log directory.
>

Perhaps something like:

select
x.agent_id,
substr(z.corr_TOKEN,1,12) as ip,
cast(y.stmt_text as varchar(800)) as stmt,
x.locks_held,
x.APPL_IDLE_TIME,
x.UOW_START_TIME
from sysibmadm.SNAPAPPL x
join sysibmadm.snapstmt y
on x.agent_id = y.agent_id
join sysibmadm.SNAPAPPL_INFO z
on x.agent_id = z.agent_id
where UOW_STOP_TIME is null
and TIMESTAMPDIFF(8,CHAR(current_timestamp - UOW_START_TIME)) > 8
order by UOW_START_TIME;

You might be interested in other attributes, and it is assumed that the
right monitors are on.

/Lennart

[...]