From: hsn_ on
> > db2pd -stack -all hangs producing no output.
> Hmm, this should not happen either. Very odd.
db2pd requires the instance to be started:
C:\IBM\SQLLIB\BIN>db2stop
SQL1064N DB2STOP processing was successful.

C:\IBM\SQLLIB\BIN>db2pd -stack all
Unable to attach to database manager. Please ensure db2start has been
run.

But db2start hangs before the instance is fully started, so db2pd
probably waits until DB2 finishes its startup sequence. If you want, I
can run procmon on db2pd to see what it is waiting for.

> Now, I'm confused. When does db2start fail? Only, if you set DIAGLEVEL to 4
> and delete the .db2diag.rotate.lck and db2diag.*.log files?
Yes.
I have now tested it again, and it also fails if db2diag.0.log exists
but is zero bytes long, so there is no need to delete it or the rotate
lock file.

> It does not fail, if you don't delete the files?
Yes.

> So you can actually set it to 4 and start db2 somehow?
If db2diag.0.log is longer than 0 bytes, then it starts successfully.

Otherwise you need to kill the hanging db2start and then run
db2 update dbm cfg using diaglevel 3
which sets diaglevel back to 3 but takes a very long time, about 15
minutes, to finish. Then you can start DB2 again without problems.

> I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
My OS is Windows XP 32-bit.
> What is your 'db2level' output?
My DB2 is 9.7.2, but the error occurs in 9.5 too. Our customer report
is from a person running 9.5.3.

> Can you please also post the output of 'db2 get dbm cfg'?
C:\IBM\SQLLIB\BIN>db2 get dbm cfg

Database Manager Configuration

Node type = Database Server with local and remote clients

Database manager configuration release level = 0x0d00

Maximum total of files open (MAXTOTFILOP) = 16000
CPU speed (millisec/instruction) (CPUSPEED) = 2,519169e-007

Max number of concurrently active databases (NUMDB) = 8
Federated Database System Support (FEDERATED) = NO
Transaction processor monitor name (TP_MON_NAME) =

Default charge-back account (DFT_ACCOUNT_STR) =

Java Development Kit installation path (JDK_PATH) = C:\IBM\SQLLIB\java\jdk

Diagnostic error capture level (DIAGLEVEL) = 3
Notify Level (NOTIFYLEVEL) = 3
Diagnostic data directory path (DIAGPATH) =
Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 20

Default database monitor switches
Buffer pool (DFT_MON_BUFPOOL) = OFF
Lock (DFT_MON_LOCK) = ON
Sort (DFT_MON_SORT) = OFF
Statement (DFT_MON_STMT) = OFF
Table (DFT_MON_TABLE) = OFF
Timestamp (DFT_MON_TIMESTAMP) = ON
Unit of work (DFT_MON_UOW) = OFF
Monitor health of instance and databases (HEALTH_MON) = ON

SYSADM group name (SYSADM_GROUP) =
SYSCTRL group name (SYSCTRL_GROUP) =
SYSMAINT group name (SYSMAINT_GROUP) =
SYSMON group name (SYSMON_GROUP) =

Client Userid-Password Plugin (CLNT_PW_PLUGIN) =
Client Kerberos Plugin (CLNT_KRB_PLUGIN) = IBMkrb5
Group Plugin (GROUP_PLUGIN) =
GSS Plugin for Local Authorization (LOCAL_GSSPLUGIN) =
Server Plugin Mode (SRV_PLUGIN_MODE) = UNFENCED
Server List of GSS Plugins (SRVCON_GSSPLUGIN_LIST) =
Server Userid-Password Plugin (SRVCON_PW_PLUGIN) =
Server Connection Authentication (SRVCON_AUTH) = NOT_SPECIFIED
Cluster manager (CLUSTER_MGR) =

Database manager authentication (AUTHENTICATION) = SERVER
Alternate authentication (ALTERNATE_AUTH_ENC) = NOT_SPECIFIED
Cataloging allowed without authority (CATALOG_NOAUTH) = NO
Trust all clients (TRUST_ALLCLNTS) = YES
Trusted client authentication (TRUST_CLNTAUTH) = CLIENT
Bypass federated authentication (FED_NOAUTH) = NO

Default database path (DFTDBPATH) = C:

Database monitor heap size (4KB) (MON_HEAP_SZ) = AUTOMATIC(66)
Java Virtual Machine heap size (4KB) (JAVA_HEAP_SZ) = 2048
Audit buffer size (4KB) (AUDIT_BUF_SZ) = 0
Size of instance shared memory (4KB) (INSTANCE_MEMORY) = AUTOMATIC(399591)
Backup buffer default size (4KB) (BACKBUFSZ) = 1024
Restore buffer default size (4KB) (RESTBUFSZ) = 1024

Agent stack size (AGENT_STACK_SZ) = 128
Minimum committed private memory (4KB) (MIN_PRIV_MEM) = 32
Private memory threshold (4KB) (PRIV_MEM_THRESH) = 20000

Sort heap threshold (4KB) (SHEAPTHRES) = 0

Directory cache support (DIR_CACHE) = YES

Application support layer heap size (4KB) (ASLHEAPSZ) = 15
Max requester I/O block size (bytes) (RQRIOBLK) = 32767
Query heap size (4KB) (QUERY_HEAP_SZ) = 1000

Workload impact by throttled utilities (UTIL_IMPACT_LIM) = 10

Priority of agents (AGENTPRI) = SYSTEM
Agent pool size (NUM_POOLAGENTS) = AUTOMATIC(100)
Initial number of agents in pool (NUM_INITAGENTS) = 0
Max number of coordinating agents (MAX_COORDAGENTS) = AUTOMATIC(200)
Max number of client connections (MAX_CONNECTIONS) = AUTOMATIC(MAX_COORDAGENTS)

Keep fenced process (KEEPFENCED) = YES
Number of pooled fenced processes (FENCED_POOL) = AUTOMATIC(MAX_COORDAGENTS)
Initial number of fenced processes (NUM_INITFENCED) = 0

Index re-creation time (INDEXREC) = RESTART

Transaction manager database name (TM_DATABASE) = 1ST_CONN
Transaction resync interval (sec) (RESYNC_INTERVAL) = 180

SPM name (SPM_NAME) = RADIM
SPM log size (SPM_LOG_FILE_SZ) = 256
SPM resync agent limit (SPM_MAX_RESYNC) = 20
SPM log path (SPM_LOG_PATH) =

NetBIOS Workstation name (NNAME) =

TCP/IP Service name (SVCENAME) = 50000
Discovery mode (DISCOVER) = SEARCH
Discover server instance (DISCOVER_INST) = ENABLE

SSL server keydb file (SSL_SVR_KEYDB) =
SSL server stash file (SSL_SVR_STASH) =
SSL server certificate label (SSL_SVR_LABEL) =
SSL service name (SSL_SVCENAME) =
SSL cipher specs (SSL_CIPHERSPECS) =
SSL versions (SSL_VERSIONS) =
SSL client keydb file (SSL_CLNT_KEYDB) =
SSL client stash file (SSL_CLNT_STASH) =

Maximum query degree of parallelism (MAX_QUERYDEGREE) = 1
Enable intra-partition parallelism (INTRA_PARALLEL) = NO

No. of int. communication buffers (4KB) (FCM_NUM_BUFFERS) = AUTOMATIC(1024)
No. of int. communication channels (FCM_NUM_CHANNELS) = AUTOMATIC(512)
db2start/db2stop timeout (min) (START_STOP_TIME) = 15

> As soon as I'm able to reproduce the problem, I'll send the data and the
> problem description to the owner of this component.
You can send him this report anyway; he might be able to find out what
is going on. The problem depends on the log file size being > 0. It is
not tied to the log rotation function, because it hangs with DIAGSIZE 0
too.
From: Helmut Tessarek on
> db2pd requires the instance to be started:
> C:\IBM\SQLLIB\BIN>db2pd -stack all
> Unable to attach to database manager. Please ensure db2start has been
> run.

Yes, you are right. Totally forgot about that.

> But db2start hangs before the instance is fully started, so db2pd
> probably waits until DB2 finishes its startup sequence. If you want, I
> can run procmon on db2pd to see what it is waiting for.

No, that's ok.

> If db2diag.0.log is longer than 0 bytes, then it starts successfully.

At least there is a workaround. You can always add some characters to the
file... :-)

>> I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
> My OS is Windows XP 32-bit.

I also tried it on WinXP 32bit with DB2 9.7.2.

> Diagnostic error capture level (DIAGLEVEL) = 3
> Notify Level (NOTIFYLEVEL) = 3
> Diagnostic data directory path (DIAGPATH) =
> Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 20

Hmm, I am using my own diagpath (d:\db2dump). I also removed my diagpath value
and tried to reproduce the problem again. Without success.

> You can send him this report anyway; he might be able to find out what
> is going on. The problem depends on the log file size being > 0. It is
> not tied to the log rotation function, because it hangs with DIAGSIZE 0
> too.

I can send him this report, but if he is not able to reproduce the problem,
then I doubt that he can do anything. I have now tried it on 3 different
operating systems with 4 different DB2 versions/releases.
I have not been able to reproduce the problem even once.

--
Helmut K. C. Tessarek
DB2 Performance and Development

/*
Thou shalt not follow the NULL pointer for chaos and madness
await thee at its end.
*/
From: hsn_ on
> >> I tried it, but I'm still not able to reproduce it. But I tried it on Win32.
> My OS is Windows XP 32-bit SP3.
Don't waste your time trying to re-create the issue; I found the root
cause. It is pretty simple, and simple to fix, as you can see from my
procmon trace:
http://rapidshare.com/files/397732121/Logfile.PML
The trace was taken with the DB2 instance stopped and only db2dasstm
running. It nicely illustrates the error. At the beginning of the
capture there is an existing db2diag.log file and everything runs
smoothly. In the middle I deleted the diag log file, and from that point
on the calls start to fail. You can look at what happened. The faulty
sequence is the following:

QueryOpen
QueryOpen
CreateFile - creates a new 0-byte file
LockFile offset 0, length 1, exclusive, don't wait. This fails because
the file is zero-sized and you can't lock a 1-byte range in a zero-sized
file.

The procedure for locking the diag file needs to be changed. Probably
the best way is to use file-wide locks instead of range locks, or to
query the file size before using a range lock.
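
A minimal sketch of the second suggestion (query the file size before
taking the range lock), written in Python for illustration only. This is
not DB2's code: the function name lock_first_byte is hypothetical, and
padding an empty file with one byte before locking is an assumption about
how the size check could be used.

```python
import os
import sys


def lock_first_byte(path):
    """Open path, make sure it is at least 1 byte long, then try to take a
    non-blocking exclusive lock on byte 0. Returns the open file object;
    the lock is released when the file is closed."""
    # "a+b" creates the file if it is missing and never truncates it.
    f = open(path, "a+b")
    try:
        # Query the size first and pad an empty file, so that the 1-byte
        # range to be locked actually exists (assumed fix, see lead-in).
        if os.fstat(f.fileno()).st_size == 0:
            f.write(b"\n")
            f.flush()
        f.seek(0)
        if sys.platform == "win32":
            import msvcrt
            # Locks 1 byte at the current position (offset 0), no waiting.
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            import fcntl
            # Exclusive, non-blocking lock on 1 byte starting at offset 0.
            fcntl.lockf(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
    except Exception:
        f.close()
        raise
    return f
```

Called on a missing or empty file, this leaves a 1-byte file behind and
holds the lock until the returned file object is closed; a file that
already has content is left unchanged.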
From: hsn_ on
The same problem, a 1-byte lock on a 0-byte file, affects the file
".db2diag.rotate.lck". It needs to be 1 byte long to make it work.
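
Until the locking is fixed, a small pre-flight check before db2start can
at least show which files would trigger the hang. The sketch below is an
illustration, not an IBM tool: the function name empty_diag_files is
hypothetical, the file name patterns are the ones mentioned in this
thread, and the diagnostic directory has to be supplied by the caller.

```python
import fnmatch
import os

# File name patterns taken from this thread.
PATTERNS = ("db2diag.*.log", "db2diag.log", ".db2diag.rotate.lck")


def empty_diag_files(diag_dir):
    """Return the sorted names of zero-length diagnostic files in diag_dir."""
    found = []
    for name in os.listdir(diag_dir):
        # Skip anything that is not one of the files discussed above.
        if not any(fnmatch.fnmatch(name, p) for p in PATTERNS):
            continue
        full = os.path.join(diag_dir, name)
        if os.path.isfile(full) and os.path.getsize(full) == 0:
            found.append(name)
    return sorted(found)
```

Any file it reports can then be padded by hand, as suggested earlier in
the thread, before db2start is run with DIAGLEVEL 4.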