From: Olegas on
Hello there,

I'm looking for some suggestions regarding the issue below. If this is
not the appropriate group, please redirect me.

I'm trying to track down the root cause of a fairly elusive problem that
occurs once in a while and only when full page heap is enabled. The problem
is not reproducible on demand even with full page heap enabled, but the
symptoms suggest heap corruption. I'm not sure whether this is some type of
timing or synchronization issue.

Basically, we get crash dumps from our customers that show:
# 0 Id: 1f0.810 Suspend: 1 Teb: 7ffdf000 Unfrozen
ChildEBP RetAddr Args to Child
0012fa64 7c90df5a 7c91b24b 00000064 00000000 ntdll!KiFastSystemCallRet
0012fa68 7c91b24b 00000064 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
0012faf0 7c901046 0197e178 7c912cce 7c97e178 ntdll!RtlpWaitForCriticalSection+0x132
0012faf8 7c912cce 7c97e178 00000000 006ff238 ntdll!RtlEnterCriticalSection+0x46
0012fb34 7c80b4af 00000001 00000000 0012fb70 ntdll!LdrLockLoaderLock+0xea
0012fb8c 7c80b5ad 00400000 02c5cdf8 00000104 kernel32!GetModuleFileNameW+0x89
0012fbb8 73e69cc8 00400000 0012fcdc 00000104 kernel32!GetModuleFileNameA+0x4b
0012fee8 73e69c67 006ff238 73ddcf57 00400000 mfc42!CWinApp::SetCurrentHandles+0x45
0012fef0 73ddcf57 00400000 00000000 00152ffc mfc42!AfxWinInit+0x4f
0012ff10 0065f893 00400000 00000000 00152ffc mfc42!AfxWinMain+0x2c
0012ff24 0056f6f2 00400000 00000000 00152ffc MyApp!WinMain+0x15 [appmodul.cpp @ 30]
0012ffc0 7c817077 80000001 0145e0c4 7ffdd000 MyApp!WinMainCRTStartup+0x134
0012fff0 00000000 0056f5be 00000000 78746341 kernel32!BaseProcessStart+0x23

1 Id: 1f0.8c8 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr Args to Child
02a0f650 7c90df4a 7c8648a2 00000002 02a0f838 ntdll!KiFastSystemCallRet
02a0f654 7c8648a2 00000002 02a0f838 00000001 ntdll!ZwWaitForMultipleObjects+0xc
02a0fa08 77c32f0f 02a0fa50 00000000 00000000 kernel32!UnhandledExceptionFilter+0x8b9
02a0fa24 77c3a3c7 00000000 02a0fa50 77c35cf5 msvcrt!_XcptFilter+0x161
02a0fa30 77c35cf5 02a0fa58 00000000 02a0fa58 msvcrt!_endthreadex+0xc0
02a0fa58 7c9032a8 02a0fb44 02a0ffa4 02a0fb60 msvcrt!_except_handler3+0x61
02a0fa7c 7c90327a 02a0fb44 02a0ffa4 02a0fb60 ntdll!ExecuteHandler2+0x26
02a0fb2c 7c90e48a 00000000 02a0fb60 02a0fb44 ntdll!ExecuteHandler+0x24
02a0fb2c 77c47631 00000000 02a0fb60 02a0fb44 ntdll!KiUserExceptionDispatcher+0xe
02a0fe2c 73e68a0e 02c72fe0 00000000 00000024 msvcrt!memset+0x41
02a0fe48 73e682ca 00000001 02c6eee8 00000000 mfc42!CThreadSlotData::SetValue+0xb9
02a0fe5c 73e6842a 73e688db 00000003 02171891 mfc42!CThreadLocalObject::GetData+0x55
02a0fe68 02171891 021752e8 00000003 80284006 mfc42!AFX_MAINTAIN_STATE2::AFX_MAINTAIN_STATE2+0x14
02a0fe80 02171a7b 02170000 00000003 00000000 MyDll!DllMain+0x102 [dllmodul.cpp @ 169]
02a0fea0 7c90118a 02170000 00000003 00000000 MyDll!_DllMainCRTStartup+0x50
02a0fec0 7c913a43 02171a2b 02170000 00000003 ntdll!LdrpCallInitRoutine+0x14
02a0ff38 7c80c136 7c968f03 02786f78 026fff78 ntdll!LdrShutdownThread+0xd7
02a0ff70 77c3a33b 00000000 02786f78 02a0ffb4 kernel32!ExitThread+0x3e
02a0ff80 77c3a3b5 00000000 7c968f03 02695000 msvcrt!_endthreadex+0x34
02a0ffb4 7c80b729 026fff78 7c968f03 02695000 msvcrt!_endthreadex+0xae

2 Id: 1f0.8e8 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr Args to Child
0300fc0c 7c90df5a 7c91b24b 00000064 00000000 ntdll!KiFastSystemCallRet
0300fc10 7c91b24b 00000064 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
0300fc98 7c901046 0197e178 7c91e3b5 7c97e178 ntdll!RtlpWaitForCriticalSection+0x132
0300fca0 7c91e3b5 7c97e178 0300fd2c 00000004 ntdll!RtlEnterCriticalSection+0x46
0300fd18 7c90e457 0300fd2c 7c900000 00000000 ntdll!_LdrpInitialize+0xf0
00000000 00000000 00000000 00000000 00000000 ntdll!KiUserApcDispatcher+0x7

All 3 threads were created about 30 seconds before the dump was taken, but
consumed hardly any CPU time. It also seems as if the application is
shutting down.
0:001> !runaway 7
User Mode Time
Thread Time
2:8e8 0 days 0:00:00.000
1:8c8 0 days 0:00:00.000
0:810 0 days 0:00:00.000
Kernel Mode Time
Thread Time
0:810 0 days 0:00:00.140
1:8c8 0 days 0:00:00.031
2:8e8 0 days 0:00:00.000
Elapsed Time
Thread Time
0:810 0 days 0:00:29.746
1:8c8 0 days 0:00:29.590
2:8e8 0 days 0:00:29.511

The NT loader lock is owned by thread 1. Threads 0 and 2 are blocked on the
NT loader lock's critical section.
0:001> !cs -l
-----------------------------------------
DebugInfo = 0x7c97e1a0
Critical section = 0x7c97e178 (ntdll!LdrpLoaderLock+0x0)
LOCKED
LockCount = 0x2
OwningThread = 0x000008c8
RecursionCount = 0x1
LockSemaphore = 0x64
SpinCount = 0x00000000

So, an access violation takes place in thread 1's context. It occurs while
memset is zeroing 36 (0x24) bytes starting at address 02c72fe0. However, the
block at that address was allocated with only 32 (0x20) bytes, 4 bytes fewer
than memset writes, which is why we get the access violation.

0:001> !heap -p -a 02c72fe0
address 02c72fe0 found in
_DPH_HEAP_ROOT @ 141000
in busy allocation ( DPH_HEAP_BLOCK: UserAddr UserSize - VirtAddr VirtSize)
142b24: 2c72fe0 20 - 2c72000 2000
7c918f21 ntdll!RtlAllocateHeap+0x00000e64
7c809a7f kernel32!LocalAlloc+0x00000058
73e689d7 mfc42!CThreadSlotData::SetValue+0x00000082
02171a7b MyDll!_DllMainCRTStartup+0x00000050
7c90118a ntdll!LdrpCallInitRoutine+0x00000014
7c913a43 ntdll!LdrShutdownThread+0x000000d7
7c80c136 kernel32!ExitThread+0x0000003e
77c3a33b msvcrt!_endthreadex+0x00000034
77c3a3b6 msvcrt!_endthreadex+0x000000af
7c80b729 kernel32!BaseThreadStart+0x00000037
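
The numbers in the memset frame and in the !heap output can be cross-checked
with a little arithmetic. The sketch below is only an illustration: it
assumes 4-byte slot pointers (a 32-bit process, consistent with the 32-bit
addresses in the dump), and the constant and function names are made up for
this example.

```cpp
#include <cstdint>

// Values copied from the dump; the names are illustrative only.
constexpr std::uint32_t kBlockAddr  = 0x02c72fe0; // UserAddr from !heap -p -a
constexpr std::uint32_t kBlockSize  = 0x20;       // UserSize from !heap -p -a
constexpr std::uint32_t kMemsetDest = 0x02c72fe0; // 1st memset argument
constexpr std::uint32_t kMemsetLen  = 0x24;       // 3rd memset argument
constexpr std::uint32_t kSlotSize   = 4;          // assumed sizeof(LPVOID) on x86

// Slot index at which memset starts, i.e. the value pData->nCount had.
constexpr std::uint32_t firstSlotCleared() {
    return (kMemsetDest - kBlockAddr) / kSlotSize;
}

// Number of slots the allocated block can hold.
constexpr std::uint32_t slotsAllocated() { return kBlockSize / kSlotSize; }

// Number of slots memset was asked to clear.
constexpr std::uint32_t slotsCleared() { return kMemsetLen / kSlotSize; }

// Bytes written past the end of the block.
constexpr std::uint32_t overrunBytes() {
    return (kMemsetDest + kMemsetLen) - (kBlockAddr + kBlockSize);
}

static_assert(firstSlotCleared() == 0, "memset starts at the first slot");
static_assert(slotsAllocated() == 8, "0x20 bytes hold 8 slots");
static_assert(slotsCleared() == 9, "0x24 bytes cover 9 slots");
static_assert(overrunBytes() == 4, "one slot past the end");
```

Read this way, memset starts at slot 0, the block has room for 8 slots, and
9 slots are cleared, so the count used for the clear is one slot larger than
the count used for the allocation.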


The disassembly below matches the code snippet from AFXTLS.cpp.
73e689f0 e8c0d2fbff call mfc42!AfxThrowMemoryException (73e25cb5)
73e689f5 8b4608 mov eax,dword ptr [esi+8]
73e689f8 8b4f0c mov ecx,dword ptr [edi+0Ch]
73e689fb 2bc8 sub ecx,eax
73e689fd c1e102 shl ecx,2
73e68a00 51 push ecx
73e68a01 8b4e0c mov ecx,dword ptr [esi+0Ch]
73e68a04 8d0481 lea eax,[ecx+eax*4]
73e68a07 53 push ebx
73e68a08 50 push eax
73e68a09 e8748ff6ff call mfc42!memset (73dd1982)


void CThreadSlotData::SetValue(int nSlot, void* pValue)
<snip>
    if (pData->pData == NULL)
        AfxThrowMemoryException();

    // initialize the newly allocated part
    memset(pData->pData + pData->nCount, 0,
        (m_nMax - pData->nCount) * sizeof(LPVOID));

So, either (pData->pData + pData->nCount) points to the wrong memory block,
or (m_nMax - pData->nCount) yields the wrong byte count.
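
One way both expressions could be individually reasonable and still disagree
is if m_nMax changes between the allocation and the memset. The sketch below
models that possibility. It assumes, without the snipped source to confirm
it, that the allocation size is also computed from m_nMax and that another
thread can raise m_nMax in between. The function names and the 4-byte slot
size are illustrative, not MFC's.

```cpp
#include <cstddef>

// Assumed slot size for a 32-bit process (sizeof(LPVOID) == 4).
constexpr std::size_t kSlotBytes = 4;

// Bytes requested when the block is sized from the first read of the
// shared maximum.
constexpr std::size_t bytesAllocated(std::size_t maxAtAlloc) {
    return maxAtAlloc * kSlotBytes;
}

// Offset one past the last byte cleared when memset is sized from a second
// read of the shared maximum. The clear starts at slot nCount and covers
// (maxAtClear - nCount) slots, so it always ends at slot maxAtClear.
constexpr std::size_t clearEndOffset(std::size_t maxAtClear) {
    return maxAtClear * kSlotBytes;
}

// Bytes written past the end of the block (0 when the clear fits).
constexpr std::size_t bytesPastEnd(std::size_t maxAtAlloc,
                                   std::size_t maxAtClear) {
    return clearEndOffset(maxAtClear) > bytesAllocated(maxAtAlloc)
               ? clearEndOffset(maxAtClear) - bytesAllocated(maxAtAlloc)
               : 0;
}

// Racy pattern: the maximum is read twice and grows from 8 to 9 in between.
static_assert(bytesPastEnd(8, 9) == 4, "one extra slot is cleared");

// Snapshot pattern: a single read feeds both the allocation and the clear.
static_assert(bytesPastEnd(9, 9) == 0, "sizes always agree");
```

If this is what happens, the usual remedy would be to read the maximum once
into a local variable (or hold the lock that protects it) and use that value
for both steps. Whether that applies to mfc42 would need to be confirmed
against the full source.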

Does anyone have any suggestions on how to track down this issue?

Thank you,
Olegas
From: Joseph M. Newcomer on
It looks to me like you are seeing a problem in synchronization. You have not shown that
any heap function is on the stack at this point.

Note that heap corruption can occur at any time; when you get a message about heap
corruption, it is not saying "I have caused heap corruption"; it is saying "I finally
discovered that some time in the unknown past, somebody has corrupted the heap". The
damage could be trillions of instructions in the past.

While this is a good bug report, unfortunately it isn't much help in tracking down what
the corruption is or who did it. These still rank as the nastiest possible bugs to find
and fix.

When you say "full page heap enabled" what are you doing to enable it?

joe

On Mon, 3 May 2010 12:44:41 -0700, Olegas <Olegas(a)community.nospam> wrote:

><snip>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Alexandre Grigoriev on
Was MyDll loaded before the failing thread started or after?

Set a breakpoint in MyDll!_DllMainCRTStartup for the DLL_THREAD_DETACH case
and see what's going on there.

"Olegas" <Olegas(a)community.nospam> wrote in message
news:248EDA04-69F7-4A08-B8A5-6423B0E45A51(a)microsoft.com...
> <snip>

From: Olegas on
Joseph,
Thank you for the follow-up.

We enable full page heap for our application via the GFlags utility as
follows:
GFLAGS.exe /p /enable MyApp.exe /full

Once we enabled full page heap, our application started crashing once in a
while at our customer's location. Unfortunately, it is not reproducible on
demand either at the customer's location or in the office under a debugger,
which leads me to believe it has to be some type of timing related issue.

In this particular dump, page heap's guard page essentially prevented the
heap corruption from taking place during the memset() call. If it weren't for
the guard page, memset() would have happily overwritten 4 bytes past the end
of the allocated block, and the application would have crashed elsewhere down
the road.
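
The addresses in the !heap output show why the guard page catches this. The
following sketch assumes 4 KB pages and that full page heap places one
accessible page in front of one inaccessible guard page, with the user block
pushed up against the boundary. The constant and function names are
illustrative.

```cpp
#include <cstdint>

constexpr std::uint32_t kPageSize = 0x1000;     // assumed x86 page size
constexpr std::uint32_t kVirtAddr = 0x02c72000; // VirtAddr from !heap -p -a
constexpr std::uint32_t kVirtSize = 0x2000;     // VirtSize from !heap -p -a
constexpr std::uint32_t kUserSize = 0x20;       // UserSize from !heap -p -a

// Start of the inaccessible page that follows the accessible one.
constexpr std::uint32_t guardPageStart() {
    return kVirtAddr + kVirtSize - kPageSize;
}

// Where the user block starts if its last byte sits right before the guard page.
constexpr std::uint32_t expectedUserAddr() {
    return guardPageStart() - kUserSize;
}

// True if a write of 'len' bytes starting at 'dest' reaches the guard page.
constexpr bool touchesGuardPage(std::uint32_t dest, std::uint32_t len) {
    return dest + len > guardPageStart();
}

static_assert(guardPageStart() == 0x02c73000, "guard page is the second page");
static_assert(expectedUserAddr() == 0x02c72fe0, "matches UserAddr in the dump");
static_assert(touchesGuardPage(0x02c72fe0, 0x24), "36-byte clear faults");
static_assert(!touchesGuardPage(0x02c72fe0, 0x20), "32-byte clear would fit");
```

Under these assumptions the block ends exactly where the guard page begins,
so the very first byte written past the block faults instead of silently
damaging a neighbouring allocation.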

Thank you,
Olegas


"Joseph M. Newcomer" wrote:

> It looks to me like you are seeing a problem in synchronization. You have not shown that
> any heap function is on the stack at this point.
>
> Note that heap corruption can occur at any time; when you get a message about heap
> corruption, it is not saying "I have caused heap corruption"; it is saying "I finally
> discovered that some time in the unknown past, somebody has corrupted the heap". The
> damage could be trillions of instructions in the past.
>
> While this is a good bug report, unfortunately it isn't much help in tracking down what
> the corruption is or who did it. These still rank as the nastiest possible bugs to find
> and fix.
>
> When you say "full page heap enabled" what are you doing to enable it?
>
> joe
>

From: Olegas on
Alexandre,
Thank you for your reply.

That is a good question. Unfortunately, all I have is a second-chance dump
collected via NTSD. I do not have a matching ADPlus log that would show the
sequence of module load/unload events.

Is there a way to learn that info from a crash dump?

I'll follow your suggestion under a debugger and see if anything stands out.

Thank you,
Olegas


"Alexandre Grigoriev" wrote:

> Was MyDll loaded before the failing thread started or after?
>
> Set a breakpoint in MyDll!_DllMainCRTStartup for the DLL_THREAD_DETACH case
> and see what's going on there.
>