From: loopless on
Some background. This application was a Visual C++ 6.0 MFC application that
has no problems at all. I "ported" to Visual 2003 .NET and now this has
started happening. There is no chance that "old" VC 6 compiled libraries have
been accidentally included.
All I can say is that it is not a simple bug in my application ( the VC 6
build was extensively "purified" and reliable). In fact , I am using the VC 6
build from the exact same code base every day in development, while trying to
get this VC .NET build actually working.

The debug output below is one of three identical (i.e. they all occur in the
CAfxStringMgr::Free) "Invalid free". The invalid free's occurs in both debug
and release builds

I am not sure why anyone would say CStrings cannot be used as static
objects, they are just C++ objects like anything else that will be destroyed
"onexit"



WINDBG OUTPUT OF RELEASE BUILD

0:000> r;kb
eax=7c220060 ebx=7c220060 ecx=7c91eb05 edx=0172f82a esi=03e70000 edi=7c220060
eip=7c901230 esp=0172fa34 ebp=0172fa38 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c901230 cc int 3
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be
wrong.
0172fa38 7c96cd80 7c220060 03e70000 7c220068 ntdll!DbgBreakPoint
0172fa4c 7c96df66 03e70000 7c220060 7c96e11c
ntdll!RtlpNtMakeTemporaryKey+0x6b74
0172fac0 7c94a5d0 03e70000 50000061 7c220068
ntdll!RtlpNtMakeTemporaryKey+0x7d5a
0172fba8 7c9268ad 03e70000 40000060 7c220068
ntdll!RtlInitializeSListHead+0x115c0
0172fc78 7c34218a 03e70000 00000000 7c220068 ntdll!iswdigit+0x128
0172fcc0 7c171836 7c220068 7c171827 7c220068 MSVCR71!free+0xc3
[f:\vs70builds\3052\vc\crtbld\crt\src\free.c @ 103]
0172fcc8 7c171827 7c220068 7c17dca1 000752b0 MFC71!CAfxStringMgr::Free+0xa
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\strcore.cpp @ 154]
0172fcd0 7c17dca1 000752b0 7c1ca131 00000000
MFC71!ATL::CStringData::Release+0x19
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\include\atlsimpstr.h @ 99]
0172fcd8 7c1ca131 00000000 7c145d75 00000001
MFC71!_AFX_EDIT_STATE::~_AFX_EDIT_STATE+0x14
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\viewedit.cpp @ 91]
0172fce0 7c145d75 00000001 000752b0 0172fd30
MFC71!_AFX_RICHEDIT_STATE::`scalar deleting destructor'+0x8
0172fcf0 7c145c92 7c150bd7 7c140000 00000000
MFC71!CProcessLocalObject::~CProcessLocalObject+0x23
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\afxtls.cpp @ 473]
0172fcf4 7c150bd7 7c140000 00000000 00000001 MFC71!_CRT_INIT+0x95
[f:\vs70builds\3077\vc\crtbld\crt\src\crtdll.c @ 226]
0172fd30 7c9011a7 7c140000 00000000 00000001 MFC71!_DllMainCRTStartup+0x9a
[f:\vs70builds\3077\vc\crtbld\crt\src\crtdll.c @ 288]
0172fd50 7c923f31 7c14f41a 7c140000 00000000 ntdll!LdrInitializeThunk+0x29
0172fdd4 7c81ca3e 00000000 00000001 00000000 ntdll!LdrShutdownProcess+0x142
0172fec8 7c81cab6 00000000 77e8f3b0 ffffffff kernel32!IsValidLocale+0x8eb
0172fedc 7c348d03 00000000 7c3476c9 00000000 kernel32!ExitProcess+0x14
0172fee4 7c3476c8 00000000 7c80b529 00000000 MSVCR71!__crtExitProcess+0x2e
[f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 463]
0172ff14 7c348d11 00000000 00000000 00000000 MSVCR71!doexit+0xab
[f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 414]
0172ff24 009a8448 00000000 00090000 00e9fa9c MSVCR71!exit+0xd
[f:\vs70builds\3052\vc\crtbld\crt\src\crt0dat.c @ 303]

PURIFY OUTPUT OF DEBUG BUILD

Address 0x07a7bcdc points into a section in a user DLL
Location of free attempt
free [f:\vs70builds\3077\vc\crtbld\crt\src\dbgheap.c:1024]
CAfxStringMgr::Free(CStringData::ATL *)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\strcore.cpp:154]
ATL::CStringData::Release(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\include\atlsimpstr.h:97]
ATL::CSimpleStringT<char,1>::~CSimpleStringT<char,1>(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\include\atlsimpstr.h:264]
ATL::CStringT<wchar_t,class StrTraitMFC_DLL<wchar_t,class
ATL::ChTraitsCRT<wchar_t> > >::~CStringT<wchar_t,class
StrTraitMFC_DLL<wchar_t,class ATL::ChTraitsCRT<wchar_t> > >(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\include\cstringt.h:963]
_AFX_EDIT_STATE::~_AFX_EDIT_STATE(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\viewedit.cpp:91]
_AFX_RICHEDIT_STATE::~_AFX_RICHEDIT_STATE(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\winctrl4.cpp:31]
_AFX_RICHEDIT_STATE::`vector deleting destructor'(UINT)
[D:\WINDOWS\system32\MFC71D.DLL]
CProcessLocalObject::~CProcessLocalObject(void)
[f:\vs70builds\3077\vc\mfcatl\ship\atlmfc\src\mfc\afxtls.cpp:472]

CProcessLocal<_AFX_RICHEDIT_STATE>::~CProcessLocal<_AFX_RICHEDIT_STATE>(void)
[D:\WINDOWS\system32\MFC71D.DLL]


"Oleg Starodumov" wrote:

>
> > You should not use CStrings in any static objects (or objects destroyed at
> > static destruction time). AfxStringMgr is static itself. If it's destroyed
> > before your objects, you get a crash.
> >
>
> IMO it should not be a problem, since CAfxStringMgr instance is marked as
> #pragma init_seg( compiler )
>
> > >I have been running both Purify and the Windows debug tools - they show the
> > > same thing.
>
> What exactly do they show? (WinDbg output would be especially useful,
> please also include the output of 'r;kb' commands).
>
> Oleg
>
>
>
>
>
From: Oleg Starodumov on

> Some background. This application was a Visual C++ 6.0 MFC application that
> has no problems at all. I "ported" to Visual 2003 .NET and now this has
> started happening. There is no chance that "old" VC 6 compiled libraries have
> been accidentally included.
> All I can say is that it is not a simple bug in my application ( the VC 6
> build was extensively "purified" and reliable).

Yes, it looks like by some reason the pointer to CAfxStringMgr::m_nil gets passed
to free().

m_nil (an instance of CNilStringData) is designed so that it should never be freed
(its reference counter (nRefs) should never reach zero). If it still gets freed, it most
likely means that by some reason its reference counter reaches zero. You can
verify the value of the reference counter by looking at dword at 0x7c220074
(in Release build).

If the reference counter does really reach zero (becomes <= 0), we should find
the code that is responsible for incorrect modification of the counter. This can be
done by setting a data breakpoint to monitor changes to the counter. In WinDbg,
you can use the following command (when debugging Release build):

ba w4 0x7c220074 "dd 0x7c220074 L1;kb;g"

For any modification of m_nil's reference counter, it will print the new value of the counter
and the call stack, and thus you will be able to monitor all modifications of the reference
counter and determine why it reaches zero and who changes it. The output can be huge,
but hopefully it should allow to identify the place(s) where the counter is changed incorrectly.

> 7c901230 cc int 3
> ChildEBP RetAddr Args to Child
> WARNING: Stack unwind information not available. Following frames may be
> wrong.
> 0172fa38 7c96cd80 7c220060 03e70000 7c220068 ntdll!DbgBreakPoint

Note that symbol server path is not set on your system, and symbols for system DLLs
are not available. I would recommend to setup the symbol server path - it will ensure
that you get correct call stacks when monitoring the reference counter.

The simplest way is to use this command in WinDbg:
..symfix c:\symbols
(it will set symbol server path for the current debugging session)

Better ways are described in Symbols section of Debugging Tools' documentation.

Oleg





From: loopless on
>Yes, it looks like by some reason the pointer to CAfxStringMgr::m_nil gets
passed
>to free().

Hi Oleg,
Thanks for you help on this , but how did you come to this conclusion?
CAfxStringMgr::Free is trying to release invalid memory, but how did you
decide that CAfxStringMgr::m_nil is involved?

I did download the complete symbols for XP, and get many output blocks like
following in windbg after setting the breakpoint.
I assume what we need to look for is the ref count being set to 0. I don't
see the actual refcount in the output?




0:001> ba w4 0x7c220074 "dd 0x7c220074 L1;kb;g"
0:001> g
*** WARNING: Unable to verify checksum for VAOne.exe
7c220074 000001bc
ChildEBP RetAddr Args to Child
0172fb04 004f1319 0172fc0c 77d7f689 dcbaabcd
MFC71!CAfxStringMgr::GetNilString+0xa
0172fbd0 7c1718e7 0172fd00 00ae6920 03f1d100
VAOne!CAutoSEA2Doc::OnUpdateIndicatorSel+0x39
0172fbe4 7c171915 03f1d100 0000009b ffffffff MFC71!_AfxDispatchCmdMsg+0xb6
0172fc14 7c1720a9 0000009b ffffffff 0172fd00 MFC71!CCmdTarget::OnCmdMsg+0x10a
0172fc30 7c172052 0000009b ffffffff 0172fd00 MFC71!CDocument::OnCmdMsg+0x17
0172fc64 7c171dc4 0000009b ffffffff 0172fd00 MFC71!CView::OnCmdMsg+0x51

From: Oleg Starodumov on

> CAfxStringMgr::Free is trying to release invalid memory, but how did you
> decide that CAfxStringMgr::m_nil is involved?
>

Take a look at the (partial) call stack:

ChildEBP RetAddr Args to Child
0172fa38 7c96cd80 7c220060 03e70000 7c220068 ntdll!DbgBreakPoint
0172fa4c 7c96df66 03e70000 7c220060 7c96e11c ntdll!RtlpNtMakeTemporaryKey+0x6b74
0172fac0 7c94a5d0 03e70000 50000061 7c220068 ntdll!RtlpNtMakeTemporaryKey+0x7d5a
0172fba8 7c9268ad 03e70000 40000060 7c220068 ntdll!RtlInitializeSListHead+0x115c0
0172fc78 7c34218a 03e70000 00000000 7c220068 ntdll!iswdigit+0x128
0172fcc0 7c171836 7c220068 7c171827 7c220068 MSVCR71!free+0xc3
0172fcc8 7c171827 7c220068 7c17dca1 000752b0 MFC71!CAfxStringMgr::Free+0xa

Note that 0x7c220068 is passed to CAfxStringMgr::Free() as the first parameter,
which then passes it to free().

If you run 'ln 0x7c220068' command in WinDbg, you will see that this address belongs
to afxStringManager variable (the global instance of CAfxStringMgr), and (considering
the offset from the beginning of afxStringManager) points to afxStringManager.m_nil.

> I don't see the actual refcount in the output?
>
> 0:001> ba w4 0x7c220074 "dd 0x7c220074 L1;kb;g"
> 0:001> g
> *** WARNING: Unable to verify checksum for VAOne.exe
> 7c220074 000001bc

The counter is on the line above (this is the output of 'dd' command, the first number
is the address that we passed to 'dd', and the second number (1bc) is the current value
of the reference counter).

> I assume what we need to look for is the ref count being set to 0.

m_nil's reference counter is explicitly set to 2 in the constructor of CNilStringData.
It assumes that the counter never reaches zero (and therefore CNilStringData object
is never freed) if the following condition is met: the number of times the reference counter
is incremented is the same as the number of times the counter is decremented.
Therefore, to make the counter value reach zero, the counter should be either:
- decremented more times that it has been incremented
- or explicitly set to a smaller value (e.g. by a wild write)

After you have collected the output, I would first check if someone changes the counter
to an unrelated value (normally, every hit of the data breakpoint should show that the counter
is incremented by 1 or decremented by 1). If someone increments or decrements the counter
by more than 1, he is the suspect.

If all increments/decrements are made exactly by 1, there is probably no wild write,
and instead the code by some reasons frees more strings than allocates. It can be more
difficult to find the responsible code in this case, but should still be possible, since
the output will show you all attempts to increment/decrement the counter.

Oleg






From: ezzyrrd-match on
Hi Oleg,
Thanks for the great detective work and explanation - very much
appreciated - I am digging into it some more based on what you have
explained above. When I find the bug ( I got distracted by something
else) I will post

Andrew
(posting from Google..)