From: dpb on
Geo wrote:
> On Thu, 03 Jun 2010 18:42:07 -0500, dpb <none(a)non.net> wrote:
>
>
>> The key is as before, understand the embedding and match it -- once you
>> do that it _has_ to work... :) (I think you're relying too much on the
>> characters as displayed rather than worrying enough about what is
>> actually the byte pattern in memory).
>>
>
> Is it not possible to just open the xml file in Notepad to see what is /really/
> there?

As somebody noted, it's not XML, it's QP-encoded e-mail (MIME, I think???).

I don't know whether Notepad knows how to decode QP-encoded text or not;
I wouldn't trust it for this job, because if it does decode, it would
display the embedded "=3D" string as simply "=". That is what is
confusing Dennis when he (I think/infer) looks at the string in his
e-mail client and tries to build the matching string pattern(s). The
client automagically decodes it (in this case _not_ helpfully), leaving
him grasping at ephemeral wisps... :)

I'd use an actual binary file viewer first, to be absolutely sure I knew
everything that was there and that nothing else was going on
transparently. For example, there's the possibility of soft line breaks
that would be invisible in the e-mail client but, if present, would also
prevent the search pattern from matching.
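
To make the mismatch concrete, here is a small Python sketch. The
message fragment, URL, and search pattern are made up for illustration
(they are not Dennis's actual data); the point is only that a pattern
built from what the client displays does not occur in the stored bytes.

```python
# Hypothetical QP-encoded fragment as it might sit in the file.
# "=3D" is the encoded "=", and "=\r\n" is a soft line break.
raw = b'<a href=3D"http://example.com/pa=\r\nge">link</a>'

# What an e-mail client would show after decoding the same fragment.
displayed = b'<a href="http://example.com/page">link</a>'

# Search pattern built by looking at the displayed text.
pattern = b'href="http://example.com/page"'

found_in_displayed = pattern in displayed  # pattern matches the decoded view
found_in_raw = pattern in raw              # but not the bytes actually stored

print(found_in_displayed, found_in_raw)
print(raw.hex(" "))  # space-separated hex of the real byte pattern (Python 3.8+)
```

Either the "=3D" or the soft line break alone is enough to defeat the
match.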

In reality, depending on how much of this needs to be decoded and
whether there's more than just the one phrase, his most robust solution
could be to find a QP decoder algorithm and run it before doing the
searching. Certainly it would be more generic in the end.
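
As a rough sketch of the decode-then-search idea, here it is in Python,
whose standard library has a quoted-printable decoder (quopri). The
sample bytes and pattern are invented for illustration; the same
approach would apply with whatever QP decoder is available in the OP's
language.

```python
import quopri

# Hypothetical QP-encoded fragment with an encoded "=" and a soft line break.
raw = b'<a href=3D"http://example.com/pa=\r\nge">link</a>'

# Decode first: "=3D" becomes "=", and the soft line break is removed.
decoded = quopri.decodestring(raw)

# Now a pattern written from the displayed text can be searched for directly.
pattern = b'href="http://example.com/page"'
position = decoded.find(pattern)

print(decoded)
print(position)  # -1 would mean "not found"
```

Searching the decoded buffer means the pattern no longer has to
anticipate every "=XX" escape or line wrap.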

Now that it has been pointed out that it isn't XML encoding, he's
certainly heading down the wrong path in trying that tack.

Don't know if OP's coming back at this point or not...hopefully he will,
either to catch up and get resolution or to report that he did finally
get there on his own.

--

From: Dennis Rose on


"dpb" wrote:

> Geo wrote:
> > On Thu, 03 Jun 2010 18:42:07 -0500, dpb <none(a)non.net> wrote:
> >
> >
> >> The key is as before, understand the embedding and match it -- once you
> >> do that it _has_ to work... :) (I think you're relying too much on the
> >> characters as displayed rather than worrying enough about what is
> >> actually the byte pattern in memory).
> >>
> >
> > Is it not possible to just open the xml file in Notepad to see what is /really/
> > there?
>
> As somebody noted, it's not XML, it's QP-encoded e-mail (MIME, I think???).
>
> I don't know whether Notepad knows how to decode QP-encoded text or not;
> I'd not trust it for this job to not do so and hence display the
> embedded "=3D" string as simply "=" which is what is confusing Dennis
> when he (I think/infer) looks at the string in his e-mail client and
> tries to build the matching string pattern(s). It automagically decodes
> (in this case _not_ helpfully) for him leaving him grasping at ephemeral
> wisps... :)
>
> I'd use an actual binary file viewer to be absolutely sure I knew
> everything that was there first making sure there was nothing else going
> on transparently. For example, there's the possibility of soft line
> breaks that would be invisible in the e-mail client that if present
> would also prevent the search pattern from matching.
>
> In reality, his most robust solution depending on how much of this needs
> to be decoded and whether there's more than just the one phrase or not
> could be to find a QP decoder algorithm and use it prior to doing the
> searching. Certainly it would be more generic in the end.
>
> Having it pointed out that it isn't XML encoding certainly means he's
> heading down the wrong path for certain in trying that tack.
>
> Don't know if OP's coming back at this point or not...hopefully will to
> either catch up and get resolution or report he did finally get there on
> his own.
>
> --
>

Just posted my/your solution and my sincere thanks to each of you in my previous post on 6/2/10. Latest failure was a typo on my part. Everything OK now. Thanks again.

From: Dee Earley on
On 04/06/2010 16:49, dpb wrote:
> I don't know whether Notepad knows how to decode QP-encoded text or not;
> I'd not trust it for this job to not do so and hence display the
> embedded "=3D" string as simply "=" which is what is confusing Dennis
> when he (I think/infer) looks at the string in his e-mail client and
> tries to build the matching string pattern(s). It automagically decodes
> (in this case _not_ helpfully) for him leaving him grasping at ephemeral
> wisps... :)

The only file "formats" Notepad recognises are the various UTF* formats.
Everything else is just text.

--
Dee Earley (dee.earley(a)icode.co.uk)
i-Catcher Development Team

iCode Systems

(Replies direct to my email address will be ignored.
Please reply to the group.)
From: dpb on
Dee Earley wrote:
> On 04/06/2010 16:49, dpb wrote:
>> I don't know whether Notepad knows how to decode QP-encoded text or not;
>> I'd not trust it for this job to not do so and hence display the
>> embedded "=3D" string as simply "=" which is what is confusing Dennis
>> when he (I think/infer) looks at the string in his e-mail client and
>> tries to build the matching string pattern(s). It automagically decodes
>> (in this case _not_ helpfully) for him leaving him grasping at ephemeral
>> wisps... :)
>
> The only file "formats" notepad recognises is the various UTF* formats.
> Everything else is just text.

I'd still not trust/use Notepad for the specific purpose (or any other
simple text editor for that matter) when I'm curious about what just
might be embedded in a file...

In this case, if it doesn't understand QP-encoding (is that documented?),
then once you know that that is what the file is, and as long as the
file isn't erroneously doing something else, it'll work. My point is
that when one is investigating some similar anomaly, one needs there to
be no chance of anything getting in the way of seeing what's actually
embedded in the file/string/buffer/...
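
For anyone without a binary viewer handy, a bare-bones hex dump is easy
to roll. This is only a sketch in Python; hex_dump is a name made up
here, and the sample bytes are invented.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-ASCII dump of data."""
    pad = width * 3 - 1  # width of a full hex column, for alignment
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is; everything else (CR, LF, ...) as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Hypothetical bytes: an encoded "=" followed by a soft line break.
sample = b'href=3D"x=\r\ny"'
print(hex_dump(sample))
```

For a real file, read it in binary mode (open(path, "rb")) so nothing
gets translated on the way in.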

--
From: Dennis Rose on


"Dee Earley" wrote:

> On 04/06/2010 16:49, dpb wrote:
> > I don't know whether Notepad knows how to decode QP-encoded text or not;
> > I'd not trust it for this job to not do so and hence display the
> > embedded "=3D" string as simply "=" which is what is confusing Dennis
> > when he (I think/infer) looks at the string in his e-mail client and
> > tries to build the matching string pattern(s). It automagically decodes
> > (in this case _not_ helpfully) for him leaving him grasping at ephemeral
> > wisps... :)
>
> The only file "formats" notepad recognises is the various UTF* formats.
> Everything else is just text.
>
> --
> Dee Earley (dee.earley(a)icode.co.uk)
> i-Catcher Development Team
>
> iCode Systems
>
> (Replies direct to my email address will be ignored.
> Please reply to the group.)

What's wrong with my code at the start of this post? After all, it was your
idea for me to use a "proper" email parser, and I would still like to use a
real parser!!