From: Joe Marcus Clarke on

--=-VemPnshHAVr639ushSdb
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable


On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
> I have seen recent commit WRT string collation in devel/glib20 by
> marcus, so I have decided to check if there is an interest to fix SEGV
> in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.

Any commits I have made in the area of UTF-8 are completely accidental.
I am not the UTF-8 guy. Both bland and jylefort have expressed interest
in this. Perhaps one of them will comment.

>=20
> Good (but by no means only) example of this would be using Evolution to
> open mailbox with the mix of KOI-8, CP1251 and UTF-8 message subjects
> and order them by the subject. Admittedly, I do not know whether there
> are special symbols that trigger the situation or any mix would do. vova
> at fbsd ru posted test case mailbox under the link below.
>=20
> Full discussion including my first approach to fix this problem could be
> found here
>=20
> http://bugzilla.gnome.org/show_bug.cgi?id=3D492389
>=20
> Slightly different approach is attached to this E-mail.

I'm not sure why the malloc and copy. Why not just use g_strdup()?

>=20
> Without either patch, my Evolution will core dump on start-up.=20
>=20
> First patch was rejected by gnome folks with the recommendation "Don't
> do that", which, unfortunately, is not that easy to follow ;)
>=20
> Any comments from people with the knowledge of gnome, UTF-8 and string
> collation will be greatly appreciated.

I'm hoping someone like bland chimes in here. Touching glib in such a
way makes me nervous, but bland has had experience, especially with
Cyrillic, so maybe he can offer some additional tips or insight.

Joe

--=20
PGP Key : http://www.marcuscom.com/pgp.asc

--=-VemPnshHAVr639ushSdb
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQBHhQXEb2iPiv4Uz4cRAgnPAKCjG2DNvrzU8MJNG7UGt3qoUytI0ACbB6cK
+IuJ047ghlGithkFbovyQxw=
=ztkG
-----END PGP SIGNATURE-----

--=-VemPnshHAVr639ushSdb--

From: "Alexandre \"Sunny\" Kovalenko" on

On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote:
> On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
> > I have seen recent commit WRT string collation in devel/glib20 by
> > marcus, so I have decided to check if there is an interest to fix SEGV
> > in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.
>
> Any commits I have made in the area of UTF-8 are completely accidental.
> I am not the UTF-8 guy. Both bland and jylefort have expressed interest
> in this. Perhaps one of them will comment.

I hope so. Just in case, they would decide to, I have reduced the
situation to the small program below. I get

GLib-CRITICAL **: g_convert: assertion `str != NULL' failed

and no core dump from this simple program, whereas Evolution manages to
pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
pains.

Conversely, if the answer still is "Evolution should not have done
that", I will happily crawl back under my rock and keep my patch
locally.

#include <stdio.h>

#include <glib.h>

int main(int argc, char *argv[])
{
GString *str1 = g_string_new("\xf3\xcf\xd2\xcf\xcb\x20\xd7\xcf\xd3\xc5
\xcd\xd8\x20\xcf\xc2\xc5\xda\xd8\xd1\xce\x2e\x2e\x2e");
GString *str2 = g_string_new("Goodbye, cruel world!");

printf("%s\n", str1->str);
if(g_utf8_collate(str1->str, str2->str) == 0)
g_print("%s and %s are equal\n", str1->str, str2->str);
else
g_print("%s and %s are NOT equal\n", str1->str, str2->str);
}

built with

gcc `pkg-config --cflags glib-2.0` -g -o testCollate testCollate.c
`pkg-config --libs glib-2.0`

on

7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #0: Sat Jan 5 22:37:29 EST 2008

using

glib-2.14.5 and libiconv-1.11_1

option "fix string collation" was set during the build of glib.

> I'm not sure why the malloc and copy. Why not just use g_strdup()?

Because I don't know much of anything about glib programming, and I was
suspecting that someone will use g_free() on the result, so g_malloc()
sounded like the plausible candidate. I really hope someone with
the necessary knowledge will come up with the right solution, but, for
now, I do what I can.

--
Alexandre "Sunny" Kovalenko

_______________________________________________
freebsd-ports(a)freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscribe(a)freebsd.org"
From: Joe Marcus Clarke on

--=-f+mwK7Ma/WDQ0zr4TPJH
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable


On Wed, 2008-01-09 at 19:40 -0500, Alexandre "Sunny" Kovalenko wrote:
> On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote:=20
> > On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
> > > I have seen recent commit WRT string collation in devel/glib20 by
> > > marcus, so I have decided to check if there is an interest to fix SEG=
V
> > > in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to colla=
te.
> >=20
> > Any commits I have made in the area of UTF-8 are completely accidental.
> > I am not the UTF-8 guy. Both bland and jylefort have expressed interes=
t
> > in this. Perhaps one of them will comment.
>=20
> I hope so. Just in case, they would decide to, I have reduced the
> situation to the small program below. I get=20
>=20
> GLib-CRITICAL **: g_convert: assertion `str !=3D NULL' failed
>=20
> and no core dump from this simple program, whereas Evolution manages to
> pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
> pains.

That sounds like a no-no for Evolution to be dereferencing a NULL
pointer. Hopefully they'd fix this to prevent the problem.

>=20
> Conversely, if the answer still is "Evolution should not have done
> that", I will happily crawl back under my rock and keep my patch
> locally.

I can't imagine you're alone in this. But then again, any Cyrillic mail
that comes my way is always spam, so what do I know.

Joe

--=20
PGP Key : http://www.marcuscom.com/pgp.asc

--=-f+mwK7Ma/WDQ0zr4TPJH
Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (FreeBSD)

iD8DBQBHhXHzb2iPiv4Uz4cRApNhAJ9m/nq1K6lz9aH0+0WnAAtJRMzBqgCbBQKf
H39qAn56D0eBX/UlYlH5NUA=
=GRO8
-----END PGP SIGNATURE-----

--=-f+mwK7Ma/WDQ0zr4TPJH--

From: "Alexandre \"Sunny\" Kovalenko" on

On Wed, 2008-01-09 at 20:16 -0500, Joe Marcus Clarke wrote:
> On Wed, 2008-01-09 at 19:40 -0500, Alexandre "Sunny" Kovalenko wrote:
> > On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote:
> > > On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
> > > > I have seen recent commit WRT string collation in devel/glib20 by
> > > > marcus, so I have decided to check if there is an interest to fix SEGV
> > > > in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.
> > >
> > > Any commits I have made in the area of UTF-8 are completely accidental.
> > > I am not the UTF-8 guy. Both bland and jylefort have expressed interest
> > > in this. Perhaps one of them will comment.
> >
> > I hope so. Just in case, they would decide to, I have reduced the
> > situation to the small program below. I get
> >
> > GLib-CRITICAL **: g_convert: assertion `str != NULL' failed
> >
> > and no core dump from this simple program, whereas Evolution manages to
> > pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
> > pains.
>
> That sounds like a no-no for Evolution to be dereferencing a NULL
> pointer. Hopefully they'd fix this to prevent the problem.

It's not Evolution, it is glib, specifically g_utf8_collate, which would
call strcoll(3) blindly on the return of g_utf8_normalize inside
gunicollate.c. And now, I can get core dumped out of this simple program
as well, merely by setting CHARSET=en_US.UTF-8 (I had it is ASCII in the
terminal window, which would trigger different path within
g_utf8_collate).

>
> >
> > Conversely, if the answer still is "Evolution should not have done
> > that", I will happily crawl back under my rock and keep my patch
> > locally.
>
> I can't imagine you're alone in this. But then again, any Cyrillic mail
> that comes my way is always spam, so what do I know.

More importantly, it is UTF-8 spam -- in order to trigger this, you need
KOI8-R or CP1251, and in the sorted column to boot. I suspect that
Latin1 or ShiftJIS would do the trick too.

Now, how about this: would you be amenable to this Really Harmless(tm)
patch, which merely adds error checking along the lines used in the same
function, about dozen lines up ;)

--- glib/gunicollate.c.B 2008-01-09 20:48:25.000000000 -0500
+++ glib/gunicollate.c 2008-01-09 20:49:35.000000000 -0500
@@ -166,6 +166,9 @@
str1_norm = g_utf8_normalize (str1, -1, G_NORMALIZE_ALL_COMPOSE);
str2_norm = g_utf8_normalize (str2, -1, G_NORMALIZE_ALL_COMPOSE);

+ g_return_val_if_fail (str1_norm != NULL, 0);
+ g_return_val_if_fail (str2_norm != NULL, 0);
+
if (g_get_charset (&charset))
{
result = strcoll (str1_norm, str2_norm);

I can add it to your files/extra-patch-glib_gunicollate.c, or package
it separately -- I really hate it when I start Evolution after portupgrade
to write some E-mails real quick, only to find out that I have forgotten
to patch glib... again.

--
Alexandre "Sunny" Kovalenko

_______________________________________________
freebsd-ports(a)freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscribe(a)freebsd.org"
From: Alexander Nedotsukov on
Alexandre,

The problem you exposed have its roots in Evo code. g_utf8_* stuff
defined to work on *utf-8* strings only and have undefined behaviour on
MBCS strings. It may sound stupid but crashes are allowed in this case
:-) Even we apply your latest patch the true problem solution will be
only postponed. We have to continue with Evo source. Find subject parser
part and ensure that it will be utf-8 encoded string at the end.

All the best,
Alexander.

Alexandre "Sunny" Kovalenko wrote:
> On Wed, 2008-01-09 at 20:16 -0500, Joe Marcus Clarke wrote:
>
>> On Wed, 2008-01-09 at 19:40 -0500, Alexandre "Sunny" Kovalenko wrote:
>>
>>> On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote:
>>>
>>>> On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
>>>>
>>>>> I have seen recent commit WRT string collation in devel/glib20 by
>>>>> marcus, so I have decided to check if there is an interest to fix SEGV
>>>>> in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.
>>>>>
>>>> Any commits I have made in the area of UTF-8 are completely accidental.
>>>> I am not the UTF-8 guy. Both bland and jylefort have expressed interest
>>>> in this. Perhaps one of them will comment.
>>>>
>>> I hope so. Just in case, they would decide to, I have reduced the
>>> situation to the small program below. I get
>>>
>>> GLib-CRITICAL **: g_convert: assertion `str != NULL' failed
>>>
>>> and no core dump from this simple program, whereas Evolution manages to
>>> pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
>>> pains.
>>>
>> That sounds like a no-no for Evolution to be dereferencing a NULL
>> pointer. Hopefully they'd fix this to prevent the problem.
>>
>
> It's not Evolution, it is glib, specifically g_utf8_collate, which would
> call strcoll(3) blindly on the return of g_utf8_normalize inside
> gunicollate.c. And now, I can get core dumped out of this simple program
> as well, merely by setting CHARSET=en_US.UTF-8 (I had it is ASCII in the
> terminal window, which would trigger different path within
> g_utf8_collate).
>
>
>>> Conversely, if the answer still is "Evolution should not have done
>>> that", I will happily crawl back under my rock and keep my patch
>>> locally.
>>>
>> I can't imagine you're alone in this. But then again, any Cyrillic mail
>> that comes my way is always spam, so what do I know.
>>
>
> More importantly, it is UTF-8 spam -- in order to trigger this, you need
> KOI8-R or CP1251, and in the sorted column to boot. I suspect that
> Latin1 or ShiftJIS would do the trick too.
>
> Now, how about this: would you be amenable to this Really Harmless(tm)
> patch, which merely adds error checking along the lines used in the same
> function, about dozen lines up ;)
>
> --- glib/gunicollate.c.B 2008-01-09 20:48:25.000000000 -0500
> +++ glib/gunicollate.c 2008-01-09 20:49:35.000000000 -0500
> @@ -166,6 +166,9 @@
> str1_norm = g_utf8_normalize (str1, -1, G_NORMALIZE_ALL_COMPOSE);
> str2_norm = g_utf8_normalize (str2, -1, G_NORMALIZE_ALL_COMPOSE);
>
> + g_return_val_if_fail (str1_norm != NULL, 0);
> + g_return_val_if_fail (str2_norm != NULL, 0);
> +
> if (g_get_charset (&charset))
> {
> result = strcoll (str1_norm, str2_norm);
>
> I can add it to your files/extra-patch-glib_gunicollate.c, or package
> it separately -- I really hate it when I start Evolution after portupgrade
> to write some E-mails real quick, only to find out that I have forgotten
> to patch glib... again.
>
>

_______________________________________________
freebsd-ports(a)freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ports
To unsubscribe, send any mail to "freebsd-ports-unsubscribe(a)freebsd.org"