From: Peter Duniho on
vanderghast wrote:
> I have heard of such characters (Swedish/Finnish come to mind, but can
> be wrong), but fortunately, for now, there are not part of the targeted
> "cultures".

I'm not sure you're getting what I mean. For example: the letter 'é'
can be represented as a plain e followed by a combining accent (in
UTF-16, 0x0065 followed by 0x0301) or as a single Unicode character (in
UTF-16, 0x00e9).

It's not a language-specific issue.
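
To make the two representations concrete, here is a minimal sketch (not part of the original test) that compares them code unit by code unit and again after Unicode normalization. The class and method names are made up for illustration; string.Normalize and NormalizationForm are the standard .NET APIs.

```csharp
using System;
using System.Text;

static class NormalizationDemo
{
    // True when both strings contain exactly the same UTF-16 code units.
    public static bool SameCodeUnits(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // True when both strings are identical once composed to form C (NFC).
    public static bool SameAfterNfc(string a, string b)
    {
        return SameCodeUnits(a.Normalize(NormalizationForm.FormC),
                             b.Normalize(NormalizationForm.FormC));
    }

    static void Main()
    {
        string precomposed = "\u00e9";  // single code point
        string decomposed = "e\u0301";  // plain e followed by combining acute accent

        // Prints "Lengths: 1 and 2"
        Console.WriteLine("Lengths: {0} and {1}", precomposed.Length, decomposed.Length);
        // Prints "Same code units: False"
        Console.WriteLine("Same code units: {0}", SameCodeUnits(precomposed, decomposed));
        // Prints "Same after NFC: True"
        Console.WriteLine("Same after NFC: {0}", SameAfterNfc(precomposed, decomposed));
    }
}
```

Normalizing both sides first is one way to make an ordinal comparison indifferent to which representation the input happens to use.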

That said, I wrote a quick test (see below), and discovered that the
IgnoreNonSpace option actually does more work than the documentation
describes. In particular, it appears to actually handle the situation
you're specifically dealing with, by treating \u00e9 as the same as
\u0065 (in addition to ignoring the non-space \u0301 character, had I
included that).

Whether it's safe to rely on this undocumented behavior, I'm not
entirely sure. However, Microsoft has always held
backward-compatibility as a high priority, and even if the current
behavior was eventually deemed incorrect according to their
specification, I'd be really surprised if they changed it due to the
potential of breaking lots of existing code. It's probably more likely
they'd update the specification and documentation.

So, in other words, ignore what I wrote about the difference between
combining characters and individual accented characters. I mean, don't
ignore the specific data, but do ignore my conclusion based on the data
as it applies to your string comparison scenario. :)

Pete



using System;
using System.Globalization;

namespace TestAccentCompare
{
    class Program
    {
        static void Main(string[] args)
        {
            string strAccented = "\u00e9", strPlain = "\u0065";

            Console.WriteLine("'{0}' == '{1}': {2} (CompareOptions.None)",
                strAccented, strPlain, (String.Compare(strAccented,
                strPlain, CultureInfo.CurrentCulture,
                CompareOptions.None) == 0).ToString());

            Console.WriteLine("'{0}' == '{1}': {2} (CompareOptions.IgnoreNonSpace)",
                strAccented, strPlain, (String.Compare(strAccented,
                strPlain, CultureInfo.CurrentCulture,
                CompareOptions.IgnoreNonSpace) == 0).ToString());

            Console.ReadLine();
        }
    }
}
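
The test above only compares the precomposed \u00e9 with a plain e. A possible extension, sketched here under the assumption that the same String.Compare overload is used, also includes the decomposed form (plain e followed by \u0301) so that all three strings can be checked against each other under both options. The class name and the AreEqual helper are hypothetical, and the results depend on the runtime's culture data, so no output is shown.

```csharp
using System;
using System.Globalization;

static class CombiningSequenceTest
{
    // Hypothetical helper: culture-sensitive equality under the given options.
    public static bool AreEqual(string a, string b, CompareOptions options)
    {
        return String.Compare(a, b, CultureInfo.CurrentCulture, options) == 0;
    }

    static void Main()
    {
        string precomposed = "\u00e9";  // single code point
        string decomposed = "e\u0301";  // plain e followed by combining acute accent
        string plain = "e";

        foreach (CompareOptions options in
                 new[] { CompareOptions.None, CompareOptions.IgnoreNonSpace })
        {
            Console.WriteLine("{0}:", options);
            Console.WriteLine("  precomposed vs plain:      {0}",
                AreEqual(precomposed, plain, options));
            Console.WriteLine("  decomposed vs plain:       {0}",
                AreEqual(decomposed, plain, options));
            Console.WriteLine("  precomposed vs decomposed: {0}",
                AreEqual(precomposed, decomposed, options));
        }
    }
}
```
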
From: vanderghast on
And to behave like the SQL operator LIKE, unless I missed a shortcut, one
has to get a CompareInfo object. Adding a few lines to your code (even if
it is fairly trivial from the documentation, once you know where to look):


using System;
using System.Globalization;

namespace TestAccentCompare
{
    class Program
    {
        static void Main(string[] args)
        {
            string strAccented = "\u00e9", strPlain = "\u0065";

            Console.WriteLine("'{0}' == '{1}': {2} (CompareOptions.None)",
                strAccented, strPlain, (String.Compare(strAccented,
                strPlain, CultureInfo.CurrentCulture,
                CompareOptions.None) == 0).ToString());

            Console.WriteLine("'{0}' == '{1}': {2} (CompareOptions.IgnoreNonSpace)",
                strAccented, strPlain, (String.Compare(strAccented,
                strPlain, CultureInfo.CurrentCulture,
                CompareOptions.IgnoreNonSpace) == 0).ToString());

            #region simulate SQL operator LIKE

            string AccentedSentence = "... found near Baie d'Urfée, under...";
            string SearchingKey = "urfee";

            CompareInfo ci = new CultureInfo("fr-CA").CompareInfo;

            Boolean found = -1 != ci.IndexOf(
                AccentedSentence,
                SearchingKey,
                CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace);

            Console.WriteLine("'{0}' in '{1}' : {2} (CIAI)",
                SearchingKey, AccentedSentence, found);

            #endregion

            Console.ReadLine();
        }
    }
}
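
If the search is needed in more than one place, the region above could be wrapped in a small helper. This is only a sketch: the class and method names are made up, and it covers only the "contains" case (LIKE '%key%'), not LIKE's wildcard patterns. It relies on the same CompareInfo.IndexOf overload used above.

```csharp
using System;
using System.Globalization;

static class LikeSearch
{
    // Hypothetical helper: culture-sensitive substring test that ignores
    // case and non-spacing marks (roughly a CI/AI "contains").
    public static bool ContainsIgnoringCaseAndAccents(
        string text, string key, CultureInfo culture)
    {
        const CompareOptions options =
            CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;
        return culture.CompareInfo.IndexOf(text, key, options) >= 0;
    }

    static void Main()
    {
        CultureInfo culture = new CultureInfo("fr-CA");
        string sentence = "... found near Baie d'Urf\u00e9e, under...";

        Console.WriteLine(ContainsIgnoringCaseAndAccents(sentence, "urfee", culture));
        Console.WriteLine(ContainsIgnoringCaseAndAccents(sentence, "urfie", culture));
    }
}
```

Passing the culture in as a parameter, instead of hard-coding "fr-CA", keeps the helper usable for the other targeted cultures.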