From: Volker Jordan on
Hi NG,

is there a way to select nvarchar fields, that have non latin characters ?

Do I have to use UNICODE() and search for high numbers ?

Regards
Volker Jordan
From: Erland Sommarskog on
Volker Jordan (v_jordan(a)web.de) writes:
> is there a way to select nvarchar fields, that have non latin characters ?
>
> Do I have to use UNICODE() and search for high numbers ?

That or an expression with patindex:

SELECT ...
FROM tbl
WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%',
col COLLATE Latin1_General_BIN2)

This assumes that there is a code point at which the Latin characters
"ends". I will have to admit that I don't know whether is such a simple
limit. It also depends on what characters you really want to find. Is
LATIN CAPIAL LETTER N WITH GRAVE (used in Pinyin) a "Latin" character
by your standards?

In any case, for these kind of patterns it is a good idea to force a
binary collation so that the ranges works you would expect.

A possibility to explore is what support the .Net Framework offers in
this regard. If .Net Fx includes function for classification of characters,
you could write a CLR function for the task.


--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
From: Volker Jordan on
Am Mon, 07 Jun 2010 13:00:49 +0000 schrieb Erland Sommarskog:

> Volker Jordan (v_jordan(a)web.de) writes:
>> is there a way to select nvarchar fields, that have non latin
>> characters ?
>>
>
> SELECT ...
> FROM tbl
> WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%',
> col COLLATE Latin1_General_BIN2)
>
> In any case, for these kind of patterns it is a good idea to force a
> binary collation so that the ranges works you would expect.
>

That looks very good, but why do I have to set a binary collate ?

Regards
Volker
From: Erland Sommarskog on
Volker Jordan (v_jordan(a)web.de) writes:
>> SELECT ...
>> FROM tbl
>> WHERE patindex('%[' + nchar(nnn) + '-' + nchar(65535) + ']%',
>> col COLLATE Latin1_General_BIN2)
>>
>> In any case, for these kind of patterns it is a good idea to force a
>> binary collation so that the ranges works you would expect.
>>
>
> That looks very good, but why do I have to set a binary collate ?

I tried to explain that in my post, but consider this example:

create table #data (a nvarchar(20) NOT NULL)
go
insert #data (a) VALUES ('RABARBER')
insert #data (a) VALUES ('EWIGKEIT')
insert #data (a) VALUES ('Wissenschaft')
go
SELECT * FROM #data WHERE a LIKE '%[a-z]%'
SELECT * FROM #data
WHERE a COLLATE Latin1_General_BIN2 LIKE '%[a-z]%'
go
DROP TABLE #data

The desire is to find words with lowercase letters in them. But
you will find that the first select returns all three words. This is
because the range a-z relates to the collation so it goes aBbCc...z.
If you force a binary collation, you use the actual ASCII range.

To avoid such surprises, you should specify a binary collation. I
should also add that for your Unicode quest, it is important that
you use a BIN2 collation. The BIN collations are quire weird.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

From: Volker Jordan on
Am Mon, 07 Jun 2010 23:39:22 +0200 schrieb Erland Sommarskog:


> To avoid such surprises, you should specify a binary collation. I should
> also add that for your Unicode quest, it is important that you use a
> BIN2 collation. The BIN collations are quire weird.

Many thanks !