anomalies in capitalization in String functions [Lisp]


From: Jerry Boetje on 2 Mar 2010 10:27

The spec for STRING-CAPITALIZE breaks a string into words as follows:
"a ``word'' is defined to be a consecutive subsequence consisting of
alphanumeric characters". This gives interesting results such as
"don't" => "Don'T". Any 4th-grader would know that the right
capitalization is "Don't". In CLforJava, we use the Unicode
definitions for word breaking, and we get "Don't". Any thoughts about
changing this weirdness? Please, no "but, but it's the specification"
comments. I get the spec. This is more about the transition from the
1980s definition of characters and strings to the Unicode
world. I'd rather talk about the world of today and what we can do
about it.
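
For reference, the behavior described above can be reproduced in any conforming implementation. The short sketch below is illustrative only: it shows that the apostrophe fails ALPHANUMERICP, which is why the standard function treats "don" and "t" as two separate words.

```lisp
;; The apostrophe is not alphanumeric, so it terminates a word.
(alphanumericp #\')                  ; => NIL

;; Each run of alphanumeric characters is capitalized on its own.
(string-capitalize "don't")          ; => "Don'T"
(string-capitalize "hello world")    ; => "Hello World"
```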

From: Tamas K Papp on 2 Mar 2010 10:33

On Tue, 02 Mar 2010 07:27:16 -0800, Jerry Boetje wrote:

> The spec for STRING-CAPITALIZE is defined to break into words where: "a
> ``word'' is defined to be a consecutive subsequence consisting of
> alphanumeric characters". This gives interesting results such as "don't"
> => "Don'T". Any 4th-grader would know that the right capitalization is
> "Don't". In CLforJava, we use the Unicode definitions for breaking, and
> we get "Don't". Any thoughts about changing this weirdness? Please, no
> "but, but it's the specification" comments. I get the spec. This gets
> more into a transition from the 1980's definition of characters and
> strings and into the Unicode world. I'd rather talk about the world of
> today and what we can do about it.

The obvious solution seems to be writing and using your own function
to capitalize strings (which would be the usual approach to cases
where the standard is clear, but you don't like it).
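
A minimal sketch of such a function follows. The names WORD-CHAR-P and CAPITALIZE-WORDS are hypothetical, and the word rule is an assumption made for illustration: an ASCII apostrophe with a letter on both sides counts as part of the word. This is not the Unicode word-breaking algorithm and not what CLforJava does; it only shows how little code a private variant needs while STRING-CAPITALIZE itself stays conforming.

```lisp
(defun word-char-p (string index)
  "True if the character at INDEX counts as part of a word: any
alphanumeric character, or an apostrophe with a letter on both sides.
(Assumed rule, for illustration only.)"
  (let ((ch (char string index)))
    (or (alphanumericp ch)
        (and (char= ch #\')
             (< 0 index (1- (length string)))
             (alpha-char-p (char string (1- index)))
             (alpha-char-p (char string (1+ index)))))))

(defun capitalize-words (string)
  "Return a fresh copy of STRING with the first character of each word
upcased and the remaining characters of the word downcased."
  (let ((result (copy-seq string))
        (in-word nil))
    (dotimes (i (length result) result)
      (cond ((word-char-p string i)
             (setf (char result i)
                   (if in-word
                       (char-downcase (char string i))
                       (char-upcase (char string i))))
             (setf in-word t))
            (t
             (setf in-word nil))))))

;; Usage:
(capitalize-words "don't")                     ; => "Don't"
(capitalize-words "elm 13c arthur;fig don't")  ; => "Elm 13c Arthur;Fig Don't"
```

An apostrophe at the start or end of the string, or next to a non-letter, still ends a word, so quoted text such as "'hello'" comes out as "'Hello'".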

Tamas

From: Zach Beane on 2 Mar 2010 10:33

Jerry Boetje <jerryboetje(a)mac.com> writes:

> The spec for STRING-CAPITALIZE is defined to break into words where:
> "a ``word'' is defined to be a consecutive subsequence consisting of
> alphanumeric characters". This gives interesting results such as
> "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't". In CLforJava, we use the Unicode
> definitions for breaking, and we get "Don't". Any thoughts about
> changing this weirdness? Please, no "but, but it's the specification"
> comments. I get the spec. This gets more into a transition from the
> 1980's definition of characters and strings and into the Unicode
> world. I'd rather talk about the world of today and what we can do
> about it.

Follow the spec for STRING-CAPITALIZE and provide your own function that
does what you prefer, instead of what's mandated by the standard.

Zach

From: Pascal Costanza on 2 Mar 2010 11:05

On 02/03/2010 16:27, Jerry Boetje wrote:
> The spec for STRING-CAPITALIZE is defined to break into words where:
> "a ``word'' is defined to be a consecutive subsequence consisting of
> alphanumeric characters". This gives interesting results such as
> "don't" => "Don'T". Any 4th-grader would know that the right
> capitalization is "Don't". In CLforJava, we use the Unicode
> definitions for breaking, and we get "Don't". Any thoughts about
> changing this weirdness? Please, no "but, but it's the specification"
> comments. I get the spec. This gets more into a transition from the
> 1980's definition of characters and strings and into the Unicode
> world. I'd rather talk about the world of today and what we can do
> about it.

Even in the world of today, not everybody speaks only English.

Pascal

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/

From: Tim Bradshaw on 2 Mar 2010 11:18

On 2010-03-02 15:27:16 +0000, Jerry Boetje said:

> Any thoughts about
> changing this weirdness?

If you're happy to search the entire corpus of Lisp code for things
that changing this might break, make and test modifications to be sure
that things will not break, and cover any possible problems if things
broke despite your fixes, then yes, I'd be happy to see it changed.
