From: Peter Olcott on
On 5/30/2010 10:11 PM, Liviu wrote:
> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>> On 5/30/2010 3:08 AM, Liviu wrote:
>>> "Peter Olcott"<NoSpam(a)OCR4Screen.com> wrote...
>>>>
>>>> Here is the original:
>>>> http://www.ocr4screen.com/UTF8_ORIG.cpp
>>>
>>> Not exactly. The original, before you rushed out what you _now_
>>> present as the original (after Pete Delgado deservedly mocked you)
>>
>> By original I am referring to the last posting on the 27th of May.
>
> At the time you posted on the 27th the link went to a different .cpp
> file, which was _not_ the same as this utf8_orig.cpp you are claiming
> now as the "original". Of course, the file itself was hosted on your
> server, still is, and you can change its contents as often as you wish.
> But you can't undo what you posted and others may have already
> downloaded, so better be honest about it.

Also, I am referring to this file as "original" in the sense that it is
the one before any testing was done.

>
>> Since I almost always use the && operator (since K&R was
>> the de facto standard "C") I merely typed && when I meant &.
>
> Are you _still_ confused? The distinction between logical && and
> bitwise & operators hasn't changed since the beginning of C.

No, and neither has the fact that people are fallible and continue to
make typographical errors.

>
>> As I already said aside from these three trivial errors the code did
>> indeed work correctly the very first time.
>
> The code didn't compile,
Wrong, it did compile.

> then ran into infinite loops, then failed to
> convert anything other than pure ASCII, then at long last may be
> doing something remotely meaningful, however inefficiently, but still
> lacks the "validate" part which you originally stated as a goal.

It does validate.

>
> || My method can completely validate any UTF-8 sequence of
> || bytes and decode it into its corresponding code point values in
> || fewer machine clock cycles than any possible alternative

To the best of my knowledge this is true within the context of the basic
design, not of any specific encoding; those were the original qualifiers
that you failed to quote. I will have benchmarking results soon.

>
> Yet, you call that "work correctly the very first time".

You must be dense. I keep saying the very first time "AFTER" trivial
typographical errors have been corrected, and you continue to read it as
if the term "AFTER" is missing.

I am not infallible. I acknowledge that there were mistakes. The point
is that there were far, far fewer mistakes than are typical in this
profession. Historically, debugging took about 90% of the total time and
coding and design took 10%. I have reversed this.

> Oh well,
> good luck with that notion of "correctly" in your future endeavors.
>
>> The class is not stateless. It must dynamically create its state
>> transition matrix in its constructor.
>
> Assuming you created multiple instances of that class, all objects
> would hold the exact same transition matrix and would be identical
> to each other for all functional purposes. In that sense, the class is
> stateless. I just didn't have enough imagination to fathom that you'd
> contemplate instantiating more than one static object of that class.

The class always has a static state as opposed to a dynamic state.

>
>> I always encode my classes so that they will fit on the stack.
>> By doing this [...] Also I eliminate the need for dynamic memory
>> allocation.
>
> You never know how large the (remaining) stack is, so shouldn't
> code for that. Also, when your code calls "States.resize(7, 256);"
> for example, then that's a dynamic memory allocation right there.

Yes, but "I" am not doing it; my two-dimensional std::vector is doing it.
Since "I" am not doing it, I cannot make a dynamic memory error. The
specific way that "I" avoid ever doing dynamic memory allocation is to
always rely on a library to do it for me.

The distinction is between user code that is written to solve a specific
problem, and library code that is written to solve a very broad class of
problems.

>
>> It is not a singleton. I use two dimensional std::vectors
>
> I meant singleton in the sense of a class designed to only have one
> object of its type ever instantiated.

Array2D is not a singleton in any sense; that would be like saying
std::vector is a singleton. The term "singleton" comes from
"Design Patterns" and is incorrectly used in any other context.

>
> Liviu
>
>

From: Liviu on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote...
> On 5/30/2010 10:11 PM, Liviu wrote:
>>
>> [...] still lacks the "validate" part
>
> It does validate.

Does it do it correctly? As far as I can tell from cursory reading, the
code would pass the single byte 0xC2 as valid UTF-8 which it isn't.

> You must be dense.

There was a saying in my mother tongue that "a fool is not a true fool
unless he's smug, too". You sound like a perfect fit for that club.

Liviu


From: Jerry Coffin on
In article <Gv-dnUxAO6S5Zm7WnZ2dnUVZ_uOdnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> Because it is intuitively obvious that there can be no other way
> that requires fewer machine cycles.

If I had to summarize how to write slow code in a single sentence, I
could hardly improve on this. What's "intuitively obvious" is wrong
more often than not -- especially with modern processors.

--
Later,
Jerry.
From: Jerry Coffin on
In article <O9adnQhj079gmp3RnZ2dnUVZ_uKdnZ2d(a)giganews.com>,
NoSpam(a)OCR4Screen.com says...

[ ... ]

> If I generate every possible valid CodePoint and translate to and
> from UTF-8 and get the same value that I send in back out this will
> prove with very high reliability that both functions are correct.

It will prove only that their bugs cancel each other out. If (for
example) you have a problem from misunderstanding the standard, it's
often pretty easy to write code with matching bugs.

--
Later,
Jerry.
From: Peter Olcott on
On 6/2/2010 2:15 PM, Jerry Coffin wrote:
> In article<Gv-dnUxAO6S5Zm7WnZ2dnUVZ_uOdnZ2d(a)giganews.com>,
> NoSpam(a)OCR4Screen.com says...
>
> [ ... ]
>
>> Because it is intuitively obvious that there can be no other way
>> that requires fewer machine cycles.
>
> If I had to summarize how to write slow code in a single sentence, I
> could hardly improve on this. What's "intuitively obvious" is wrong
> more often than not -- especially with modern processors.
>

Empirical validation showed that it was in the ballpark of correct.