From: Hector Santos on
Joseph M. Newcomer wrote:

> This code looks like something from K&R C programming first edition.


HA! You shouldn't be ashamed of it, Joey! You're too easy. :)

--
HLS
From: Joseph M. Newcomer on
I can generally write an FSM parser in an hour or so, depending on the syntax. I wrote an
XML parser, recursive descent, in eight hours, start to finish. The constraints were
strange, and involved "no public source code, ever", which I thought was foolish, but they
were paying. I did tell them there were a number of cheats, such as it did not handle all
possible encodings of XML files, a constraint they found acceptable.
joe
On Fri, 22 Jan 2010 10:40:06 -0800, "Tom Serface" <tom(a)camaswood.com> wrote:

>That's one of the things that MFC really has going for it. There is a lot
>of code available and you typically get source with it so, even if there is
>some learning curve, you still get a jump start on getting your job done
>even if you just see how it's done in the sample code.
>
>Tom
>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:#A$vtG0mKHA.5464(a)TK2MSFTNGP02.phx.gbl...
>> Goran,
>>
>> Many times, even with 3rd-party libraries, you still have to learn how to
>> use them. Many times, the attempt to generalize does not cover all bases.
>> What if there is a bug? Many times with CSV, it might require upfront
>> field definitions, or it's all viewed as strings. So the "easiest" does not
>> always mean using a 3rd-party solution.
>>
>> Of course the devil is in the details, and it helps when the OP provides
>> info, like what language and platform. If he said .NET, as I mentioned, the
>> MS .NET collection library has a pretty darn good reader class with the
>> benefit of supporting OOP as well, which allows you to create a data
>> "class" that you pass to the line reader.
>>
>> Guess what? There is still a learning curve here to understand the
>> interface, to use it right as there would be with any library.
>>
>> So the easiest? For me, it all depends. A simple text reader and a
>> strtok() parser, with the escaping issues worked in, can be both very easy
>> and super fast, with no dependency on 3rd-party QA issues.
>>
>> For me, I have never come across a library or class that could handle
>> everything, and if it did, it required a data definition interface of some
>> sort, like the .NET collection class offers. If he is using .NET, then I
>> recommend using this class as the "easiest."
>>
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
One of the rules we developed about forty years ago (1968) is that \r is meaningless noise
treated as whitespace, and \n is a newline. This works until you import a text file
created on a pre-OS X Mac, where \r is the newline character.
joe
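
A minimal sketch of a reader-side fix, assuming the whole text is already in memory as a
std::string. It treats \r\n, a lone \r, and a lone \n each as one line break, so DOS,
pre-OS X Mac, and Unix files all come out with \n. This is deliberately not the 1968 rule
above, which discards \r; the name NormalizeNewlines is hypothetical.

```cpp
#include <string>

// Hypothetical helper (not from the thread): map "\r\n" and lone "\r" to "\n".
std::string NormalizeNewlines(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            // Treat CR LF as one line break, not two.
            if (i + 1 < in.size() && in[i + 1] == '\n')
                ++i;
            out += '\n';
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Running the tokenizer over the normalized text means the rest of the parser only has to
recognize \n.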

On Fri, 22 Jan 2010 19:34:10 -0500, David Wilkinson <no-reply(a)effisols.com> wrote:

>Tom Serface wrote:
>> One thing most parsers don't handle correctly, that I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
>Another thing is tolerating files that have \n or \r line endings rather than \r\n.
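
Tom's quoted-string case can be sketched as a lenient field splitter. The rules here are
assumptions, not anything the thread agreed on: fields are comma-separated, blanks before a
field are skipped, "" inside a quoted field is one literal quote, and a single quote closes
the field only when the next non-blank character is a comma or the end of the line.
Anything else is kept as a literal quote. The name SplitCsvLine is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split one CSV line into fields.
std::vector<std::string> SplitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    const std::string::size_type n = line.size();
    std::string::size_type i = 0;

    for (;;) {
        // Skip blanks in front of the field.
        while (i < n && (line[i] == ' ' || line[i] == '\t')) ++i;

        std::string field;
        if (i < n && line[i] == '"') {
            ++i; // consume the opening quote
            while (i < n) {
                if (line[i] != '"') { field += line[i++]; continue; }
                // Doubled quote: one literal quote character.
                if (i + 1 < n && line[i + 1] == '"') { field += '"'; i += 2; continue; }
                // Single quote: closing only if a delimiter or end of line follows.
                std::string::size_type j = i + 1;
                while (j < n && (line[j] == ' ' || line[j] == '\t')) ++j;
                if (j == n || line[j] == ',') { i = j; break; }
                field += '"'; // stray quote inside the field: keep it
                ++i;
            }
        } else {
            // Unquoted field: everything up to the next comma.
            while (i < n && line[i] != ',') field += line[i++];
        }
        fields.push_back(field);

        if (i >= n) break;
        ++i; // skip the comma
    }
    return fields;
}
```

With Tom's line as input, this yields three fields, and the first one keeps the quotes
around Tom. The leniency has a cost: a quoted field that really contains a quote followed
by a comma cannot be told apart from the end of the field, so strict producers should
still double their quotes.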
From: Joseph M. Newcomer on
On Fri, 22 Jan 2010 23:44:44 -0500, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Joseph M. Newcomer wrote:
>
>> This code looks like something from K&R C programming first edition.
>
>
>HA! You shouldn't be ashamed of it, Joey! You're too easy. :)

Huh? I'd be embarrassed to publish an algorithm that was based on K&R C. It represents
the best of mediocre programming of thirty years ago. When I first read it, in 1975, I
said "This language is really badly done", and "strings with fixed size buffers are a
total disaster" and "strtok is one of the worst designs I have ever seen". This is
because for at least 7 years I had been using languages that didn't have these defects.
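
The post does not spell out the objections to strtok, but the usual ones are easy to state:
it writes NUL characters into its input, it keeps its position in hidden static state, and
it treats a run of delimiters as one, so the empty field in "a,,c" silently disappears. A
minimal sketch of an alternative, assuming a single-character delimiter and no quoting; the
name SplitKeepEmpty is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split on one delimiter character.
// Unlike strtok, it leaves the input untouched, has no hidden state,
// and keeps empty fields.
std::vector<std::string> SplitKeepEmpty(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start)); // last field, possibly empty
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

For CSV this matters because an empty column is data: dropping it shifts every later
field one position to the left.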

About 14 years ago, I started programming Win32 "Unicode-aware" and using my own "safe"
libraries for strings; in the intervening years, I moved to strsafe.h, and then to CString
as a way of life. The number of memory clobbers I have has essentially dropped to zero.

K&R C has no safe string operations (use of strcat and strcpy is a firable offense in some
programming shops), is 8-bit-character-only, and still thinks that it makes sense to
create strings by using declarations that declare 8-bit character arrays of static sizes.
There are some VERY rare situations in which this can be done, and I try to avoid them
more and more. The great thing about VS2008 is that it flags all these as warnings, and
in any real build environment, it is appropriate to compile both at /W4 and with "treat
warnings as errors" enabled. Most of the K&R C programming style represents bad style.

Even in the second edition, the first example of character array usage (page 29) does not
handle boundaries properly; the "copy" function does not accept a size_t giving the maximum size
of the destination. Microsoft identified THOUSANDS of potential problems caused by the
failure to pass in buffer lengths, and required massive rewrites to pass buffer lengths
in. So the very, very first example a programmer sees of how to do character arrays DOES
IT COMPLETELY WRONG according to modern programming standards! It even has the comment
"assume 'to' is big enough", and that's how several rather unpleasant pieces of malware
have successfully attacked systems (starting with the infamous RTM Worm of 1988, oh, by
the way, THAT WAS 22 YEARS AGO! You would have thought that in the intervening decades
the idea of fixed-sized buffers without boundary checking would have disappeared
COMPLETELY)
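
A minimal sketch of what passing the buffer length looks like, written in portable C++
rather than with strsafe.h. The name CopyBounded and its truncate-and-report behavior are
assumptions for illustration, not the book's code and not Microsoft's API.

```cpp
#include <cstddef>

// Hypothetical bounded version of a "copy" routine (not the book's code).
// Copies at most toSize - 1 characters and always NUL-terminates 'to'.
// Returns true if all of 'from' fit, false if it was truncated or the
// arguments were unusable.
bool CopyBounded(char* to, std::size_t toSize, const char* from)
{
    if (to == nullptr || from == nullptr || toSize == 0)
        return false;

    std::size_t i = 0;
    while (i + 1 < toSize && from[i] != '\0') {
        to[i] = from[i];
        ++i;
    }
    to[i] = '\0';            // terminate inside the destination
    return from[i] == '\0';  // false means the source did not fit
}
```

The caller no longer has to "assume 'to' is big enough"; it passes sizeof the destination
and checks the result.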

The example also shows the horror of embedding an assignment statement in an if-test. How
can we expect people to learn good practice when one of the canonical introductory texts
teaches AWAY from best practice, in its very first character string example?

Then on page 33 the same error is repeated in the 'copy' function. The error is repeated
again in the getop function on page 78. This sort of code might have been acceptable in
1975 (although I found it offensive back then) but it is definitely NOT remotely
acceptable as a programming model in 2010.

strcpy is defined to cause buffer overrun on page 105. Nobody observes that this is a
Really, Really, REALLY BAD IDEA!

At this point, any responsible modern programmer throws the book at the wall in disgust.
joe
From: Hector Santos on

Joseph M. Newcomer wrote:

> One of the rules we developed about forty years ago (1968) is that \r is meaningless noise
> treated as whitespace, and \n is a newline. This works until you import a text file
> created on a pre-OS X Mac, where \r is the newline character.
> joe


Don't confuse raw vs cooked vs display/print device vs storage systems!

\r and \n have their basis as hardware device codes for the hardware
devices of the day: printers, teletypes, dumb terminals, etc.

\r <CR> is what it is - a carriage return (move it to the first
column) of the printer head! Note the operative word - Carriage!

\n <LF> is what it is - a line feed (move carriage head down one line)
of the printer head!

When the consoles came, the printer head was now your cursor. That is
why they are paired, whether there are translations or not.

Now, your Terminal and Printer could have OPTIONAL translation for an
automatic line feed (\n) with each carriage return (\r), which means a
lone \r APPEARS to be a line delimiter, as in the classic MAC world. In
the unix wienie world, \n is the line delimiter. DOS of course uses \r\n
(<CR><LF>) pairs.

But it is your terminal or printer providing the illusion with
translations (which may be the default, depending on the OS it is
connected to). So if you dumped a unix file or mac file to a printer, it
did the proper translation for you. The printer or carriage or laser
point did not change; you still need to tell it to go left, right, up
or down!

Geez, Meaningless?

This again is an example of insane revisionist comments.

--
HLS