From: Tom Serface on
Yes, after some time I have a parser that I like, but it has a lot of hand
coding in it. I agree that it is a matter of taste how the strings are
formed, but unfortunately, I don't have a lot of control over the input to
our program sometimes. I'm not a big fan of the \ escape thing in CSV files
since that seems odd to uninitiated users.

Not having the separator should be considered a syntax error though. That
much seems fair. We've mostly gone to XML for input and output these days
and that's solved a lot of issues, but raised a whole lot of other ones of
course.
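
A minimal sketch of the lenient reading described in the quoted post below, in Python since the thread itself contains no code. The name parse_quoted_line and its sep parameter are made up for illustration, and the sketch assumes every field is wrapped in double quotes. A quote ends a field only when the next non-blank character is the separator or the end of the line; any other quote is kept as part of the string.

```python
def parse_quoted_line(line, sep=","):
    """Parse one line in which every field is wrapped in double quotes.

    A quote closes the field only if the next non-blank character is the
    separator or the end of the line. Any other quote is literal content.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        # Skip blanks in front of the opening quote.
        while i < n and line[i] in " \t":
            i += 1
        if i >= n or line[i] != '"':
            raise ValueError(f"expected opening quote at column {i}")
        i += 1
        buf = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted field")
            ch = line[i]
            if ch == '"':
                # Look past trailing blanks to see what follows the quote.
                j = i + 1
                while j < n and line[j] in " \t":
                    j += 1
                if j >= n:
                    # Quote followed by end of line: last field is done.
                    fields.append("".join(buf))
                    return fields
                if line[j] == sep:
                    # Quote followed by the separator: field is done.
                    fields.append("".join(buf))
                    i = j + 1
                    break
            # Ordinary character, or a quote that is not a closer.
            buf.append(ch)
            i += 1


row = parse_quoted_line(
    '"This is my string "Tom" that I am using", "Next token", "Next token"'
)
print(row)
```

Under this rule a missing separator between two quoted strings looks exactly like embedded quotes, so the sketch can only report an unterminated field or a field that does not start with a quote.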

Tom

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:ef#g5K6mKHA.5692(a)TK2MSFTNGP04.phx.gbl...
> Tom Serface wrote:
>
>> One thing most parsers don't handle correctly, that I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
>
> Often, it takes two to tango. A writer needs to escape tokens in order to
> reach some level of sanity, i.e., borrowing the C backslash for \".
>
> "This is my string \"Tom\" that I am using"
>
> Or use some encoding method, e.g. HTTP escaping! :)
>
> The above is simple if just delimiting by comma. So watching for an
> embedded comma is required. For example:
>
> "This is my string "Tom, Hector" that I am using"
>
> That can be easily handled if the design assumption is each field is
> double quoted. The first token:
>
> "This is my string "Tom,
>
> does not end in double quote, so you continue with a concatenation of the
> next token.
>
> Hector" that I am using"
>
> to complete the first field.
>
> But overall, I found unless it's really simple, it helps if you have field
> type definitions known beforehand.
>
>
> --
> HLS
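
The token-joining idea in the quoted post can be sketched roughly as follows, in Python. This is an illustration under the post's own assumption that every field is double quoted; split_quoted_fields and sep are hypothetical names, not code from anyone in the thread. The line is split on the separator, and a token that does not yet end in a double quote is glued to the next one.

```python
def split_quoted_fields(line, sep=","):
    """Split on sep, then rejoin tokens until a field ends in a quote.

    Assumes every field is wrapped in double quotes.
    """
    fields = []
    pending = None  # partial field still waiting for its closing quote
    for token in line.split(sep):
        # Put back the separator that split() removed inside a field.
        piece = token if pending is None else pending + sep + token
        stripped = piece.strip()
        complete = (
            len(stripped) >= 2
            and stripped.startswith('"')
            and stripped.endswith('"')
        )
        if complete:
            fields.append(stripped[1:-1])  # drop the outer quotes
            pending = None
        else:
            pending = piece
    if pending is not None:
        raise ValueError("last field is missing its closing quote")
    return fields


print(split_quoted_fields(
    '"This is my string "Tom, Hector" that I am using", "Next token"'
))
```

The heuristic is still fooled when an embedded quote sits directly in front of an embedded comma, which is one reason known field definitions help.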

From: Tom Serface on
Yes, that's become particularly important to me in recent years since I've
had to work with files from other platforms (like Mac or other Unix based
systems). I guess that's why we get to keep working. So many things to
consider.

Tom

"David Wilkinson" <no-reply(a)effisols.com> wrote in message
news:#lNeGP8mKHA.5552(a)TK2MSFTNGP05.phx.gbl...
> Tom Serface wrote:
>> One thing most parsers don't handle correctly, that I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
> Another thing is tolerating files that have \n or \r line endings rather
> than \r\n.
>
> --
> David Wilkinson
> Visual C++ MVP
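
A small Python sketch of a reader that tolerates all three line endings mentioned above. The name split_lines is made up for illustration. The regular expression tries \r\n first so that a CRLF pair counts as one line break instead of two.

```python
import re

# \r\n must come first so a CRLF pair is matched as a single terminator.
_EOL = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text on \\r\\n, \\r or \\n, whichever the file happens to use."""
    lines = _EOL.split(text)
    # A trailing terminator leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


print(split_lines("dos\r\nold mac\runix\n"))
```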

From: Tom Serface on
Well, you could have used Xerces and spent 8 days getting it to work instead
:o)

Tom

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:99vkl5p0cdc2ngvsqpdn0h9rhr5sn8fnal(a)4ax.com...
> I can generally write an FSM parser in an hour or so, depending on the
> syntax. I wrote an XML parser, recursive descent, in eight hours, start
> to finish. The constraints were strange, and involved "no public source
> code, ever", which I thought was foolish, but they were paying. I did
> tell them there were a number of cheats, such as it did not handle all
> possible encodings of XML files, a constraint they found acceptable.
> joe


From: Tom Serface on
I think Joe is saying it is meaningless these days because there is no
carriage to return any longer. I think most of us consider \n synonymous
with Enter and that implies the start of a new line. A lot of this is
carry over from the days of teletype and paper terminals and we're just
stuck with it as part of ASCII.

Tom

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uqDAH$$mKHA.1548(a)TK2MSFTNGP04.phx.gbl...
>
> Joseph M. Newcomer wrote:
>
>> One of the rules we developed about forty years ago (1968) is that \r is
>> meaningless noise treated as whitespace, and \n is a newline. This works
>> until you import a text file created on a pre-OS X Mac, where \r is the
>> newline character.
>> joe
>
>
> Don't confuse raw vs cooked vs display/print device vs storage systems!
>
> \r\n have their basis as device codes for the hardware devices of the
> day: printers, teletypes, dumb terminals, etc.
>
> \r <CR> is what it is - a carriage return (move it to the first column) of
> the printer head! Note the operative word - Carriage!
>
> \n <LF> is what it is - a line feed (move carriage head down one line) of
> the printer head!
>
> When the consoles came, the printer head was now your cursor. That is why
> they are paired, whether there are translations or not.
>
> Now, your Terminal and Printer could have OPTIONAL translation for an
> automatic line feed (\n) with each carriage return (\r), which means \r
> APPEARS as if it were a line delimiter, as in the MAC world. In the unix
> wienie world, a \n is the line delimiter. DOS of course uses \r\n
> (<CR><LF>) pairs.
>
> But it is your terminal or printer providing the illusion with
> translations (which may be the default depending on the OS it is
> connected to).
> So if you dumped a unix file or mac file to a printer, it did the proper
> translation for you. The printer or carriage or laser point did not
> change, you still need to tell it to go left, right, up or down!
>
> Geez, Meaningless?
>
> This again is an example of insane revisionist comments.
>
> --
> HLS

From: Hector Santos on
Not so Tom.

It is all still the same! Trust me! It's what we do! This is my
business. (http://www.santronics.com) It is what we do as one of the
early pioneers in the telecommunications market. It is all still the
same. It is a natural part of our framework and everyone else's in the
same market. It is a fundamental understanding in this market. If you
don't follow it, you will not be compatible with the rest of the world.

Our software covers every aspect of the communications market, from
mail readers, telecommunication programs, mail/file distribution and
hosting, dialup vs internet, you name it. Your mail post here is
guaranteed to be read by some users in the world with one of our mail
reading devices. Your mail is guaranteed to be stored and forwarded
(gated) to servers using our product, and honestly, if you recently
saw a doctor and a health claim was filed on your behalf, the chances
are really good our software was somewhere in the network loop in
getting that claim collected, processed and the doctor paid!

When you hit ENTER, depending on the device and the OS, it will do the
translation for you.

If you are going to display a text file on the screen or send it to a
printer, the device is either doing the translation for you or not.

Storage is different because the OS may use 1 EOL (END OF LINE)
character or two. Sure, one can say that is a "WASTE" but you also
have to think of the consequences in overall global portability and
interfacing with other software and hardware devices.

Ultimately, regardless of how it is stored, a translation needs to
take place if you are going to display or print it correctly. If that
was not the case, then I am sure Tom you have seen times where a
printout was all one black line or jagged across a page.
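
To make the device view concrete, here is a toy print-head simulation in Python. It is only a sketch: render is a hypothetical name, and unlike a real printer it overwrites a character instead of overstriking it. CR sends the head back to column 0, and LF moves it down one row without touching the column.

```python
def render(text):
    """Simulate a print head and return the rows it would produce.

    CR returns to column 0 on the same row; LF moves down one row and
    keeps the current column. Nothing is translated.
    """
    rows = [[]]
    row = col = 0
    for ch in text:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row += 1
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            # Pad with blanks up to the head position, then print.
            while len(line) <= col:
                line.append(" ")
            line[col] = ch  # a real printer would overstrike here
            col += 1
    return ["".join(r) for r in rows]


for sample in ("ab\r\ncd", "ab\ncd", "ab\rcd"):
    print(repr(sample), render(sample))
```

With \r\n the two rows line up. With a bare \n the second row starts where the first one ended, which is the jagged printout. With a bare \r the second text lands on top of the first on the same row.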

Now, internet-based mail protocols use CRLF for many historical
reasons. When MAC or UNIX mail software sends email or news it must
implement translations, otherwise it is broken.

Same with FTP: a well-designed server and client need to take this
into account.

Same with the HTTP protocol - the CRLF is the standard. So that means
that if you are in the MAC/UNIX world, the interface software MUST do
translations.
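
The translation step for outgoing protocol text might look like this in Python. It is a sketch only, and to_crlf is a made-up helper name: every line ending, whatever its origin, is rewritten as CRLF before the text goes on the wire.

```python
import re

# \r\n is listed first so an existing CRLF is left as one pair.
_ANY_EOL = re.compile(r"\r\n|\r|\n")


def to_crlf(text):
    """Rewrite every \\r\\n, bare \\r and bare \\n as \\r\\n."""
    return _ANY_EOL.sub("\r\n", text)


print(repr(to_crlf("unix\nold mac\rdos\r\nend")))
```

Because existing CRLF pairs are matched whole, running the function twice gives the same result as running it once.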

For some kinds of user software, like a mail reader, most good ones
need to be DOS/UNIX/MAC ready when reading a text file, and such
software generally has sound/solid logic for reading such files.
This is an example where, as Joe indicated, a "\n" may be read as a
NEWLINE (EOL is my preferred terminology) but only if there is no \r
that precedes it.

It is not old, it is still here, it is fundamental in telecommunications
and there is no way we can live without it. But the software and devices
today are so highly engineered to deal with all situations, it is all
transparent to users. :)

--
HLS