From: robin on
<analyst41(a)hotmail.com> wrote in message news:9c95f818-d2df-493b-a40f-2977f8fe6a0f(a)z26g2000yqm.googlegroups.com...
On Jan 31, 6:35 am, "robin" <robi...(a)bigpond.com> wrote:

|> If you are still having a problem with this --
|> have you tried reading single characters with direct access READ?

|I have tried reading one character at a time using A1 format and
|assembling the records and it leads to pretty much the same thing.

That would be right.

|I am not that familiar with direct (unformatted?) read -

Then give it a try.

| I assume
|Fortran should be able to do the equivalent of a Hex dump - But I
|haven't had a chance to figure out how to do that.
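Both ideas combine naturally. If you read the file one byte at a time, outside Fortran's record handling, and print each byte in hexadecimal, you get a hex dump from Fortran itself. Below is a minimal sketch. It assumes a compiler with Fortran 2003 stream access (ACCESS='STREAM') and the Fortran 2008 NEWUNIT= specifier. The names hexdump_mod and hex_dump are hypothetical. If your compiler is limited to Fortran 95 and has no stream access, the usual fallback is the direct-access route robin mentions: ACCESS='DIRECT', FORM='UNFORMATTED', RECL=1, reading one byte per record. The unit of RECL is processor-dependent, though, so check the compiler documentation.

```fortran
! Sketch of a hex dump written in Fortran, assuming Fortran 2003 stream
! access and Fortran 2008 NEWUNIT=. Module and routine names are hypothetical.
module hexdump_mod
  implicit none
  private
  public :: hex_dump
contains
  subroutine hex_dump(filename, out_unit)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: out_unit        ! formatted unit that receives the dump
    character(len=16), parameter :: hexdig = '0123456789ABCDEF'
    character(len=1) :: buf(16)
    character(len=80) :: line
    integer :: in_unit, ios, n, i, code, col, offset
    logical :: at_eof

    ! Stream access has no record structure, so CR and LF arrive as ordinary bytes.
    open(newunit=in_unit, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) then
      write(out_unit, '(a)') 'cannot open ' // trim(filename)
      return
    end if

    offset = 0
    at_eof = .false.
    do while (.not. at_eof)
      ! Collect up to 16 bytes for one output line.
      n = 0
      do i = 1, 16
        read(in_unit, iostat=ios) buf(i)
        if (ios /= 0) then
          at_eof = .true.          ! end of file (or a read error): stop after this line
          exit
        end if
        n = i
      end do
      if (n == 0) exit

      line = ' '
      write(line(1:8), '(z8.8)') offset          ! offset of the first byte on this line
      do i = 1, n
        code = iand(ichar(buf(i)), 255)           ! byte value 0..255
        col = 11 + 3*(i - 1)
        line(col:col) = hexdig(code/16 + 1:code/16 + 1)
        line(col+1:col+1) = hexdig(mod(code, 16) + 1:mod(code, 16) + 1)
        ! Printable ASCII is shown as-is; CR, LF, BOM bytes etc. are shown as '.'
        if (code >= 32 .and. code <= 126) then
          line(59+i:59+i) = buf(i)
        else
          line(59+i:59+i) = '.'
        end if
      end do
      write(out_unit, '(a)') trim(line)
      offset = offset + n
    end do
    close(in_unit)
  end subroutine hex_dump
end module hexdump_mod
```

For example, call hex_dump('extract.csv', output_unit), with output_unit taken from ISO_FORTRAN_ENV, prints an offset, the hex bytes and an ASCII column. The layout is much like the DEBUG output posted later in the thread. The file name 'extract.csv' is a placeholder.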


From: dpb on
dpb wrote:
....

> I use the JPSoftware command interpreters <www.jpsoft.com> as great
> enhancements to, yet compatible w/ the MS CLIs.
>
> There's a free version available ...

Specifically,

<http://www.jpsoft.com/tccledes.htm>

--
From: analyst41 on
On Jan 29, 9:15 pm, "analys...(a)hotmail.com" <analys...(a)hotmail.com>
wrote:
> On Jan 29, 9:44 am, dpb <n...(a)non.net> wrote:
>
> > analys...(a)hotmail.com wrote:
> > > On Jan 28, 3:15 am, Arjen Markus <arjen.markus...(a)gmail.com> wrote:
> > >> On 28 jan, 00:51, "analys...(a)hotmail.com" <analys...(a)hotmail.com>
> > >> wrote:
>
> > >>> I posted on this topic before and this is my latest take on it:
> > >>> (1) In my case the messy files are csv extracts from a database (whose
> > >>> character encoding is Unicode - I don't know if it has anything to do
> > >>> with the problem).
> > >>> (2) I discovered that Fortran sees spurious EOR markers within
> > >>> character fields and I couldn't see a rhyme or reason why.
> > >>> (3) But since I control the input - I inserted row numbers at the
> > >>> beginning and end of each row extracted from the database and I added
> > >>> 2000000000 to the row number to make sure it's unlikely that this data
> > >>> would show up naturally.
> > >>> (4) I then read each record and make sure that it has at least 18
> > >>> characters (if not it is simply concatenated to cum_buffer - see
> > >>> below).
> > >>> I use the statement (adapted from Cooper Redwine's book)
> > >>>     read (unit = nn, fmt = '(A)', advance = 'no', iostat = read_stat, &
> > >>>           size = num_chars) buffer
> > >>> You must get EOR or EOF or an error on each read - otherwise the buffer
> > >>> is too small and the program has to be halted.
> > >>> I then check if the record number showing up at the end is the same
> > >>> as the one on the left.  If yes, you have a complete record -
> > >>> if not - you have a spurious EOR and simply concatenate the buffer
> > >>> to another buffer called cum_buffer.
> > >>> When cum_buffer looks like
> > >>> 2000000127stuff2000000127
> > >>> you have a facsimile of row 127 from the database.
> > >>> You might still have to struggle with separating 'stuff' into fields -
> > >>> but that's a purely programming task having nothing to do with the file
> > >>> system or operating system or character encoding schemes.
> > >>> I hope others find this useful and suggestions for improvements would
> > >>> be good.
> > >> I do not remember your previous postings, but I am curious about these
> > >> end-of-records. Can you send me an example? (I want to look at CSV files
> > >> more closely, as I recently was confronted with some of their nastier
> > >> aspects in the context of my Flibs project - http://flibs.sf.net).
>
> > >> Regards,
>
> > >> Arjen
>
> > > I'd love to give you actual files that show fake EORs - but it is
> > > copyright/proprietary data and I didn't have the time to clean it up
> > > from that standpoint.
>
> > > But here are three cases (the occurrence of these strings causes
> > > Fortran to see a fake EOR - LF95 running on Windows):
>
> > > <br />
>
> > > </STRONG>
>
> > > </B>
>
> > > These seem to be terminators of HTML phrases - I don't know why
> > > Fortran thinks these are EORs.  Excel would trip up similarly as would
> > > the language R - in fact, Fortran, R and Excel may see a different
> > > number of rows in the same csv file.
>
> > Can you post a short section of the file surrounding the offending
> > characters as seen by a hex dump program so we can see what's actually
> > in the data stream?
>
> > Do these strings fail when read on their own in any length record or
> > only in the generated output file from the database?
>
> > If you can make it fail repeatedly it should be quite simple to at least
> > figure out what is the root cause and whether that is a data problem or
> > a bug in the particular compiler i/o library.
>
> > Which raises a point of what happens w/ another compiler?
>
> I can tell you that it's not a Fortran issue.  Notepad, Excel and the R
> language are unable to split the file up into records so that the
> records correspond to rows in the database.
>
> I actually don't know the Windows/DOS command to produce a HEX dump -
> if someone knows it - please post it.  I have reduced the problem
> row-set to a few rows - it should be possible to post the entire data
> here as a HEX dump.

I pulled from the database row 1 ('description') and row 2 (the
content).

But Notepad, Excel and Fortran think the file has 5 rows:

description
"<strong>Unknown Anytime, Anywhere Learning<br />
</strong> The answer is Unknown. <strong> you can start and finish in
less then 17 months.</strong> <br />
<br />
Unknown about ensuring you learn ."


0B24:0100  EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
0B24:0110  22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
0B24:0120  20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
0B24:0130  72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
0B24:0140  3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
0B24:0150  20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
0B24:0160  77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
0B24:0170  20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
-d
0B24:0180  69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
0B24:0190  65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
0B24:01A0  74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
0B24:01B0  62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
0B24:01C0  62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
0B24:01D0  75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
0B24:01E0  0F 32 ED 0B C9 74 0F 43-53 26 8B 1F E8 5D 00 5B   .2...t.CS&...].[
0B24:01F0  73 0B 43 43 E2 F2 2E C7-06 96 90 04 00 5D 5F 5B   s.CC.........]_[
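The row-tag workaround from the original post (steps 3 and 4, quoted above) can be sketched in Fortran 2003 as shown below. The sketch assumes that every exported row looks like <tag>stuff<tag>, where <tag> is the 10-digit value 2000000000 + row number. The module and routine names (row_reassembly, next_row, read_physical, is_complete_row) are hypothetical.

The sketch differs from the post in two ways:
- It reads each physical record in 256-character chunks, so a long record is handled instead of halting the program.
- It simply requires room for both tags instead of the 18-character minimum.

```fortran
! Sketch of the row-tag reassembly described in the original post.
! Assumptions (not from the thread): each exported row is <tag>stuff<tag>,
! with <tag> the 10-digit value 2000000000 + row number; names are hypothetical.
module row_reassembly
  use, intrinsic :: iso_fortran_env, only: iostat_eor, iostat_end
  implicit none
  private
  public :: is_complete_row, next_row
  integer, parameter :: tag_len = 10
contains
  ! True when s starts and ends with the same 10-digit row tag.
  logical function is_complete_row(s)
    character(len=*), intent(in) :: s
    is_complete_row = .false.
    if (len(s) < 2*tag_len) return
    if (verify(s(1:tag_len), '0123456789') /= 0) return
    is_complete_row = s(1:tag_len) == s(len(s)-tag_len+1:)
  end function is_complete_row

  ! Reads one physical record of any length with non-advancing input.
  ! ios = 0 when the record ended normally (EOR), otherwise the READ's iostat.
  subroutine read_physical(unit, rec, ios)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: rec
    integer, intent(out) :: ios
    character(len=256) :: chunk
    integer :: nread
    rec = ''
    do
      read(unit, '(a)', advance='no', iostat=ios, size=nread) chunk
      if (ios == iostat_end .or. ios > 0) return   ! no more records, or an I/O error
      rec = rec // chunk(1:nread)
      if (ios == iostat_eor) then
        ios = 0                                    ! end of this physical record
        return
      end if
      ! ios == 0: the chunk was filled and the record continues
    end do
  end subroutine read_physical

  ! Glues physical records together until the leading and trailing tags match.
  subroutine next_row(unit, row, ios)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: row
    integer, intent(out) :: ios
    character(len=:), allocatable :: piece
    row = ''
    do
      call read_physical(unit, piece, ios)
      if (ios /= 0) return          ! end of file or error; row keeps any partial text
      row = row // piece            ! the spurious line break itself is dropped
      if (is_complete_row(row)) return
    end do
  end subroutine next_row
end module row_reassembly
```

A caller would loop on call next_row(nn, row, ios) and leave the loop when ios /= 0. As in the post, the embedded line break is lost when the pieces are concatenated.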

From: dpb on
analyst41(a)hotmail.com wrote:
....

> I pulled from the database row 1('description') and row2 (the
> content).
>
> But notepad, excel and Fortran think the file has 5 rows:
>
> description
> "<strong>Unknown Anytime, Anywhere Learning<br />
> </strong> The answer is Unknown. <strong> you can start and finish in
> less then 17 months.</strong> <br />
> <br />
> Unknown about ensuring you learn ."
>
>
> EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A ...description..
> 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E "<strong>Unknown
> 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65 Anytime, Anywhe
> 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F re Learning<br /
> 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65 >..</strong> The
> 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F answer is Unkno
> 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75 wn. <strong> you
> 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66 can start and f
> 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68 inish in less th
> 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73 en 17 months.</s
> 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C trong> <br />..<
> 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61 br />..Unknown a
> 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F bout ensuring yo
> 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A u learn ."....&.
....

Well, it does.

Search and count the 0D 0A pairs (CRLF) -- they're the record markers.
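The count is easy to automate with a sketch like the one below. It again assumes Fortran 2003 stream access, and the names crlf_count_mod and count_crlf are hypothetical. On the dump above it would find five pairs: one after "description", one after each of the three "<br />" tags, and one after the closing `."`. That matches the five rows Notepad, Excel and Fortran reported.

```fortran
! Sketch: count CR LF (0D 0A) pairs in a file using Fortran 2003 stream access.
! crlf_count_mod and count_crlf are hypothetical names.
module crlf_count_mod
  implicit none
  private
  public :: count_crlf
contains
  ! Returns the number of CR LF pairs, or -1 if the file cannot be opened.
  integer function count_crlf(filename) result(n)
    character(len=*), intent(in) :: filename
    character(len=1) :: c, prev
    integer :: u, ios
    n = -1
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) return
    n = 0
    prev = char(0)
    do
      read(u, iostat=ios) c
      if (ios /= 0) exit                       ! end of file (or read error)
      if (prev == char(13) .and. c == char(10)) n = n + 1
      prev = c
    end do
    close(u)
  end function count_crlf
end module crlf_count_mod
```

Reading one byte per READ is slow on large files. Reading bigger blocks would be faster, but this version keeps the sketch short.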

--
From: analyst41 on
On Feb 1, 7:40 pm, dpb <n...(a)non.net> wrote:
> analys...(a)hotmail.com wrote:
>
> ...
>
> > I pulled from the database row 1('description') and row2 (the
> > content).
>
> > But notepad, excel and Fortran think the file has 5 rows:
>
> > description
> > "<strong>Unknown Anytime, Anywhere Learning<br />
> > </strong> The answer is Unknown. <strong> you can start and finish in
> > less then 17 months.</strong> <br />
> > <br />
> > Unknown about ensuring you learn ."
>
> > EF BB BF 64 65 73 63 72-69 70 74 69 6F 6E 0D 0A   ...description..
> > 22 3C 73 74 72 6F 6E 67-3E 55 6E 6B 6E 6F 77 6E   "<strong>Unknown
> > 20 41 6E 79 74 69 6D 65-2C 20 41 6E 79 77 68 65    Anytime, Anywhe
> > 72 65 20 4C 65 61 72 6E-69 6E 67 3C 62 72 20 2F   re Learning<br /
> > 3E 0D 0A 3C 2F 73 74 72-6F 6E 67 3E 20 54 68 65   >..</strong> The
> > 20 61 6E 73 77 65 72 20-69 73 20 55 6E 6B 6E 6F    answer is Unkno
> > 77 6E 2E 20 3C 73 74 72-6F 6E 67 3E 20 79 6F 75   wn. <strong> you
> > 20 63 61 6E 20 73 74 61-72 74 20 61 6E 64 20 66    can start and f
> > 69 6E 69 73 68 20 69 6E-20 6C 65 73 73 20 74 68   inish in less th
> > 65 6E 20 31 37 20 6D 6F-6E 74 68 73 2E 3C 2F 73   en 17 months.</s
> > 74 72 6F 6E 67 3E 20 3C-62 72 20 2F 3E 0D 0A 3C   trong> <br />..<
> > 62 72 20 2F 3E 0D 0A 55-6E 6B 6E 6F 77 6E 20 61   br />..Unknown a
> > 62 6F 75 74 20 65 6E 73-75 72 69 6E 67 20 79 6F   bout ensuring yo
> > 75 20 6C 65 61 72 6E 20-2E 22 0D 0A 03 D8 26 8A   u learn ."....&.
>
> ...
>
> Well, it does.
>
> Search and count the 0D 0A pairs (CRLF) -- they're the record markers.

Thanks.

But since these markers are occurring both in the middle of a field
and also at the end of an actual row from the database - I am still
not able to separate out true EORs from the others.
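The dump quoted above hints at one way to tell the two kinds apart. The field that contains the embedded line breaks opens with a double quote (byte 22, right after the first CR LF). It closes with `."` (2E 22) just before the last CR LF. Suppose the export follows the common CSV convention: any field containing line breaks is enclosed in double quotes, and a literal quote is written as "". Then a CR LF ends a row only when it lies outside quotes.

Below is a sketch that counts rows that way. It assumes that convention holds for this export, and the names csv_rows_mod and count_csv_rows are hypothetical. On the dump above it would count two rows, the two that were actually pulled from the database.

```fortran
! Sketch: count CSV rows, treating CR LF as a row end only outside double quotes.
! Assumes fields with line breaks are quoted and a literal quote is written as "".
! csv_rows_mod and count_csv_rows are hypothetical names. Needs Fortran 2003.
module csv_rows_mod
  implicit none
  private
  public :: count_csv_rows
contains
  ! Returns the number of row-ending CR LF pairs, or -1 if the file cannot be opened.
  integer function count_csv_rows(filename) result(n)
    character(len=*), intent(in) :: filename
    character(len=1) :: c, prev
    integer :: u, ios
    logical :: in_quotes
    n = -1
    open(newunit=u, file=filename, access='stream', form='unformatted', &
         status='old', action='read', iostat=ios)
    if (ios /= 0) return
    n = 0
    in_quotes = .false.
    prev = char(0)
    do
      read(u, iostat=ios) c
      if (ios /= 0) exit                 ! end of file (or read error)
      if (c == '"') then
        ! Opening or closing quote; an escaped "" toggles twice and cancels out.
        in_quotes = .not. in_quotes
      else if (c == char(10) .and. prev == char(13) .and. .not. in_quotes) then
        n = n + 1                        ! a true end of row
      end if
      prev = c
    end do
    close(u)
  end function count_csv_rows
end module csv_rows_mod
```

The same quote tracking could drive a reader that splits rows instead of just counting them. With this approach, the row-number tags would no longer be needed.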