From: John on
How can a field be properly read that has a newline character within
double quotes?
Right now I'm using the INFILE statement below and it loads the
observation but shifts values of the fields.

Raw File
------------------------
Field 1,Field 2,"Field 3
Spans 2 lines",Field4,Field5,,,,,,
Field 1,"Field 2
Spans 2 lines",Field 3,Field4,Field5,,,,,,

INFILE "/&fileRef" DLM=',' dsd;

Resulting Obs
Field1 Field 2 Field 3 Field 4 Field
5 etc...
------------------------------------------------------------------------------------------------
Field 1 Field 2 "Field 3 Spans 2 lines" Field4
etc....
Field 1 "Field 2 Spans 2 lines" Field 3 Field4
etc....
From: "Terjeson, Mark" on
Hi John,

A couple clarification questions:

1) Do you want to keep those line terminations
embedded in Field 3? or drop them?
(there can be two approaches, a) do all the
custom handling during the INPUT stage, or
b) write a few line utility to strip out
line terminations that would be inside a
running set of quotes and then just use
very basic dsd input operations on the
corrected file)

2) Will there always be the same number of
delimited fields for each record?
(just doublechecking)(necessary if anyone
decides to do more of an incoming stream
approach...)

3) What platform(o/s) are you on? i.e. what
are the line termination character(s) in
your file? (e.g. LF, CR, CRLF, other?)



Hope this is helpful.


Mark Terjeson
Investment Business Intelligence
Investment Management & Research
Russell Investments
253-439-2367


Russell
Global Leaders in Multi-Manager Investing




-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
John
Sent: Thursday, September 24, 2009 7:50 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: SAS DSD Option on a delimited file

How can a field be properly read that has a newline character within
double quotes?
Right now I'm using the INFILE statement below and it loads the
observation but shifts values of the fields.

Raw File
------------------------
Field 1,Field 2,"Field 3
Spans 2 lines",Field4,Field5,,,,,,
Field 1,"Field 2
Spans 2 lines",Field 3,Field4,Field5,,,,,,

INFILE "/&fileRef" DLM=',' dsd;

Resulting Obs
Field1 Field 2 Field 3 Field 4 Field
5 etc...
------------------------------------------------------------------------
------------------------
Field 1 Field 2 "Field 3 Spans 2 lines" Field4
etc....
Field 1 "Field 2 Spans 2 lines" Field 3 Field4
etc....
From: Andre Wielki on
John this is a begin of solution as your contents is varying character
or numeric?
you have to rework your sas dataset concatenate or other
HTH
Andre

filename a "d:\Field11.txt";
data readcorrect;
length field1 8 field2 field3 field3b $15 field4 8;
infile a dlm=',' dsd pad truncover;
input field1 :10. (field2-field3)(:$15.)/ field3b field4-field10 ;
run;

for data like
1, 2,"Field 3
Spans 2 lines", 4, 5,,,,,,
1,"Field 2
Spans 2 lines", 3, 4, 5,,,,,,

John a �crit :
> How can a field be properly read that has a newline character within
> double quotes?
> Right now I'm using the INFILE statement below and it loads the
> observation but shifts values of the fields.
>
> Raw File
> ------------------------
> Field 1,Field 2,"Field 3
> Spans 2 lines",Field4,Field5,,,,,,
> Field 1,"Field 2
> Spans 2 lines",Field 3,Field4,Field5,,,,,,
>
> INFILE "/&fileRef" DLM=',' dsd;
>
> Resulting Obs
> Field1 Field 2 Field 3 Field 4 Field
> 5 etc...
> ------------------------------------------------------------------------------------------------
> Field 1 Field 2 "Field 3 Spans 2 lines" Field4
> etc....
> Field 1 "Field 2 Spans 2 lines" Field 3 Field4
> etc....
>
>

--
Andr� WIELKI
INED (Institut National d'Etudes D�mographiques)
Service Informatique
133 Boulevard Davout 75980 Paris Cedex 20
m�l : wielki(a)ined.fr t�l : 33 (0) 1 56 06 21 54
From: "Data _null_;" on
This is a common problem when EXCEL files are saved as CSV or other
flat file format. Your file contains '0A'x character(s) LF(linefeed)
that it are in the "middle" of the record.

I have never found an INFILE option that allows this character to be
read properly. The INFILE options MAC holds some promise, but as you
can see below the '0A'x associated with CR the PC termstring also
becomes part of the input.

2479 filename FT56F001 'testlf.csv';
2480 data _null_;
2481 infile FT56F001 mac;
2482 input;
2483 list;
2484 run;

NOTE: The infile FT56F001 is:
File Name=C:\Documents and Settings\....\My Documents\My SAS
Files\9.1\testlf.csv,
RECFM=V,LRECL=256

RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-

1 CHAR Field 1,Field 2,"Field 3 .Spans 2 lines",Field4,Field5,,,,,, 61
ZONE 4666623246666232246666232057667232666672246666324666632222222
NUMR 695C401C695C402C2695C4030A301E3020C9E532C695C44C695C45CCCCCC0

2 CHAR .Field 1,"Field 2 .Spans 2 lines",Field 3,Field4,Field5,,,,,, 62
ZONE 04666623224666623205766723266667224666623246666324666632222222
NUMR A695C401C2695C4020A301E3020C9E532C695C403C695C44C695C45CCCCCC0

3 CHAR . 1
ZONE 0
NUMR A
NOTE: 3 records were read from the infile FT56F001.

I prefer to just "bite the bytes" and clean the records and get on
with it. This data step can be used for that purpose. This program
updates the file in place, using INFILE option SHAREBUFFERS, which may
be new to you. The program replaces the LFs with SPACES but you could
used some other character if you need to "find them later".

filename FT56F001 'testlf0.csv';
data _null_;
length filename $256;
infile FT56F001 filename=filename recfm=N lrecl=256 sharebuffers eof=eof;
file FT56F001;
lbyte = '00'x;
c = 0;
do while(1);
input byte $char1.;
if byte eq '0a'x and lbyte ne '0d'x then do;
c + 1;
put '20'x;
end;
lbyte = byte;
end;
eof:
putlog 'NOTE: There were ' c 'maddening line feed characters
cleaned from: ' filename;
return;
run;

If anyone knows the MAGIC option please let us know.



On 9/24/09, John <jpovey(a)gmail.com> wrote:
> How can a field be properly read that has a newline character within
> double quotes?
> Right now I'm using the INFILE statement below and it loads the
> observation but shifts values of the fields.
>
> Raw File
> ------------------------
> Field 1,Field 2,"Field 3
> Spans 2 lines",Field4,Field5,,,,,,
> Field 1,"Field 2
> Spans 2 lines",Field 3,Field4,Field5,,,,,,
>
> INFILE "/&fileRef" DLM=',' dsd;
>
> Resulting Obs
> Field1 Field 2 Field 3 Field 4 Field
> 5 etc...
> ------------------------------------------------------------------------------------------------
> Field 1 Field 2 "Field 3 Spans 2 lines" Field4
> etc....
> Field 1 "Field 2 Spans 2 lines" Field 3 Field4
> etc....
>
From: John on
Thanks for all your suggestion. I was able to read the file and do a
little datastep cleansing.
While I didn't find an infile option, I left the infile I had and
created and array of columns from 1 ro 250 or so that excel maxes out
at. I was then able to loop through and search for fields that
started with '"' or ended with a '"' and combined them to a single
field.

A nested do loop did the trick, but it doesn't preserve the newline
character, which is fine for my purposes.

Thanks again
John