From: Paul M Foster on
On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:

<snip>

>
> Personally, I find working with fixed widths is best. The text file
> might be larger but I don't have to worry about escaping any type of
> characters ;)

I find this impossible, since I never know the largest width of all the
fields in a file. And a simple explode() call allows pulling all the
fields into an array, based on a common delimiter.
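For comparison, this is roughly what each approach looks like in PHP. It is a sketch: the fixed-width layout (field names, offsets, widths) is made up for illustration, and having to know that layout in advance is exactly the drawback described above.

```php
<?php
// Fixed-width: each field sits at a known offset with a known width.
// The layout passed in is hypothetical; every real file needs its own.
function parseFixedWidth(string $line, array $layout): array
{
    $fields = [];
    foreach ($layout as $name => [$offset, $width]) {
        // substr() cuts out the slot; rtrim() drops the space padding.
        $fields[$name] = rtrim(substr($line, $offset, $width));
    }
    return $fields;
}

// Delimited: one explode() call splits the whole record.
function parseDelimited(string $line, string $delimiter): array
{
    return explode($delimiter, $line);
}

$layout = ['name' => [0, 10], 'city' => [10, 12], 'zip' => [22, 5]];
print_r(parseFixedWidth('Jones     Springfield 12345', $layout));
print_r(parseDelimited("Jones\tSpringfield\t12345", "\t"));
```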

Paul

--
Paul M. Foster
From: Mattias Thorslund on
Paul M Foster wrote:
> I process a lot of CSV files, and what I typically see is that Excel
> will enclose fields which might contain commas in quotes. This gets
> messy. So I finally wrote a C utility which parses the file and yields
> tab-delimited records without the quotes.
>
> Paul
>

And fgetcsv() didn't work for you?

http://www.php.net/fgetcsv
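A minimal sketch of fgetcsv() reading Excel-style quoted fields. It assumes PHP 7.4 or later (for the empty escape argument) and uses a php://memory stream so no file is needed; the sample data is invented.

```php
<?php
// Excel-style CSV: fields containing commas are wrapped in double quotes,
// and a literal quote inside a field is doubled ("").
$csv = "name,address\n"
     . "\"Jones, Bob\",\"12 Main St, Apt \"\"B\"\"\"\n";

// In-memory stream, so the example needs no file on disk.
$fh = fopen('php://memory', 'r+');
fwrite($fh, $csv);
rewind($fh);

$rows = [];
// Separator, enclosure and escape are passed explicitly; an empty escape
// string disables PHP's non-standard backslash escaping.
while (($row = fgetcsv($fh, null, ',', '"', '')) !== false) {
    $rows[] = $row;
}
fclose($fh);

print_r($rows);
```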

Cheers,

Mattias
From: Ashley Sheridan on
On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:

> On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
>
> <snip>
>
> >
> > Personally, I find working with fixed widths is best. The text file
> > might be larger but I don't have to worry about escaping any type of
> > characters ;)
>
> I find this impossible, since I never know the largest width of all the
> fields in a file. And a simple explode() call allows pulling all the
> fields into an array, based on a common delimiter.
>
> Paul
>
> --
> Paul M. Foster
>


Explode won't work in the case of a comma in a field value.
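To illustrate (sample line invented, PHP 7.4+ assumed for the empty escape argument), str_getcsv() is one built-in that respects the quotes where explode() does not:

```php
<?php
$line = 'Jones,"Springfield, IL",12345';

// explode() splits on every comma, including the one inside the quotes,
// so this yields 4 pieces with stray quote characters.
$naive = explode(',', $line);

// str_getcsv() honours the enclosure, so the quoted comma stays in the field.
$parsed = str_getcsv($line, ',', '"', '');

var_dump(count($naive), count($parsed));
```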

Also, newlines can exist within a field value, so a line in the file
doesn't equate to a row of data.
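A short sketch of that case, with invented data: fgetcsv() keeps reading past a line break while it is inside a quoted field, so counting lines and counting records give different answers. PHP 7.4+ is assumed for the empty escape argument.

```php
<?php
// One record whose second field spans two physical lines.
$csv = "id,note\n1,\"first line\nsecond line\"\n2,plain\n";

// Counting physical lines overstates the number of records.
$physicalLines = substr_count($csv, "\n");

$fh = fopen('php://memory', 'r+');
fwrite($fh, $csv);
rewind($fh);

$records = [];
// fgetcsv() continues onto the next line while inside a quoted field.
while (($row = fgetcsv($fh, null, ',', '"', '')) !== false) {
    $records[] = $row;
}
fclose($fh);

echo $physicalLines, " lines, ", count($records), " records\n";
```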

The best way is just to start parsing at the beginning of the file and
break it into fields one by one from there.
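A rough sketch of that approach in PHP, walking the text one character at a time with an "inside quotes" flag. The function name parseCsv() is hypothetical, and it assumes comma delimiters and RFC 4180-style quoting; in practice fgetcsv() and str_getcsv() already do this work.

```php
<?php
/**
 * Minimal character-by-character CSV parser (hypothetical helper).
 * Assumes a comma delimiter, double-quote enclosure and doubled quotes ("")
 * as the escape. Malformed input (e.g. an unclosed quote) is not reported.
 */
function parseCsv(string $text): array
{
    $rows = [];
    $row = [];
    $field = '';
    $inQuotes = false;
    $pending = false; // true while a record has started but not been stored
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $c = $text[$i];
        $pending = true;
        if ($inQuotes) {
            if ($c === '"' && $i + 1 < $len && $text[$i + 1] === '"') {
                $field .= '"';      // doubled quote: one literal quote
                $i++;
            } elseif ($c === '"') {
                $inQuotes = false;  // closing quote
            } else {
                $field .= $c;       // commas and newlines are data here
            }
        } elseif ($c === '"') {
            $inQuotes = true;
        } elseif ($c === ',') {
            $row[] = $field;        // end of field
            $field = '';
        } elseif ($c === "\n" || $c === "\r") {
            if ($c === "\r" && $i + 1 < $len && $text[$i + 1] === "\n") {
                $i++;               // treat CRLF as a single line break
            }
            $row[] = $field;        // end of record
            $rows[] = $row;
            $row = [];
            $field = '';
            $pending = false;
        } else {
            $field .= $c;
        }
    }
    if ($pending) {                 // last record had no trailing newline
        $row[] = $field;
        $rows[] = $row;
    }
    return $rows;
}
```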

The bit I don't like about characters other than a comma being used in a
"comma separated values" file is that you can't automatically tell which
character has been used as the delimiter. That's why spreadsheet programs
ask what the delimiter is when a comma doesn't give them what they
recognise as valid fields.
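One common workaround is to guess the delimiter by counting candidate characters on a few sample lines. The sketch below is a hypothetical heuristic (guessDelimiter() and the candidate list are assumptions, not what any particular spreadsheet program does), and it ignores quoting for simplicity.

```php
<?php
/**
 * Hypothetical delimiter guesser: pick the candidate that appears the same,
 * non-zero number of times on every sampled line, preferring the most
 * frequent one. Quoted fields are NOT taken into account, so a comma inside
 * quotes can mislead it. Returns null when no candidate is consistent.
 */
function guessDelimiter(array $sampleLines, array $candidates = [',', "\t", ';', '|']): ?string
{
    $best = null;
    $bestCount = 0;
    foreach ($candidates as $candidate) {
        $counts = array_map(
            fn(string $line): int => substr_count($line, $candidate),
            $sampleLines
        );
        $unique = array_values(array_unique($counts));
        // Same count on every line, and more occurrences than the best so far.
        if (count($unique) === 1 && $unique[0] > $bestCount) {
            $best = $candidate;
            $bestCount = $unique[0];
        }
    }
    return $best;
}
```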

Thanks,
Ash
http://www.ashleysheridan.co.uk


From: Paul M Foster on
On Thu, Mar 18, 2010 at 09:16:30AM -0700, Mattias Thorslund wrote:

> Paul M Foster wrote:
>> I process a lot of CSV files, and what I typically see is that Excel
>> will enclose fields which might contain commas in quotes. This gets
>> messy. So I finally wrote a C utility which parses the file and yields
>> tab-delimited records without the quotes.
>>
>> Paul
>>
>
> And fgetcsv() didn't work for you?
>
> http://www.php.net/fgetcsv

I wrote my utility (and the infrastructure to process these files) long
before I was working with PHP. For what I do with the files, I must pipe
one operation's results to another process/command to get the final
result. This is impossible with web-based PHP. So I shell out from PHP
to do it. Like this:

# convert original file to tab-delimited
cat maillist.csv | cqf | filter.cq3or4 > jones.tab
# filter unwanted fields and reorder fields
mlt3.py nady jones.tab jones.rdb
# build basic DBF file
dbfsak -r mailers.rdb jones.dbf
# append rdb records to DBF file
dbfsak -a jones.rdb jones.dbf

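If the same steps are driven from PHP, one way is to build each command with escapeshellarg() around the file names and check the exit status from exec(). A sketch, assuming a POSIX shell and that the tools named above (cqf, filter.cq3or4, mlt3.py, dbfsak) are on the PATH; the helper names buildConvertCommand() and runOrFail() are hypothetical.

```php
<?php
/**
 * Build the first stage of the pipeline as one shell command string.
 * escapeshellarg() quotes the file names so odd characters in
 * customer-supplied names can't break or inject into the command.
 */
function buildConvertCommand(string $csvFile, string $tabFile): string
{
    return 'cat ' . escapeshellarg($csvFile)
        . ' | cqf | filter.cq3or4 > ' . escapeshellarg($tabFile);
}

/**
 * Run a command through the shell and throw on a non-zero exit status.
 * Note: for a pipeline, the shell reports only the LAST command's status.
 */
function runOrFail(string $command): void
{
    exec($command, $output, $status);
    if ($status !== 0) {
        throw new RuntimeException("Command failed ($status): $command");
    }
}

// Hypothetical usage, mirroring the steps above:
// runOrFail(buildConvertCommand('maillist.csv', 'jones.tab'));
// runOrFail('mlt3.py nady jones.tab jones.rdb');
// runOrFail('dbfsak -r mailers.rdb jones.dbf');
// runOrFail('dbfsak -a jones.rdb jones.dbf');
```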
Paul

--
Paul M. Foster
From: Paul M Foster on
On Thu, Mar 18, 2010 at 04:15:33PM +0000, Ashley Sheridan wrote:

> On Thu, 2010-03-18 at 12:12 -0400, Paul M Foster wrote:
>
> > On Thu, Mar 18, 2010 at 08:57:00AM -0700, Tommy Pham wrote:
> >
> > <snip>
> >
> > >
> > > Personally, I find working with fixed widths is best. The text file
> > > might be larger but I don't have to worry about escaping any type of
> > > characters ;)
> >
> > I find this impossible, since I never know the largest width of all the
> > fields in a file. And a simple explode() call allows pulling all the
> > fields into an array, based on a common delimiter.
> >
> > Paul
> >
> > --
> > Paul M. Foster
> >
>
>
> Explode won't work in the case of a comma in a field value.

That's why I convert the files to tab-delimited first. explode() does
work in that case.
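For readers without such a utility, a rough PHP equivalent of the convert-to-tab step could lean on fgetcsv(). This is a hypothetical sketch (csvToTab() is an invented name, PHP 7.4+ assumed): it has to do something with tabs or line breaks inside a field, and here it simply replaces them with spaces, since plain tab-delimited text has no way to escape them.

```php
<?php
/**
 * Convert CSV text to tab-delimited lines, dropping the quotes.
 * Assumption: tabs and line breaks inside a field become single spaces.
 */
function csvToTab(string $csv): string
{
    $in = fopen('php://memory', 'r+');
    fwrite($in, $csv);
    rewind($in);

    $out = '';
    while (($row = fgetcsv($in, null, ',', '"', '')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $clean = array_map(
            fn(?string $f): string => str_replace(["\r\n", "\r", "\n", "\t"], ' ', (string) $f),
            $row
        );
        $out .= implode("\t", $clean) . "\n";
    }
    fclose($in);
    return $out;
}

$tab = csvToTab("Jones,\"Springfield, IL\",12345\n");
// Each tab-delimited line now splits cleanly with explode().
$fields = explode("\t", rtrim($tab, "\n"));
```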

>
> Also, newlines can exist within a field value, so a line in the file doesn't
> equate to a row of data.

I've never seen this in the files I receive.

>
> The best way is just to start parsing at the beginning of the file and break it
> into fields one by one from there.
>
> The bit I don't like about characters other than a comma being used in a "comma
> separated values" file is that you can't automatically tell which character has
> been used as the delimiter. That's why spreadsheet programs ask what the
> delimiter is when a comma doesn't give them what they recognise as valid fields.

I've honestly never seen a "CSV" or "Comma-separated Values" file that used
tabs as delimiters. At that point, it's really not a *comma*-separated
values file.

My application for all this is accepting mailing lists from customers
which I have to convert into DBFs for a commercial mailing list program.
Because most of my customers can barely find the on/off switch on their
computers, I never know what I'm going to get. So before I string
together the filters to process the file, I have to actually look at and
analyze the file to find out what it is. It could be a fixed-length
file, a CSV, a tab-delimited file, or anything in between. Once I've
selected the filters, the order they will run in, and the fields from
the file I want to capture, I hit the button. When it's all done, I
have to check the result to ensure that the requested fields ended up
where they were supposed to.
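A hypothetical first-pass guess at a file's layout from a few sample lines might look like the sketch below. It only speeds up the manual inspection described above rather than replacing it; guessLayout(), the labels it returns, and the order of the checks are all assumptions.

```php
<?php
/**
 * Hypothetical layout guesser. Checks, in order: a tab on every line,
 * identical line lengths (fixed-length records), a comma on every line.
 */
function guessLayout(array $sampleLines): string
{
    // Strip line endings and ignore blank lines.
    $lines = array_values(array_filter(
        array_map(fn(string $l): string => rtrim($l, "\r\n"), $sampleLines),
        fn(string $l): bool => $l !== ''
    ));
    if ($lines === []) {
        return 'unknown';
    }
    $everyLineHas = function (string $needle) use ($lines): bool {
        foreach ($lines as $l) {
            if (strpos($l, $needle) === false) {
                return false;
            }
        }
        return true;
    };

    if ($everyLineHas("\t")) {
        return 'tab-delimited';
    }
    // Fixed-length records: every line padded to the same width.
    $lengths = array_unique(array_map('strlen', $lines));
    if (count($lines) > 1 && count($lengths) === 1) {
        return 'fixed-width';
    }
    if ($everyLineHas(',')) {
        return 'csv';
    }
    return 'unknown';
}
```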

Paul

--
Paul M. Foster