From: William Park on
oraustin(a)hotmail.com wrote:
> Firstly - is there a sed newsgroup for me to be posting to?
> Quick question now :)
> I have a CSV file which I created by concatenation of multiple files.
> In some of the files the fields are also delimited with double quotes.
> 123,ouahfds,12341
> "123,"fsdfsd,ewfdw",14324
>
> I'd like to double quote all the fields. Not sure how to achieve this
> - please help
> Thanks
> Oliver


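CSV gets messy when fields contain a comma, CR (\r), or LF (\n), since
those are the FS (field separator) and RS (record separator). Bash with
my BashDiff patch (see signature) can handle this; the 'read -C' and
${var|.csvquote} syntax below is not part of stock Bash.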

while read -C a b c; do
echo "before: {$a} {$b} {$c}"
a=${a|.csvquote}
b=${b|.csvquote}
c=${c|.csvquote}
echo "after: {$a} {$b} {$c}"
done <<+ EOF
123,ouahfds,12341
"123","fsdfsd,ewfdw",14324
EOF

which will give you

123,ouahfds,12341
""123"","""fsdfsd,ewfdw""",14324

Now, you have to decide if you want to keep the existing double quotes.
If you want to get rid of them, you can dequote and then quote, like

while read -C a b c; do
echo "before: {$a} {$b} {$c}"
a=${a|.csvdequote}
b=${b|.csvdequote}
c=${c|.csvdequote}
echo "middle: {$a} {$b} {$c}"
a=${a|.csvquote}
b=${b|.csvquote}
c=${c|.csvquote}
echo "after: {$a} {$b} {$c}"
done <<+ EOF
123,ouahfds,12341
"123","fsdfsd,ewfdw",14324
EOF

which will give you

123,ouahfds,12341
123,"fsdfsd,ewfdw",14324

Then just check each field for a double quote ("), and add the quotes
if they are not there already. If they are, skip the field.

--
William Park <opengeometry(a)yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
From: William James on
Chris F.A. Johnson wrote:
> Given 'a,"b,",c",d', the fields are:
>
> a
> b,
> c"
> d

Incorrect, according to
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm,
which says:

Fields that contain double quote characters must be surrounded by
double-quotes, and the embedded double-quotes must each be represented
by a pair of consecutive double quotes.

So a correct record is
a,"b,","c""",d
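
The quoting rule above can be sketched with sed. This is only a minimal illustration, assuming a field contains no embedded newline; csv_quote_field is a made-up helper name, not something from the article:

```shell
# csv_quote_field: hypothetical helper, for illustration only.
# Doubles every embedded double quote, then wraps the field in double quotes.
# Assumes the field has no embedded newline (sed works line by line).
csv_quote_field() {
    printf '%s\n' "$1" | sed -e 's/"/""/g' -e 's/.*/"&"/'
}

csv_quote_field 'b,'    # prints "b,"
csv_quote_field 'c"'    # prints "c"""
```

Joining the results with commas gives the record shown above.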

From: G_r_a_n_t_ on
On 25 Jul 2005 16:51:00 -0700, "William James" <w_a_x_man(a)yahoo.com> wrote:

> Chris F.A. Johnson wrote:
> > Given 'a,"b,",c",d', the fields are:
....
> So a correct record is
> a,"b,","c""",d

Looks very MSFT :) makes \ escaping look sane?

Cheers

From: Chris F.A. Johnson on
On 2005-07-25, William James wrote:
> Chris F.A. Johnson wrote:
>> Given 'a,"b,",c",d', the fields are:
>>
>> a
>> b,
>> c"
>> d
>
> Incorrect, according to
> http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm,
> which says:
>
> Fields that contain double quote characters must be surrounded by
> double-quotes, and the embedded double-quotes must each be represented
> by a pair of consecutive double quotes.
>
> So a correct record is
> a,"b,","c""",d

Which is exactly what I said. All you have done is delineate those
fields differently.

There are so many specifications for a CSV file, that any attempt
to handle exceptional cases is fraught with difficulty.

In this case, the question is about turning malformed "CSV" files
into a standardized format. Without a statement of how they are to
be interpreted, all attempts are mere conjecture.

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>
From: oraustin on
> In this case, the question is about turning malformed "CSV" files
> into a standardized format. Without a statement of how they are to
> be interpreted, all attempts are mere conjecture.

Thanks for all your input on this - it seems to have gone off at a
little tangent (which is good I think).
The CSV files I need to convert are not badly formed - that was a big
typo in my first example. There are only two types of lines
123,4556,efwref,134
or
"123","123412","sdfhuk,aqfds","1432"
I'm very grateful for all the scripts, but firstly I'd like to use just
one Unix command - probably SED. Other people have to use and modify
what I write, and neither they nor I have time right now to become
proficient in a number of commands.

Surely this is simple in SED:
check whether the first character of a line is not " and, if so,
substitute "," for every , in the rest of the line. Except I don't know
how to write that :)
Thanks Oliver
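
A minimal sketch of the approach described above, assuming every line is either fully unquoted or fully quoted, and that unquoted fields contain no commas or double quotes of their own. The function name quote_fields is made up for illustration. Besides replacing each , with ",", the line also needs a quote added at its start and end:

```shell
# quote_fields: hypothetical helper name, not from the thread.
# For lines whose first character is NOT a double quote:
#   1. replace every comma with ","
#   2. wrap the whole line in double quotes
# Lines that already start with a double quote (and empty lines)
# pass through unchanged.
quote_fields() {
    sed -e '/^[^"]/{' -e 's/,/","/g' -e 's/.*/"&"/' -e '}' "$@"
}

printf '%s\n' \
    '123,4556,efwref,134' \
    '"123","123412","sdfhuk,aqfds","1432"' | quote_fields
# prints:
# "123","4556","efwref","134"
# "123","123412","sdfhuk,aqfds","1432"
```

The function can also be given file names, e.g. quote_fields input.csv > output.csv (file names here are placeholders).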

As an aside, maybe you can advise: we get data from many companies, and
it always requires some manipulation - the example here is fairly
simple. Is Perl the best thing for me to focus on for file
alteration/manipulation? Can it link in with Microsoft BizTalk, which
is the framework I believe we will use?
