From: Ed Morton on
Michael Tosch wrote:

> Michael Tosch wrote:
>
>> oraustin(a)hotmail.com wrote:
>>
>>> Firstly - is there a sed newsgroup for me to be posting to?
>>> Quick question now :)
>>> I have a CSV file which I created by concatenation of multiple files.
>>> In some of the files the fields are also delimited with double quotes.
>>> 123,ouahfds,12341
>>> "123,"fsdfsd,ewfdw",14324
>>>
>>> I'd like to double quote all the fields. Not sure how to achieve this
>>> - please help
>>> Thanks
>>> Oliver
>>>
>>
>>
>> This is easier with awk:
>>
>> awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=,
>>
>>
>
> To satisfy the purists and Ed's: one can omit some curly brackets:
>
> awk -F, '{for(i=1;i<=NF;++i)if($i!~/"*"/)$i="\""$i"\"";print}' OFS=,
>

Some Eds might point out that you could further shorten the script by
the common idiom of using a true condition to invoke the default print
action:

awk -F, '{for(i=1;i<=NF;++i)if($i!~/"*"/)$i="\""$i"\""}1' OFS=,

if there were any unsatisfied Eds around ;-) . Nice script.

Ed.
From: Chris F.A. Johnson on
On 2005-07-25, Michael Tosch wrote:
> oraustin(a)hotmail.com wrote:
>> Firstly - is there a sed newsgroup for me to be posting to?
>> Quick question now :)
>> I have a CSV file which I created by concatenation of multiple files.
>> In some of the files the fields are also delimited with double quotes.
>> 123,ouahfds,12341
>> "123,"fsdfsd,ewfdw",14324
>>
>> I'd like to double quote all the fields. Not sure how to achieve this
>> - please help
>> Thanks
>> Oliver
>>
>
>
> This is easier with awk:
>
> awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=,

This fails if there are commas in any of the fields, e.g.:

$ echo 1,2,3,"a, b, c",d,e,f |
awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=,
"1","2","3","a"," b"," c","d","e","f"

It should be:

"1","2","3","a, b, c","d","e","f"


--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>
From: William James on
oraustin(a)hotmail.com wrote:

> I have a CSV file which I created by concatenation of multiple files.
> In some of the files the fields are also delimited with double quotes.
> 123,ouahfds,12341
> "123,"fsdfsd,ewfdw",14324
>
> I'd like to double quote all the fields. Not sure how to achieve this
> - please help
> Thanks
> Oliver

The data in a field in a csv file is allowed to contain anything:
commas, double-quotes, linefeeds, zero bytes. The following Awk
program handles the commas and the double-quotes.

{ parse_csv( $0, ary )
out = ""
for (k=1; k in ary; k++)
out = out "," to_csv( ary[k] )
print substr( out, 2 )
}

# Handles fields that contain commas and double-quotes.
function parse_csv( str, array, field,i )
{ split( "", array )
str = str ","
while ( match(str,
/[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) )
{ field = substr( str, 1, RLENGTH )
gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field )
gsub( /""/, "\"", field )
array[++i] = field
str = substr( str, RLENGTH + 1 )
}
}

function to_csv( s )
{ gsub( /"/, "&&", s )
return "\"" s "\""
}


Save the program as "csv.awk" and run with
awk -f csv.awk infile >outfile

From: Ed Morton on
William James wrote:
> oraustin(a)hotmail.com wrote:
>
>
>>I have a CSV file which I created by concatenation of multiple files.
>>In some of the files the fields are also delimited with double quotes.
>>123,ouahfds,12341
>>"123,"fsdfsd,ewfdw",14324
>>
>>I'd like to double quote all the fields. Not sure how to achieve this
>>- please help
>>Thanks
>>Oliver
>
>
> The data in a field in a csv file is allowed to contain anything:
> commas, double-quotes, linefeeds, zero bytes.

Then does this line mean:

a,"b,",c",d

fields:

a
"b,",c"
d

or fields:

a
"b,"
c"
d

or something else?

Williams awk script converts that line to this:

"a","b,","c","d"

which is either adding a ," in the middle of the second field (if the
correct interpretation is the first form above) or deleting the " that
follows c in the input file if it's the second form above.

Chris's shell script converts that line to:

"a","b,","c"","d"

which means it's treating it like the second set of fields above.

Anyone know what the correct output should be and why?

Ed.
From: Chris F.A. Johnson on
On 2005-07-25, Ed Morton wrote:
> William James wrote:
>> oraustin(a)hotmail.com wrote:
>>
>>>I have a CSV file which I created by concatenation of multiple files.
>>>In some of the files the fields are also delimited with double quotes.
>>>123,ouahfds,12341
>>>"123,"fsdfsd,ewfdw",14324
>>>
>>>I'd like to double quote all the fields. Not sure how to achieve this
>>>- please help
>>>Thanks
>>>Oliver
>>
>>
>> The data in a field in a csv file is allowed to contain anything:
>> commas, double-quotes, linefeeds, zero bytes.
>
> Then does this line mean:
>
> a,"b,",c",d
>
> fields:
>
> a
> "b,",c"
> d
>
> or fields:
>
> a
> "b,"
> c"
> d
>
> or something else?
>
> Williams awk script converts that line to this:
>
> "a","b,","c","d"
>
> which is either adding a ," in the middle of the second field (if the
> correct interpretation is the first form above) or deleting the " that
> follows c in the input file if it's the second form above.
>
> Chris's shell script converts that line to:
>
> "a","b,","c"","d"
>
> which means it's treating it like the second set of fields above.
>
> Anyone know what the correct output should be and why?

Given 'a,"b,",c",d', the fields are:

a
b,
c"
d

Commas separate fields except when they are inside a quoted field.

Quotes define fields (which may contain commas or quotation marks)
only when they start a field.

A more problematic case would be:

a,"b,","c,d

Is that a malformed record, or is the quote before the 'c' part of
the contents of the field?

IOW, should my csv_split function be changed to:

case $csv_record in
\"*\"*) csv_right=${csv_record#*\",}
csv_value=${csv_record%%\",*}
record_vals[$csv_vnum]=${csv_value#\"}
;;
*) record_vals[$csv_vnum]=${csv_record%%,*}
csv_right=${csv_record#*,}
;;
esac

Or:

case $csv_record in
\"*\",* | \"*\" )
csv_right=${csv_record#*\",}
csv_value=${csv_record%%\",*}
record_vals[$csv_vnum]=${csv_value#\"}
;;
\"*) die 1 "malformed record" ;;
*) record_vals[$csv_vnum]=${csv_record%%,*}
csv_right=${csv_record#*,}
;;
esac

--
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>
First  |  Prev  |  Next  |  Last
Pages: 1 2 3 4 5 6
Next: replace single quote by escaped single quote