|
From: Ed Morton on 25 Jul 2005 13:41 Michael Tosch wrote: > Michael Tosch wrote: > >> oraustin(a)hotmail.com wrote: >> >>> Firstly - is there a sed newsgroup for me to be posting to? >>> Quick question now :) >>> I have a CSV file which I created by concatenation of multiple files. >>> In some of the files the fields are also delimited with double quotes. >>> 123,ouahfds,12341 >>> "123,"fsdfsd,ewfdw",14324 >>> >>> I'd like to double quote all the fields. Not sure how to achieve this >>> - please help >>> Thanks >>> Oliver >>> >> >> >> This is easier with awk: >> >> awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=, >> >> > > To satisfy the purists and Ed's: one can omit some curly brackets: > > awk -F, '{for(i=1;i<=NF;++i)if($i!~/"*"/)$i="\""$i"\"";print}' OFS=, > Some Eds might point out that you could further shorten the script by the common idiom of using a true condition to invoke the default print action: awk -F, '{for(i=1;i<=NF;++i)if($i!~/"*"/)$i="\""$i"\""}1' OFS=, if there were any unsatisfied Eds around ;-) . Nice script. Ed.
From: Chris F.A. Johnson on 25 Jul 2005 14:39 On 2005-07-25, Michael Tosch wrote: > oraustin(a)hotmail.com wrote: >> Firstly - is there a sed newsgroup for me to be posting to? >> Quick question now :) >> I have a CSV file which I created by concatenation of multiple files. >> In some of the files the fields are also delimited with double quotes. >> 123,ouahfds,12341 >> "123,"fsdfsd,ewfdw",14324 >> >> I'd like to double quote all the fields. Not sure how to achieve this >> - please help >> Thanks >> Oliver >> > > > This is easier with awk: > > awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=, This fails if there are commas in any of the fields, e.g.: $ echo 1,2,3,"a, b, c",d,e,f | awk -F, '{for(i=1;i<=NF;++i){if($i!~/"*"/){$i="\""$i"\""}};print}' OFS=, "1","2","3","a"," b"," c","d","e","f" It should be: "1","2","3","a, b, c","d","e","f" -- Chris F.A. Johnson <http://cfaj.freeshell.org> ================================================================== Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress <http://www.torfree.net/~chris/books/cfaj/ssr.html>
From: William James on 25 Jul 2005 16:17 oraustin(a)hotmail.com wrote: > I have a CSV file which I created by concatenation of multiple files. > In some of the files the fields are also delimited with double quotes. > 123,ouahfds,12341 > "123,"fsdfsd,ewfdw",14324 > > I'd like to double quote all the fields. Not sure how to achieve this > - please help > Thanks > Oliver The data in a field in a csv file is allowed to contain anything: commas, double-quotes, linefeeds, zero bytes. The following Awk program handles the commas and the double-quotes. { parse_csv( $0, ary ) out = "" for (k=1; k in ary; k++) out = out "," to_csv( ary[k] ) print substr( out, 2 ) } # Handles fields that contain commas and double-quotes. function parse_csv( str, array, field,i ) { split( "", array ) str = str "," while ( match(str, /[ \t]*("[^"]*(""[^"]*)*"|[^,]*)[ \t]*,/) ) { field = substr( str, 1, RLENGTH ) gsub( /^[ \t]*"?|"?[ \t]*,$/, "", field ) gsub( /""/, "\"", field ) array[++i] = field str = substr( str, RLENGTH + 1 ) } } function to_csv( s ) { gsub( /"/, "&&", s ) return "\"" s "\"" } Save the program as "csv.awk" and run with awk -f csv.awk infile >outfile
From: Ed Morton on 25 Jul 2005 17:31 William James wrote: > oraustin(a)hotmail.com wrote: > > >>I have a CSV file which I created by concatenation of multiple files. >>In some of the files the fields are also delimited with double quotes. >>123,ouahfds,12341 >>"123,"fsdfsd,ewfdw",14324 >> >>I'd like to double quote all the fields. Not sure how to achieve this >>- please help >>Thanks >>Oliver > > > The data in a field in a csv file is allowed to contain anything: > commas, double-quotes, linefeeds, zero bytes. Then does this line mean: a,"b,",c",d fields: a "b,",c" d or fields: a "b," c" d or something else? Williams awk script converts that line to this: "a","b,","c","d" which is either adding a ," in the middle of the second field (if the correct interpretation is the first form above) or deleting the " that follows c in the input file if it's the second form above. Chris's shell script converts that line to: "a","b,","c"","d" which means it's treating it like the second set of fields above. Anyone know what the correct output should be and why? Ed.
From: Chris F.A. Johnson on 25 Jul 2005 18:42
On 2005-07-25, Ed Morton wrote: > William James wrote: >> oraustin(a)hotmail.com wrote: >> >>>I have a CSV file which I created by concatenation of multiple files. >>>In some of the files the fields are also delimited with double quotes. >>>123,ouahfds,12341 >>>"123,"fsdfsd,ewfdw",14324 >>> >>>I'd like to double quote all the fields. Not sure how to achieve this >>>- please help >>>Thanks >>>Oliver >> >> >> The data in a field in a csv file is allowed to contain anything: >> commas, double-quotes, linefeeds, zero bytes. > > Then does this line mean: > > a,"b,",c",d > > fields: > > a > "b,",c" > d > > or fields: > > a > "b," > c" > d > > or something else? > > Williams awk script converts that line to this: > > "a","b,","c","d" > > which is either adding a ," in the middle of the second field (if the > correct interpretation is the first form above) or deleting the " that > follows c in the input file if it's the second form above. > > Chris's shell script converts that line to: > > "a","b,","c"","d" > > which means it's treating it like the second set of fields above. > > Anyone know what the correct output should be and why? Given 'a,"b,",c",d', the fields are: a b, c" d Commas separate fields except when they are inside a quoted field. Quotes define fields (which may contain commas or quotation marks) only when they start a field. A more problematic case would be: a,"b,","c,d Is that a malformed record, or is the quote before the 'c' part of the contents of the field? IOW, should my csv_split function be changed to: case $csv_record in \"*\"*) csv_right=${csv_record#*\",} csv_value=${csv_record%%\",*} record_vals[$csv_vnum]=${csv_value#\"} ;; *) record_vals[$csv_vnum]=${csv_record%%,*} csv_right=${csv_record#*,} ;; esac Or: case $csv_record in \"*\",* | \"*\" ) csv_right=${csv_record#*\",} csv_value=${csv_record%%\",*} record_vals[$csv_vnum]=${csv_value#\"} ;; \"*) die 1 "malformed record" ;; *) record_vals[$csv_vnum]=${csv_record%%,*} csv_right=${csv_record#*,} ;; esac -- Chris F.A. Johnson <http://cfaj.freeshell.org> ================================================================== Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress <http://www.torfree.net/~chris/books/cfaj/ssr.html> |