From: William Park on 25 Jul 2005 19:21

oraustin(a)hotmail.com wrote:
> Firstly - is there a sed newsgroup for me to be posting to?
> Quick question now :)
> I have a CSV file which I created by concatenation of multiple files.
> In some of the files the fields are also delimited with double quotes.
> 123,ouahfds,12341
> "123,"fsdfsd,ewfdw",14324
>
> I'd like to double quote all the fields. Not sure how to achieve this
> - please help
> Thanks
> Oliver

CSV is messy if fields contain comma, CR (\r), or LF (\n), since they
are FS (field separator) and RS (record separator). Only Bash shell
(with the BashDiff patch, see signature) can handle this.

    while read -C a b c; do
        echo "before: {$a} {$b} {$c}"
        a=${a|.csvquote}
        b=${b|.csvquote}
        c=${c|.csvquote}
        echo "after: {$a} {$b} {$c}"
    done <<+ EOF
    123,ouahfds,12341
    "123","fsdfsd,ewfdw",14324
    EOF

which will give you

    123,ouahfds,12341
    ""123"","""fsdfsd,ewfdw""",14324

Now, you have to decide if you want to keep the existing double quotes.
If you want to get rid of them, you can dequote and then quote, like

    while read -C a b c; do
        echo "before: {$a} {$b} {$c}"
        a=${a|.csvdequote}
        b=${b|.csvdequote}
        c=${c|.csvdequote}
        echo "middle: {$a} {$b} {$c}"
        a=${a|.csvquote}
        b=${b|.csvquote}
        c=${c|.csvquote}
        echo "after: {$a} {$b} {$c}"
    done <<+ EOF
    123,ouahfds,12341
    "123","fsdfsd,ewfdw",14324
    EOF

which will give you

    123,ouahfds,12341
    123,"fsdfsd,ewfdw",14324

Then, just check for the existence of a double quote (") in each field,
and add the quotes if they are not there already. If they are there,
then skip.

-- 
William Park <opengeometry(a)yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
           http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
          http://freshmeat.net/projects/bashdiff/
From: William James on 25 Jul 2005 19:51

Chris F.A. Johnson wrote:
> Given 'a,"b,",c",d', the fields are:
>
> a
> b,
> c"
> d

Incorrect, according to
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm,
which says:

  Fields that contain double quote characters must be surrounded by
  double-quotes, and the embedded double-quotes must each be represented
  by a pair of consecutive double quotes.

So a correct record is
a,"b,","c""",d
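
The quoting rule cited above can be sketched with sed. This is only an illustration, assuming each field arrives on its own line and contains no embedded newline; the function name `csv_quote_field` is made up for this example and does not come from the thread.

```shell
# csv_quote_field: hypothetical helper, reads one field per line on stdin.
csv_quote_field() {
    # s/"/""/g  doubles every embedded double quote
    # s/^/"/    adds the opening quote
    # s/$/"/    adds the closing quote
    sed 's/"/""/g;s/^/"/;s/$/"/'
}

# The four fields from the example record, one per line.
printf '%s\n' 'a' 'b,' 'c"' 'd' | csv_quote_field
```

Under this rule the field c" becomes "c""", which matches the third field of the record given above. Note that this sketch quotes every field, including those (a and d) that the record above leaves bare.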
From: G_r_a_n_t_ on 25 Jul 2005 22:40

On 25 Jul 2005 16:51:00 -0700, "William James" <w_a_x_man(a)yahoo.com> wrote:
> Chris F.A. Johnson wrote:
> > Given 'a,"b,",c",d', the fields are:
....
> So a correct record is
> a,"b,","c""",d

Looks very MSFT :) makes \ escaping look sane?

Cheers
From: Chris F.A. Johnson on 26 Jul 2005 02:33

On 2005-07-25, William James wrote:
> Chris F.A. Johnson wrote:
>> Given 'a,"b,",c",d', the fields are:
>>
>> a
>> b,
>> c"
>> d
>
> Incorrect, according to
> http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm,
> which says:
>
> Fields that contain double quote characters must be surrounded by
> double-quotes, and the embedded double-quotes must each be represented
> by a pair of consecutive double quotes.
>
> So a correct record is
> a,"b,","c""",d

Which is exactly what I said. All you have done is delineate those
fields differently.

There are so many specifications for a CSV file that any attempt to
handle exceptional cases is fraught with difficulty.

In this case, the question is about turning malformed "CSV" files
into a standardized format. Without a statement of how they are to
be interpreted, all attempts are mere conjecture.

-- 
Chris F.A. Johnson <http://cfaj.freeshell.org>
==================================================================
Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
<http://www.torfree.net/~chris/books/cfaj/ssr.html>
From: oraustin on 26 Jul 2005 04:24
> In this case, the question is about turning malformed "CSV" files
> into a standardized format. Without a statement of how they are to
> be interpreted, all attempts are mere conjecture.

Thanks for all your input on this - it seems to have gone off at a
little tangent (which is good I think).

The CSV files I need to convert are not badly formed - that was a big
typo in my first example. There are only two types of lines

123,4556,efwref,134
or
"123","123412","sdfhuk,aqfds","1432"

I'm very grateful for all the scripts, but firstly I'd like to use just
one unix command - probably sed. Other people have to use and modify
what I write, and they and I don't have time right now to become
proficient in a number of commands.

Surely this is simple in sed: check if the first character of a line is
not " and if so then substitute all , with "," in the rest of the line.
Except I don't know how to write that :)

Thanks
Oliver

On an aside, maybe you can advise - we get data from many companies and
it always requires some manipulation - the example here is fairly
simple - is Perl the best thing for me to focus on for file
alteration/manipulation? Can this link in with Microsoft BizTalk, which
is the framework I believe we will use?
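
A minimal sed sketch of the approach described above, assuming the input really contains only the two line shapes listed (fully unquoted, or fully quoted) and that unquoted fields contain no commas or double quotes of their own. Besides turning each , into "," the line also needs an opening and a closing quote, so the sketch adds those too. The function name `quote_unquoted` is invented for illustration.

```shell
# quote_unquoted: hypothetical helper, reads CSV lines on stdin.
quote_unquoted() {
    # /^"/!      apply the block only to lines NOT starting with a double quote
    # s/,/","/g  turn every comma into ","
    # s/^/"/     add the opening quote
    # s/$/"/     add the closing quote
    sed '/^"/!{s/,/","/g;s/^/"/;s/$/"/;}'
}

printf '%s\n' '123,4556,efwref,134' \
              '"123","123412","sdfhuk,aqfds","1432"' | quote_unquoted
```

Lines that already start with " pass through unchanged, so the comma inside "sdfhuk,aqfds" is left alone. An empty input line would come out as "", which may or may not be wanted.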