FasterCSV - Merge CSV [Ruby]

From: Christian Smith on 2 Jul 2010 03:35

I have 3 CSVs with the same layout, say 10 columns. The data they contain
varies slightly in 1 column.

csv1 - 20 lines 10 cols
csv2 - 52 lines 10 cols
csv3 - 24 lines 10 cols

How can I merge all 3 CSVs into 1 CSV using FasterCSV, so that I have

csv4 96 lines 10 cols

Thanks!

Seed
--
Posted via http://www.ruby-forum.com/.

From: Brian Candler on 2 Jul 2010 07:11

Christian Smith wrote:
> How can I merge all 3 csvs into 1 csv using fastercsv so I have
>
> csv4 96 lines 10 cols

Why use FasterCSV?

  cat csv1 csv2 csv3 > csv4

would meet your requirement.

But if you do want to use FasterCSV, then open each file in turn, read it
a line at a time, and output each line as you read it.
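A minimal sketch of that streaming approach, assuming Ruby 1.9 or later, where FasterCSV became the standard csv library (on Ruby 1.8 you would require 'fastercsv' and write FasterCSV wherever CSV appears). The method name merge_csv_streaming is hypothetical. Like cat, it copies any header row from every file:

```ruby
require 'csv' # Ruby 1.9+; on Ruby 1.8 use require 'fastercsv' and FasterCSV instead of CSV

# Copy every row of each input file, in order, into one output file.
# Rows are streamed one at a time, so memory use stays flat regardless of
# file size. No header handling: a header row in each input is copied too.
def merge_csv_streaming(inputs, output)
  CSV.open(output, "w") do |out|
    inputs.each do |path|
      CSV.foreach(path) { |row| out << row }
    end
  end
end

# Hypothetical file names, taken from the question:
# merge_csv_streaming(%w[csv1 csv2 csv3], "csv4")
```

Because each row is parsed and then re-serialized, quoting in the output may differ cosmetically from the input (for example, unnecessary quotes are dropped).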

From: Rob Biedenharn on 2 Jul 2010 09:54

On Jul 2, 2010, at 7:11 AM, Brian Candler wrote:
> Why use fastercsv?
> cat csv1 csv2 csv3 >csv4
> would meet your requirement.

Except that you'd also get the header rows from csv2 and csv3 (but perhaps
your line counts imply there are no headers?)

>
> But if you want to use fastercsv, then open each file in turn, read it
> line at a time, and output the line you just read.

If the files are small-ish, you can sidestep the chicken-and-egg problem
with the headers by reading all the input files into memory (keeping only
the header row from the first one), then writing everything out in one go.
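Rob's in-memory approach might look like the sketch below, again assuming Ruby 1.9+ CSV (FasterCSV on Ruby 1.8 offers the same read and open calls). It assumes, which the thread does not confirm, that every input starts with exactly one header row and that all the headers are identical; merge_csv_in_memory is a hypothetical name:

```ruby
require 'csv' # Ruby 1.9+; on Ruby 1.8 use require 'fastercsv' and FasterCSV instead of CSV

# Read all inputs into memory, keep the header row of the first file only,
# then write the header followed by every data row.
# ASSUMPTION: each file begins with one header row and all headers match.
def merge_csv_in_memory(inputs, output)
  header = nil
  rows = []
  inputs.each do |path|
    data = CSV.read(path)        # array of arrays, header row included
    next if data.empty?          # nothing to merge from an empty file
    file_header = data.shift     # strip this file's header row
    header ||= file_header       # remember the first file's header
    unless file_header == header
      raise ArgumentError, "header in #{path} differs from the first file's"
    end
    rows.concat(data)
  end
  CSV.open(output, "w") do |out|
    out << header if header
    rows.each { |row| out << row }
  end
end

# Hypothetical file names, taken from the question:
# merge_csv_in_memory(%w[csv1 csv2 csv3], "csv4")
```

Raising on a header mismatch is a design choice for the sketch; silently merging files with different columns would produce a misleading result.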

-Rob

Rob Biedenharn
Rob(a)AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab(a)GaslightSoftware.com http://GaslightSoftware.com/

From: Christian Smith on 2 Jul 2010 13:15

Rob Biedenharn wrote:
> If the files are small-ish, you can avoid a chicken-and-egg problem of
> the headers by reading all the input files (saving the headers from
> the first), then writing it all out from memory.

The files aren't small, but memory isn't an issue, so I would love to be
able to do this. I can read the 3 files into an array, but it's writing
them back out as 1 CSV that I'm having trouble with. I assume this would
be a lot faster than a line-by-line read/write approach.
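For the step Christian is stuck on, turning rows already held in arrays back into one CSV, CSV.generate (FasterCSV.generate on Ruby 1.8) builds the text and quotes fields only where needed. A sketch, with rows_to_csv as a hypothetical helper name. Note that the arrays were still produced by parsing every row, so this is not certain to be much faster than streaming:

```ruby
require 'csv' # Ruby 1.9+; on Ruby 1.8 use require 'fastercsv' and FasterCSV instead of CSV

# Turn rows that are already in memory (an array of field arrays, as returned
# by CSV.read) into CSV text. Fields containing commas, quotes, or newlines
# are quoted, and embedded quotes are doubled.
def rows_to_csv(rows)
  CSV.generate do |csv|
    rows.each { |row| csv << row }
  end
end

# rows_to_csv([["a", "b,c"]])  # => "a,\"b,c\"\n"
#
# Hypothetical usage with three arrays read earlier via CSV.read; if each
# file has a header row, drop it from all but the first:
# all_rows = csv1_rows + csv2_rows.drop(1) + csv3_rows.drop(1)
# File.open("csv4", "w") { |f| f.write(rows_to_csv(all_rows)) }
```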

From: Reid Thompson on 2 Jul 2010 13:31

On Sat, Jul 03, 2010 at 02:15:06AM +0900, Christian Smith wrote:
> The files aren't smallish but memory isn't an issue. I would love to be
> able to do this. I am able to read the 3 files into an array but it's
> parsing them back into 1 csv I am having trouble with. I would assume
> this would be a lot faster than a line read>write approach.

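Just cat the files together and grep out the header lines, where
string-portion-unique-to-headers is some text that appears only in the
header lines:

cat csv* | grep -v string-portion-unique-to-headers > full.csv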

If you want to keep a single header row, then:

cat csv* | grep string-portion-unique-to-headers |sort | uniq > full.csv
cat csv* | grep -v string-portion-unique-to-headers >> full.csv
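The same grep idea can be done in Ruby without parsing the CSV at all. In this sketch, concat_skipping_headers and its marker argument are hypothetical names; marker plays the part of string-portion-unique-to-headers, and, as with grep, any line containing it is treated as a header. Assuming all headers are identical, it keeps the first one and drops the rest. It also appends a newline to any line missing one, since cat would otherwise join the last line of one file to the first line of the next:

```ruby
# Text-level merge modelled on the cat/grep pipeline: lines are copied
# verbatim except that every line containing `marker` is treated as a
# header, and only the first such line is kept.
def concat_skipping_headers(inputs, output, marker)
  header_written = false
  File.open(output, "w") do |out|
    inputs.each do |path|
      File.foreach(path) do |line|
        # chomp + "\n" adds a missing final newline (and turns CRLF into LF)
        line = line.chomp + "\n"
        if line.include?(marker)
          next if header_written   # drop repeated headers, like grep -v
          header_written = true    # keep the first header seen
        end
        out.write(line)
      end
    end
  end
end

# Hypothetical usage; "id,name" stands in for the real header text:
# concat_skipping_headers(%w[csv1 csv2 csv3], "full.csv", "id,name")
```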
