Processing 100 output files [Shell]

From: ezhil on 24 Feb 2010 07:47

Hi,

My program creates 100 output files, each having 9 columns and 150
rows. The first 5 (out of 9) columns are the same in all 100 files. I
am trying to parse all the files so that my final file will have the
first 5 columns plus 2 columns (7 and 9) from each file. Is there an
elegant way of doing this?

I have tried a simple approach: appending all 100 files into a single
file (using >>) and using the NR counter (every 150 records) to pick out
the 2 columns for the next 150 rows. It works fine now, but what will
happen if each file has a different number of rows?

I am also trying to parse each file as it is created (on the fly) and
then delete it, instead of keeping 100 output files. I am writing a
shell script that calls awk, but it is not working.

Thanks in advance.

Kind regards,
Ezhil
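A minimal sketch of the on-the-fly idea described above (not from the thread): fold each file into a running result as soon as it exists, then delete it. It assumes whitespace-separated columns and the same number of rows in every file; make_output, out$i.txt and merged.txt are hypothetical names, with make_output standing in for the real program.

```shell
#!/bin/sh
# Sketch: merge each output file as soon as it is written, then delete it.
# ASSUMPTIONS: whitespace-separated columns; make_output is a hypothetical
# stand-in for "my program" and just writes a small 3-row, 9-column file.
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

make_output() {  # hypothetical: $1 = run number, $2 = output file name
    awk -v n="$1" 'BEGIN {
        for (r = 1; r <= 3; r++)
            print "a"r, "b"r, "c"r, "d"r, "e"r, "x", "7."n"."r, "x", "9."n"."r
    }' > "$2"
}

for i in 1 2 3; do          # 1..100 in the real case
    f="out$i.txt"
    make_output "$i" "$f"
    if [ ! -f merged.txt ]; then
        # First file: keep the 5 common columns plus columns 7 and 9.
        awk '{ print $1, $2, $3, $4, $5, $7, $9 }' "$f" > merged.txt
    else
        # Later files: extract columns 7 and 9 and glue them on the right.
        awk '{ print $7, $9 }' "$f" > cols.tmp
        paste -d ' ' merged.txt cols.tmp > merged.new
        mv merged.new merged.txt
    fi
    rm -f "$f"              # the per-run file is no longer needed
done
rm -f cols.tmp

cat merged.txt
```

Because paste does not insert placeholders for missing rows, files with differing row counts would still misalign the columns in this sketch.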

From: pk on 24 Feb 2010 07:57

ezhil wrote:

> My program creates 100 output files, each having 9 columns and 150
> rows. The first 5 (out of 9) columns are the same in all 100 files. I
> am trying to parse all the files so that my final file will have the
> first 5 columns plus 2 columns (7 and 9) from each file. Is there an
> elegant way of doing this?

So your output should be

c1 c2 c3 c4 c5 c7f1 c9f1 c7f2 c9f2 ... c7f100 c9f100
^^^^^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^     ^^^^^^^^^^^^^
    common       file1     file2   ...    file100

If that is what you want, try this:

awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9}
END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100

Set OFS to a different value (e.g. awk -v OFS=',' for a comma) if you
need a different output separator.
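For illustration, here is the same script with comments, run on three made-up 2-row files (file1..file3, with columns 7 and 9 tagged by file and row number) so the output shape is visible. This is a sketch that assumes whitespace-separated input and a non-empty first file.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up sample data: 2 rows per file, columns 1-5 identical everywhere.
for n in 1 2 3; do
    printf '%s\n' "A1 B1 C1 D1 E1 x 7.$n.1 x 9.$n.1" \
                  "A2 B2 C2 D2 E2 x 7.$n.2 x 9.$n.2" > "file$n"
done

result=$(awk '
    # NR==FNR is true only while reading the first file (if non-empty):
    # store its 5 common columns, indexed by line number.
    NR == FNR { out[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5 }
    # For every line of every file, append columns 7 and 9 of that line.
    { out[FNR] = out[FNR] OFS $7 OFS $9 }
    # In END, FNR still holds the line count of the last file read.
    END { for (i = 1; i <= FNR; i++) print out[i] }
' file1 file2 file3)

printf '%s\n' "$result"
```

Adding -v OFS=',' right after awk would give comma-separated output lines instead.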

From: Ed Morton on 24 Feb 2010 08:19

On 2/24/2010 6:57 AM, pk wrote:
> awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
> {out[FNR]=out[FNR] OFS $7 OFS $9}
> END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100
>
> Set OFS to a different value (e.g. awk -v OFS=',' for a comma) if you
> need a different output separator.
>

The OP had this concern:

> It works fine now, but what will
> happen if each file has a different number of rows?

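so apparently you can't assume he'll have the same number of lines in every file,
and in particular you can't assume the last file read has the most lines. That
matters because in the END section FNR holds the line count of the last file
read, so the END loop in the script above would stop there.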

He doesn't say what to do in that case, but we could do this:

awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9; maxFnr=(FNR > maxFnr ? FNR : maxFnr)}
END {for(i=1;i<=maxFnr;i++) print out[i]}' file1 file2 file3 ... file100

Note the change from FNR to maxFnr in the END loop.
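A quick check of that difference, using made-up files where the last one read (file3) is shorter than the others: the FNR-bounded loop prints only 2 lines, while the maxFnr-bounded loop prints all 3. File names and data are assumptions for the demo.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up data: file1 and file2 have 3 rows, file3 (read last) only 2.
for spec in 1:3 2:3 3:2; do
    n=${spec%:*}
    rows=${spec#*:}
    awk -v n="$n" -v rows="$rows" 'BEGIN {
        for (r = 1; r <= rows; r++)
            print "A"r, "B"r, "C"r, "D"r, "E"r, "x", "7."n"."r, "x", "9."n"."r
    }' > "file$n"
done

# pk's version: END loop bound is FNR of the last file read (2 here).
pk_out=$(awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9}
END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3)

# Ed's version: END loop bound is the largest FNR seen in any file (3 here).
ed_out=$(awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9; maxFnr=(FNR > maxFnr ? FNR : maxFnr)}
END {for(i=1;i<=maxFnr;i++) print out[i]}' file1 file2 file3)

printf '%s\n' "--- pk:" "$pk_out" "--- Ed:" "$ed_out"
```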

There's probably more that needs to be done so the "columns" don't get
left-shifted if some files have fewer lines (on such a line, the next file's
columns 7 and 9 slide into the slot of the file that is missing it), but until
the OP tells us what he wants (e.g. populating missing lines with some "NULL"
value) there's not much point guessing any further.

Ed.
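If placeholders were wanted, one possible approach (a sketch, not something proposed in the thread) is to store columns 7 and 9 keyed by file number and line number, then emit a placeholder pair wherever a file lacks a line. The NA placeholder, the variable names and the sample data are assumptions. An empty file would not be counted, because its FNR never reaches 1.

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Made-up data: file2 (in the middle) has only 2 of the 3 rows.
for spec in 1:3 2:2 3:3; do
    n=${spec%:*}
    rows=${spec#*:}
    awk -v n="$n" -v rows="$rows" 'BEGIN {
        for (r = 1; r <= rows; r++)
            print "A"r, "B"r, "C"r, "D"r, "E"r, "x", "7."n"."r, "x", "9."n"."r
    }' > "file$n"
done

result=$(awk -v missing=NA '
    FNR == 1 { nfiles++ }                    # a new (non-empty) file starts
    !(FNR in common) {                       # first file that has this line
        common[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5
    }
    {
        val[nfiles, FNR] = $7 OFS $9         # keyed by file and line number
        if (FNR > maxFnr) maxFnr = FNR
    }
    END {
        for (i = 1; i <= maxFnr; i++) {
            line = common[i]
            for (f = 1; f <= nfiles; f++) {
                if ((f, i) in val) pair = val[f, i]
                else pair = missing OFS missing
                line = line OFS pair
            }
            print line
        }
    }
' file1 file2 file3)

printf '%s\n' "$result"
```

With file2 missing its third row, the third output line comes out as A3 B3 C3 D3 E3 7.1.3 9.1.3 NA NA 7.3.3 9.3.3, so file3's values stay in file3's slot.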

From: ezhil on 24 Feb 2010 08:28

On Feb 24, 12:57 pm, pk <p...(a)pk.invalid> wrote:
> awk 'NR==FNR{out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
> {out[FNR]=out[FNR] OFS $7 OFS $9}
> END {for(i=1;i<=FNR;i++) print out[i]}' file1 file2 file3 ... file100
>
> Set OFS to a different value (e.g. awk -v OFS=',' for a comma) if you
> need a different output separator.

Hi PK,

When I tried the above command, I got a syntax error at $9}. I have
just tried it with 3 files to check the final output:

awk 'NR==FNR{ out[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5} {out[FNR =
out[FNR] OFS $7 OFS $9} END {for(i=1;i<=FNR;i++) print out[i]}' 1.txt 2.txt 3.txt

Thanks again,
Ezhil

From: Ed Morton on 24 Feb 2010 08:50

On 2/24/2010 7:28 AM, ezhil wrote:
> Hi PK,
>
> When I tried the above command, I got a syntax error at $9}. I have
> just tried it with 3 files to check the final output:
>
> awk 'NR==FNR{ out[FNR] = $1 OFS $2 OFS $3 OFS $4 OFS $5} {out[FNR =
> out[FNR] OFS $7 OFS $9} END {for(i=1;i<=FNR;i++) print out[i]}' 1.txt
> 2.txt 3.txt

Instead of copy/pasting the script from your newsreader, you tried to re-type it
and missed a closing "]" at "out[FNR =...".

Ed.
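One way to avoid re-typing (or newsreader line-wrapping) a multi-line awk program is to save it to a file once and run it with awk -f. A sketch; merge.awk and the one-line 1.txt..3.txt inputs are made-up names and data:

```shell
#!/bin/sh
set -e
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# pk's program saved to a file; the quoted heredoc delimiter ('EOF')
# stops the shell from expanding $1, $7, etc.
cat > merge.awk <<'EOF'
NR==FNR {out[FNR]=$1 OFS $2 OFS $3 OFS $4 OFS $5}
{out[FNR]=out[FNR] OFS $7 OFS $9}
END {for(i=1;i<=FNR;i++) print out[i]}
EOF

# Made-up input: one row per file.
for n in 1 2 3; do
    echo "A B C D E x 7.$n x 9.$n" > "$n.txt"
done

result=$(awk -f merge.awk 1.txt 2.txt 3.txt)
printf '%s\n' "$result"
```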
