From: sln on
On Wed, 11 Aug 2010 10:51:40 +0800, "ela" <ela(a)yantai.org> wrote:

>After testing different approaches, Jens Thoms Toerring's works better and
>therefore I modified the codes accordingly. Now I just don't know why the
>array content cannot be retrieved but only a number "1" is returned. Can
>anyone tell me the reason? In fact I can simply pass $line instead of @cells
>but what I finally want to achieve is to only print out several cells
>instead of all.
>
>
>my %ahash;
>while ( my $line = <$afp> ) {
> my @cells = split /\t/, $line;
> $ahash{ $cells[ 5 ] } = $cells[ 1 ];
>}
>close $afp;
>
>open my $ifp, '<', $infofile or die "Can't open $infofile for reading\n";
>
>my %ihash;
>while ( my $line = <$ifp> ) {
> my @cells = split /\t/, $line;
> $ihash{ $cells[ 1 ] } = @cells;
>}
>close $ifp;
>
>while ( my $line = <$fp> ) {
> if ( $line eq "\n" ) {
> print $ofp "\n";
> next;
> }
> chomp $line;
>
> if ( $format eq "" ) {
> @cells = split /:/, $line;
> $tag = $cells[ 0 ];
> } else {
> @cells = split /\t/, $line;
> $tag = $cells[ $acci ];
> }
>
> $gid = $ahash{ $tag } if exists $ahash{ $tag };
> @gene_info = $ihash{$gid};
> print $ofp "$line\t@gene_info";
>}
>
>close $fp;
>

I'm puzzled why you're tackling this in Perl, though
I'm guessing the equivalent SQL statement would be hard
for you to write.

It's a simple SQL query across 3 tables on a key field,
and you're trying to do it in Perl instead.

You're looking for speed, but you can't normalize the task.
You make the big mistake of gathering everything into memory,
thereby hogging it with useless information, then
compounding that error by using the data only once. Although I'm
not sure about the one-time use (it might be interactive);
I didn't look too hard for that in the code.

It doesn't appear that you have multiple lines of gene data
per key; however, that data could be massive.
There is no need to keep all the data in memory.
You could in effect keep a key => file position
hash via tell(), then retrieve the data later with a
seek().
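
A rough sketch of that tell()/seek() idea, assuming tab-separated
lines and a zero-based key column. The sub names build_offset_index
and fetch_cells are made up for illustration; they are not from your
code:

```perl
use strict;
use warnings;

# Build a key => byte-offset index instead of keeping every line in memory.
# $key_col is the (assumed) zero-based column that holds the key.
sub build_offset_index {
    my ( $fh, $key_col ) = @_;
    my %offset_of;
    my $pos = tell $fh;    # offset of the line about to be read
    while ( my $line = <$fh> ) {
        chomp $line;
        my @cells = split /\t/, $line;
        $offset_of{ $cells[$key_col] } = $pos if defined $cells[$key_col];
        $pos = tell $fh;    # offset of the next line
    }
    return \%offset_of;
}

# Fetch one record's cells on demand by seeking back to its stored offset.
sub fetch_cells {
    my ( $fh, $offset_of, $key ) = @_;
    return unless exists $offset_of->{$key};
    seek( $fh, $offset_of->{$key}, 0 ) or die "seek failed: $!";
    my $line = <$fh>;
    chomp $line;
    return split /\t/, $line;
}
```

The file handle has to stay open for as long as you want to look
records up, so this trades memory for one seek and one read per
lookup.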

Applying a pseudo-analysis to your content-less code:
it is storing data beyond its use. It's like formal
symbolic logic. Write the equation, then solve it.
It's called reverse-engineering.

This is the bottom line equation of your work:

------------------
@Gene-Info Array = @{ I-Hash{ A-Hash{ fp0 } } } if A-Hash{ fp0 } exists
------------------
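
In Perl terms, the @{ ... } in that equation only works if each
I-Hash value is an array reference. Assigning an array directly to a
hash element puts it in scalar context, so only the element count
gets stored. A small sketch with a made-up sample line (the real data
would come from $infofile):

```perl
use strict;
use warnings;

# Hypothetical sample line, for illustration only.
my $line  = "x\tG42\tkinase\tchr1";
my @cells = split /\t/, $line;

my $count = @cells;    # scalar context: the number of elements, not the data

my %ihash;
$ihash{ $cells[1] } = [@cells];    # store a reference to a copy of the array

my @gene_info = @{ $ihash{ $cells[1] } };    # dereference to get the cells back
my @wanted    = @gene_info[ 1, 2 ];          # array slice: only selected cells
```

The slice on the last line is one way to print only several cells
instead of all of them.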

Working from inner to outer: once the A-Hash is constructed, there is
no need to add a key to the I-Hash if it does not exist in the A-Hash.
If you had written the SQL for this you would have picked this up.
And since the I-Hash contains all the mega gene data, you just
ruptured your memory's brain.
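
Something along these lines would do that filtering while the info
file is read. It assumes, as in the quoted code, that the I-Hash is
keyed on column 1 and that those keys are the values stored in the
A-Hash; build_filtered_ihash is a made-up name:

```perl
use strict;
use warnings;

# Keep only the info rows whose key is actually referenced by the A-Hash.
# Assumption: I-Hash keys (column 1) correspond to A-Hash *values*.
sub build_filtered_ihash {
    my ( $ifp, $ahash ) = @_;

    # Lookup set of the gene ids the A-Hash can ever ask for.
    my %wanted = map { $_ => 1 } grep { defined $_ } values %$ahash;

    my %ihash;
    while ( my $line = <$ifp> ) {
        chomp $line;
        my @cells = split /\t/, $line;
        next unless defined $cells[1] && $wanted{ $cells[1] };
        $ihash{ $cells[1] } = [@cells];    # array reference, not a count
    }
    return \%ihash;
}
```

Rows that no A-Hash entry points at are skipped as they are read, so
they never take up memory.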

Start over, write pseudo-code, re-check your work via logic analysis
from the inner to outer context. This will save you countless hours
of headache.

-sln