From: Steven D'Aprano on
On Fri, 29 Jan 2010 11:23:54 +0200, Johann Spies wrote:

> On Thu, Jan 28, 2010 at 07:07:04AM -0800, evilweasel wrote:
>> Hi folks,
>>
>> I am a newbie to python, and I would be grateful if someone could point
>> out the mistake in my program. Basically, I have a huge text file
>> similar to the format below:
>>
>> AAAAAGACTCGAGTGCGCGGA 0
>> AAAAAGATAAGCTAATTAAGCTACTGG 0
>> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
>> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
>> AAAAAGGTCGCCTGACGGCTGC 0
>
> I know this is a python list but if you really want to get the job done
> quickly this is one method without writing python code:
>
> $ cat /tmp/y
> AAAAAGACTCGAGTGCGCGGA 0
> AAAAAGATAAGCTAATTAAGCTACTGG 0
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> AAAAAGGTCGCCTGACGGCTGC 0
> $ grep -v 0 /tmp/y > /tmp/z
> $ cat /tmp/z
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

That will do the wrong thing for lines like:

AAAAAGATAAGCTAATTAAGCTACTGGGTT 10
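
To make the failure concrete, here is a minimal Python sketch. The sample
lines are taken from this thread; the variable names are illustrative only.
It contrasts a substring test, which is roughly what `grep -v 0` does, with
a comparison on the second field:

```python
lines = [
    "AAAAAGACTCGAGTGCGCGGA 0",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 10",
]

# Substring test: drops every line that contains a "0" anywhere,
# which is approximately the behaviour of `grep -v 0`.
substring_kept = [ln for ln in lines if "0" not in ln]

# Field test: compares the whole second whitespace-separated field,
# so a count of "10" is not mistaken for "0".
field_kept = [ln for ln in lines if ln.split()[1] != "0"]

print(substring_kept)
print(field_kept)
```

With these three lines, the substring test keeps only the line ending in 1,
while the field test also keeps the line ending in 10.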


--
Steven
From: Johann Spies on
On Fri, Jan 29, 2010 at 10:04:33AM +0000, Steven D'Aprano wrote:
> > I know this is a python list but if you really want to get the job done
> > quickly this is one method without writing python code:
> >
> > $ cat /tmp/y
> > AAAAAGACTCGAGTGCGCGGA 0
> > AAAAAGATAAGCTAATTAAGCTACTGG 0
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> > AAAAAGGTCGCCTGACGGCTGC 0
> > $ grep -v 0 /tmp/y > /tmp/z
> > $ cat /tmp/z
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
>
> That will do the wrong thing for lines like:
>
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 10

In that case change the grep pattern to ' 0$'; then only the lines with a
single digit '0' at the end of the line will be excluded.

One can do the same using regular expressions in Python, and it will
probably be a lot slower on large files.
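
For comparison, a rough Python sketch of the same ' 0$' filter using the
re module. The function name drop_zero_lines is made up for illustration,
and no timing has been measured here, so the speed remark above is left
untested:

```python
import re

# Matches a space followed by a single "0" at the end of the line.
ZERO_COUNT = re.compile(r" 0$")


def drop_zero_lines(lines):
    """Return the lines that do not end in ' 0' (like grep -v ' 0$')."""
    return [ln for ln in lines if not ZERO_COUNT.search(ln.rstrip("\n"))]


sample = [
    "AAAAAGACTCGAGTGCGCGGA 0\n",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 10\n",
    "AAAAAGGGGGCTCACAGGGGAGGGGTAT 1\n",
]
for line in drop_zero_lines(sample):
    print(line, end="")
```

A line ending in " 10" survives because the pattern requires the space
immediately before the final "0".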

Regards
Johann
--
Johann Spies Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"My son, if sinners entice thee, consent thou not."
Proverbs 1:10
From: D'Arcy J.M. Cain on
On Fri, 29 Jan 2010 11:23:54 +0200
Johann Spies <jspies(a)sun.ac.za> wrote:
> I know this is a python list but if you really want to get the job
> done quickly this is one method without writing python code:
> [...]
> $ grep -v 0 /tmp/y > /tmp/z

There are plenty of ways to do it without writing Python. C, C++, Perl,
Forth, Awk, BASIC, Intercal, etc. So what? Besides, your solution
doesn't work. You want "grep -vw 0 /tmp/y > /tmp/z" and even then it
doesn't meet the requirements. It extracts the lines the OP wants but
doesn't reformat them. It also assumes a Unix system or at least
something with grep installed so it isn't portable.

If you want to see how the same task can be done in many different
languages see http://www.roesler-ac.de/wolfram/hello.htm.

--
D'Arcy J.M. Cain <darcy(a)druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
From: nn on


Johann Spies wrote:
> On Thu, Jan 28, 2010 at 07:07:04AM -0800, evilweasel wrote:
> > Hi folks,
> >
> > I am a newbie to python, and I would be grateful if someone could
> > point out the mistake in my program. Basically, I have a huge text
> > file similar to the format below:
> >
> > AAAAAGACTCGAGTGCGCGGA 0
> > AAAAAGATAAGCTAATTAAGCTACTGG 0
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> > AAAAAGGTCGCCTGACGGCTGC 0
>
> I know this is a python list but if you really want to get the job
> done quickly this is one method without writing python code:
>
> $ cat /tmp/y
> AAAAAGACTCGAGTGCGCGGA 0
> AAAAAGATAAGCTAATTAAGCTACTGG 0
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> AAAAAGGTCGCCTGACGGCTGC 0
> $ grep -v 0 /tmp/y > /tmp/z
> $ cat /tmp/z
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

I would rather use awk for this:

awk 'NF==2 && $2!~/^0$/ {printf("seq%s\n%s\n",NR,$1)}' dnain.dat

but I think that is getting a bit off topic...
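
For readers who want to stay in Python, the awk one-liner could be
approximated as below. This is only a sketch, assuming whitespace-separated
fields as awk uses by default; the function name reformat is hypothetical,
and the "seq<N>" header format is simply copied from the awk command:

```python
def reformat(lines):
    """Mimic: awk 'NF==2 && $2!~/^0$/ {printf("seq%s\n%s\n",NR,$1)}'."""
    out = []
    # enumerate from 1 so lineno plays the role of awk's NR
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on runs of whitespace, like awk
        if len(fields) == 2 and fields[1] != "0":
            out.append(f"seq{lineno}\n{fields[0]}\n")
    return out


sample = [
    "AAAAAGACTCGAGTGCGCGGA 0\n",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1\n",
    "AAAAAGGGGGCTCACAGGGGAGGGGTAT 10\n",
]
print("".join(reformat(sample)), end="")
```

As in the awk version, the number after "seq" is the input line number, so
it skips over lines that were filtered out.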
From: Aahz on
In article <mailman.1551.1264701475.28905.python-list(a)python.org>,
D'Arcy J.M. Cain <darcy(a)druid.net> wrote:
>
>If you have a problem and you think that regular expressions are the
>solution then now you have two problems. Regex is really overkill for
>the OP's problem and it certainly doesn't improve readability.

If you're going to use a quote, it works better if you use the exact
quote and attribute it:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.' --Jamie Zawinski
--
Aahz (aahz(a)pythoncraft.com) <*> http://www.pythoncraft.com/

import antigravity