From: ccc31807 on
I'm not changing jobs, but I've been contacted about some contract
opportunities that (reportedly) are difficult but seem easy enough to
me, manipulating genome files to produce various kinds of reports,
graphs, etc. I have zero experience in this, so I'm just wondering ...

1. What are the career opportunities in bioinformatics using Perl?

2. Looking for books, I found the following:
a. Beginning Perl for Bioinformatics by James Tisdall
b. Mastering Perl for Bioinformatics by James D. Tisdall
c. Building Bioinformatics Solutions: with Perl, R and MySQL by
Conrad Bessant**
d. Perl Programming for Biologists by D. Curtis Jamison
e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
Dwyer

Looking at the tables of contents, reviews, and reader comments, I
believe that c. is probably the best value, but it's real hard to tell
without buying and reading the book. Anybody have any experiences
using any of these books? I'd like to conserve both time and money by
starting with the 'best' book.

Thanks, CC.
From: Jürgen Exner on
ccc31807 <cartercc(a)gmail.com> wrote:
>I'm not changing jobs, but I've been contacted about some contract
>opportunities that (reportedly) are difficult but seem easy enough to
>me, manipulating genome files to produce various kinds of reports,
>graphs, etc. I have zero experience in this, so I'm just wondering ...

The usual problem is the huge volume of data that needs processing.
Therefore typically the standard algorithms don't work any more and you
need a really strong background in data processing.
Perl is not necessarily the best choice here. Perl's powerful features
make it easy to write code that seems to do the job, but it won't scale
from the small test samples to the huge actual data set where you really
need special methods and optimizations.

A little while ago there was someone posting questions here regularly
about how to deal with genome sequences. I don't know if he is still
around, but maybe you can check the archives and contact him.

jue
From: Bradley K. Sherman on
In article <0f055c16-6bca-4c4d-94d8-60510cdc7a27(a)e34g2000vbm.googlegroups.com>,
ccc31807 <cartercc(a)gmail.com> wrote:
>
>Looking at the tables of contents, reviews, and reader comments, I
>believe that c. is probably the best value, but it's real hard to tell
>without buying and reading the book. Anybody have any experiences
>using any of these books? I'd like to conserve both time and money by
>starting with the 'best' book.
>

The 'best' book is the one that engages you. It's hard to
predict.

For $22.95 you can get access to *all* the O'Reilly books
<http://my.safaribooksonline.com/>
including several on bioinformatics. There's a free trial!

You might want to check the used book stores for a textbook like
_The Molecular Biology of the Gene_, so that you can pick up some
biology.

--bks

From: Bradley K. Sherman on
In article <c9cbe5dsmj9l5r0dcj7effhsmuotk00uqq(a)4ax.com>,
Jürgen Exner <jurgenex(a)hotmail.com> wrote:
> ...
>The usual problem is the huge volume of data that needs processing.
>Therefore typically the standard algorithms don't work any more and you
>need a really strong background in data processing.
>Perl is not necessarily the best choice here. Perl's powerful features
>make it easy to write code that seems to do the job, but it won't scale
>from the small test samples to the huge actual data set where you really
>need special methods and optimizations.
> ...

This is not really fair. Most of bioinformatics is data wrangling
and Perl is exactly the right choice for that.

See, e.g.
<http://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html>
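To give a flavour of the kind of data wrangling meant here, the following is a minimal sketch of two routine sequence chores in plain Perl. The helper names revcomp and gc_fraction are made up for illustration and do not come from any of the books or modules mentioned in this thread; the sketch also assumes plain A/C/G/T input with no ambiguity codes.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a DNA string.
# Assumes only A, C, G, T (either case); other characters pass through unchanged.
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;      # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;      # swap each base for its complement
    return $rc;
}

# Hypothetical helper: fraction of G and C bases in a sequence.
# tr/// with an empty replacement list only counts; it does not modify $seq.
sub gc_fraction {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);
    return $gc / length($seq);
}

print revcomp("ATGCCG"), "\n";             # prints CGGCAT
printf "%.2f\n", gc_fraction("ATGCCG");    # prints 0.67
```

Both operations are single passes over the string using built-in operators, which is roughly why this sort of task tends to be short in Perl.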

--bks

From: ccc31807 on
On Oct 26, 10:45 am, b...(a)panix.com (Bradley K. Sherman) wrote:
> >The usual problem is the huge volume of data that needs processing.
> >Therefore typically the standard algorithms don't work any more and you
> >need a really strong background in data processing.

>
> This is not really fair.  Most of bioinformatics is data wrangling
> and Perl is exactly the right choice for that.

In my day job, I deal with data files on the order of several hundred
thousand records. The scripts I write to produce reports from these
data files sometimes take a second (or several seconds) to run. The
data file I have for the bioinformatics project is much larger, but is
a lot simpler (it's a dotplot file).

Sometimes, data files can be so huge that the script just breaks.
Sometimes, the script just runs longer than you might expect.
Obviously, the longer time really isn't a problem ... there's no
difference between a script that runs in microseconds and one that
runs in minutes (say, between 60 and 120) ... as long as the script
runs to completion.
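One common reason a script "just breaks" on a huge file is that it reads the whole file into memory before doing anything with it. A minimal sketch of the alternative, processing one line at a time, is below. The input layout (whitespace-delimited records, '#' comments) and the helper name tally_first_column are assumptions made for illustration, not the actual dotplot format.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each first-column value occurs.
# Only one line is held at a time, so memory use grows with the number of
# distinct keys rather than with the size of the file.
sub tally_first_column {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($key) = split ' ', $line;      # first whitespace-delimited field
        $count{$key}++;
    }
    return \%count;
}

# Demo on an in-memory filehandle; a real run would open the data file instead.
my $sample = "chr1 10 20\nchr2 5 7\n# comment\nchr1 30 44\n";
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
my $tally = tally_first_column($fh);
close $fh;

printf "%s\t%d\n", $_, $tally->{$_} for sort keys %$tally;
```

Written this way, run time still grows with the size of the file, but the script should run to completion as long as the summary itself fits in memory.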

I'm sympathetic to jue's observation about the scaling problem, but
after having looked at the data, the fact that it's genomic or
biological is totally irrelevant. It's really the amount of data
rather than the kind of data that seems to be significant.

You seem to have a handle on what's going on. Is using Perl for
bioinformatics totally off the wall, or a reasonable option for data
mangling?

CC