From: Uri Guttman on
>>>>> "JE" == Jürgen Exner <jurgenex(a)hotmail.com> writes:

JE> ccc31807 <cartercc(a)gmail.com> wrote:
>> I'm not changing jobs, but I've been contacted about some contract
>> opportunities that (reportedly) are difficult but seem easy enough to
>> me, manipulating genome files to produce various kinds of reports,
>> graphs, etc. I have zero experience in this, so I'm just wondering ...

JE> The usual problem is the huge volume of data that needs processing.
JE> Therefore typically the standard algorithms don't work any more and you
JE> need a really strong background in data processing.
JE> Perl is not necessarily the best choice here. Perl's powerful features
JE> make it easy to write code that seems to do the job, but it won't scale
JE> from the small test samples to the huge actual data set where you really
JE> need special methods and optimizations.

JE> A little while ago there was someone posting questions here regularly
JE> about how to deal with genome sequences. I don't know if he is still
JE> around, but maybe you can check the archives and contact him.

i will disagree on this. first off, perl is major in the biotech world
for several reasons. one it is the best at text processing and most
large genetic files are just plain text formats. secondly, there is
a large package called bioperl (with its own mailing list and community)
that does tons of standard things on those files and more. finally, if
you look back a bit, there is a great article called 'how perl saved the
human genome project'. when that project was initially running it was
distributed over many labs worldwide. and they created many new
incompatible file formats for the data. the author of cgi.pm (who is
really an MD and genetic researcher) designed perl modules to convert
those formats to a common set of core formats so they could easily
exchange data. so perl has a strong tie to the biotech industry that is
not likely to be broken for a long while.

as for jobs, i don't see many leads in that industry but they are
usually looking for direct experience in it (hard to get from the
outside) and/or higher degrees in related fields because you would be
working in such an environment where you need it.

so if the OP can learn enough from books and practice to get a job in
the field, i say go for it. there may be other hurdles to jump but i
can't predict what they will be.

uri
perlhunter.com (so i know something about the perl job market)

--
Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
From: Ben Morrow on

Quoth ccc31807 <cartercc(a)gmail.com>:
>
> You seem to have a handle on what's going on. Is using Perl for
> bioinformatics totally off the wall, or a reasonable option for data
> mangling?

The people who maintain the BioPerl distributions on CPAN seem to think
it's a decent choice of language. See also
http://use.perl.org/~Alias/journal/39783 .

Ben

From: Bradley K. Sherman on
In article <56bcf5e0-d0cf-4de0-bbef-6b6fd07236ed(a)p23g2000vbl.googlegroups.com>,
ccc31807 <cartercc(a)gmail.com> wrote:
> ...
>You seem to have a handle on what's going on. Is using Perl for
>bioinformatics totally off the wall, or a reasonable option for data
>mangling?
>

I think that Perl is the primary language for bioinformatics.
I can't back that up with numbers but I have been working in
bioinformatics since 1992. Some of the younger bioinformaticians
might want to make a case for Python, but I'm skeptical.

My philosophy is to use Perl until it becomes necessary to
write something in C. It rarely becomes necessary.

Learning databases and statistics is also of great importance.

--bks

From: Jochen Lehmeier on
On Mon, 26 Oct 2009 17:00:49 +0100, ccc31807 <cartercc(a)gmail.com> wrote:

> You seem to have a handle on what's going on. Is using Perl for
> bioinformatics totally off the wall, or a reasonable option for data
> mangling?

I have no idea about bioinformatics, but Perl is easy enough that you
should be able to get a book, jot down a quick & dirty test script and
just sic it on your biggest and meanest data set.

Then you get a quick handle on how long basic stuff takes. If it works
fast enough, fine; if not, feel free to ask here. And if you find that
it's just not the right tool, then you won't have lost much.
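
A quick & dirty test script of that kind might look like the sketch
below. It is only an illustration under assumptions that are not from
this thread: the input is taken to be a FASTA-style text file (header
lines start with '>'), and the sub name count_bases and the tiny
made-up sample file are hypothetical. It reads one line at a time, so
memory use stays flat however large the file is, and it reports the
elapsed time.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempfile);

# Count characters in the sequence lines of a FASTA-style file,
# reading one line at a time instead of slurping the whole file.
# ASSUMPTION: header lines start with '>', all other lines are sequence.
sub count_bases {
    my ($path) = @_;
    my %count;
    open my $fh, '<', $path or die "can't open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^>/;                # skip header lines
        $count{$_}++ for split //, uc $line;  # tally each base, case-insensitively
    }
    close $fh;
    return \%count;
}

# Made-up stand-in data so the sketch runs as-is;
# point $path at a real data file to get a meaningful timing.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} ">seq1 made-up example\n", "ACGTACGTNN\n", "acgt\n";
close $tmp;

my $start   = time;
my $count   = count_bases($path);
my $elapsed = time - $start;

printf "%s: %d\n", $_, $count->{$_} for sort keys %$count;
printf "elapsed: %.3f s\n", $elapsed;
```

Run against the real file, the elapsed figure gives a rough baseline
for one full pass over the data.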

IMO, the deal breaker will be if you have to handle data in an O(n^2)
fashion (or worse), i.e. where one would really use some very special
index structure, especially if the whole data set does not fit into RAM.
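
To make the O(n^2) point concrete, here is a small sketch that finds
the IDs two lists have in common, first by comparing every pair and
then with a hash used as the index. The sub names and the "gene..."
IDs are invented for illustration. A plain hash is only the simplest
possible index; it does not stand in for the specialized structures
mentioned above, and it still has to fit into RAM.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quadratic: compare each ID in @$x against each ID in @$y.
sub shared_quadratic {
    my ($x, $y) = @_;
    my @hits;
    for my $id (@$x) {
        for my $other (@$y) {
            if ($id eq $other) { push @hits, $id; last; }
        }
    }
    return @hits;
}

# Roughly linear: build a hash of @$y once, then do one lookup per ID in @$x.
sub shared_indexed {
    my ($x, $y) = @_;
    my %seen = map { $_ => 1 } @$y;
    return grep { $seen{$_} } @$x;
}

# Made-up IDs; the two ranges overlap on purpose.
my @left  = map { "gene$_" } 1 .. 2000;
my @right = map { "gene$_" } 1500 .. 3500;

my @slow = shared_quadratic(\@left, \@right);
my @fast = shared_indexed(\@left, \@right);

printf "quadratic: %d shared, indexed: %d shared\n",
    scalar @slow, scalar @fast;
```

Both subs return the same IDs in the same order. The work in the first
grows with the product of the two list sizes, while the second grows
with their sum.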

Good luck!
From: Keith Bradnam on
On Oct 26, 7:17 am, ccc31807 <carte...(a)gmail.com> wrote:
> I'm not changing jobs, but I've been contacted about some contract
> opportunities that (reportedly) are difficult but seem easy enough to
> me, manipulating genome files to produce various kinds of reports,
> graphs, etc. I have zero experience in this, so I'm just wondering ...
>
> 1. What are the career opportunities in bioinformatics using Perl?
>
> 2. Looking for books, I found the following:
>  a. Beginning Perl for Bioinformatics by James Tisdall
>  b. Mastering Perl for Bioinformatics by James D. Tisdall
>  c. Building Bioinformatics Solutions: with Perl, R and MySQL by
> Conrad Bessant**
>  d. Perl Programming for Biologists by D. Curtis Jamison
>  e. Genomic Perl: From Bioinformatics Basics to Working Code by Rex A.
> Dwyer
>
> Looking at the tables of contents, reviews, and reader comments, I
> believe that c. is probably the best value, but it's real hard to tell
> without buying and reading the book. Anybody have any experiences
> using any of these books? I'd like to conserve both time and money by
> starting with the 'best' book.
>
> Thanks, CC.

I co-teach a Unix & Perl course at UC Davis that is aimed at teaching
graduate students the basics of Perl in a biological
context. We have specifically tried to assume no prior knowledge of
programming as many people who take our course are new to this.

We have made our course materials (data & documentation) freely
available to anyone else who is interested:

http://korflab.ucdavis.edu/Unix_and_Perl/index.html

There is a corresponding Google Group for discussion of issues arising
from the course. We also make regular updates to the documentation.
Hope this might be of use to you.

Keith