From: Felix on
On Jul 4, 11:25 am, David Cournapeau <courn...(a)gmail.com> wrote:
> On Mon, Jul 5, 2010 at 12:00 AM, D'Arcy J.M. Cain <da...(a)druid.net> wrote:
> > I wish it was orders of magnitude faster for web development.  I'm just
> > saying that places where we need compiled language speed that Python
> > already has that in C.
>
> Well, I wish I did not have to use C, then :) For example, as a
> contributor to numpy, it bothers me at a fundamental level that so
> much of numpy is in C.

This is something that I have been thinking about recently. Python has
won quite a following in the scientific computing area, probably above
all because of great libraries such as numpy, scipy, pytables etc. But
it also seems Python itself is falling further and further behind in
terms of performance and parallel-processing abilities. Of course all
of that can be fixed by writing C modules (e.g. with the help of
Cython), but that weakens the case for using Python in the first
place.
For an outsider it does not look like a solution to the GIL mess or a
true breakthrough in performance is around the corner (even though
there seem to be many different attempts at working around these
problems, or at helping with parts of them). Am I wrong? If not, what
is the perspective? Do we need to move on to the next language and
lose all the great libraries that have been built around Python?

Felix
From: Stefan Behnel on
Felix, 09.07.2010 05:39:
> On Jul 4, 11:25 am, David Cournapeau wrote:
>> Well, I wish I did not have to use C, then :) For example, as a
>> contributor to numpy, it bothers me at a fundamental level that so
>> much of numpy is in C.
>
> This is something that I have been thinking about recently. Python has
> won quite a following in the scientific computing area, probably
> especially because of great libraries such as numpy, scipy, pytables
> etc. But it also seems python itself is falling further and further
> behind in terms of performance and parallel processing abilities.

Well, at least its "parallel processing abilities" are actually quite
good. If you have really large computations, they usually run on more
than one computer (not just more than one processor), so you can't get
around using something like MPI. In that case an additional threading
layer is basically worthless, regardless of the language you use. For
computations, threading keeps being highly overrated.

WRT a single machine, you should note that GPGPUs are a lot faster these
days than even multi-core CPUs. And Python has pretty good support for
GPUs, too.


> Of course all that can be fixed by writing C modules (e.g. with the help
> of cython), but that weakens the case for using python in the first
> place.

Not at all. Look at Sage, for example. It's attractive because it provides
tons of functionality, all nicely glued together through a simple language
that even non-programmers can use efficiently and effectively. And its use
of Cython makes all of this easily extensible without crossing the gap of a
language border.

Stefan

From: sturlamolden on
On 9 Jul, 05:39, Felix <schle...(a)cshl.edu> wrote:

> This is something that I have been thinking about recently. Python has
> won quite a following in the scientific computing area, probably
> especially because of great libraries such as numpy, scipy, pytables
> etc.

Python is much more memory-friendly than Matlab, and a much nicer
language to work in. It can also be used to program more than just
linear algebra: if you have to read data from a socket, Matlab is not
so much fun anymore.

> But it also seems python itself is falling further and further
> behind in terms of performance and parallel processing abilities.

First, fine-grained parallelism really belongs in libraries like MKL,
GotoBLAS and FFTW. Python can manage the high-level routines just like
Matlab. You can call a NumPy routine like np.dot, and the BLAS library
(e.g. Intel MKL) will do the multi-threading for you. We almost always
use Python to orchestrate C and Fortran. We can use OpenMP in C or
Fortran, or we can just release the GIL and use Python threads.
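A minimal sketch of that pattern (the matrix sizes here are arbitrary,
and whether more than one core is actually used depends on which BLAS
your NumPy is linked against, e.g. MKL or OpenBLAS):

```python
import numpy as np

# np.dot hands the multiply to the BLAS library; any multi-threading
# happens inside BLAS, below the level where the GIL matters.
a = np.random.rand(500, 500)
b = np.random.rand(500, 500)
c = np.dot(a, b)

print(c.shape)
```

The Python side only orchestrates; the heavy lifting stays in compiled
code.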

Second, the GIL does not matter for MPI, as it works with processes.
Nor does it matter for os.fork or multiprocessing. On clusters, which
are as common in high-performance computing as SMP systems, one has to
use processes (usually MPI) rather than threads, as there is no shared
memory between nodes. On SMP systems, MPI can use shared memory and be
just as efficient as threads (OpenMP); MPI is often even faster, due
to cache problems with threads.
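The process-based model can be sketched with the standard library's
multiprocessing module (MPI via a binding such as mpi4py follows the
same idea, with ranks instead of pool workers):

```python
from multiprocessing import Pool

def square(x):
    # each call runs in a separate worker process, so the GIL
    # never serializes the computation
    return x * x

if __name__ == "__main__":
    # four workers, mapped over the inputs much like MPI ranks
    with Pool(processes=4) as pool:
        squares = pool.map(square, range(8))
    print(squares)
```

Because the workers are processes, each has its own interpreter and
its own GIL.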

Consider that Matlab does not even have threads (or did not last time
I checked). Yet it takes advantage of multi-core CPUs for numerical
computing. It's not the high-level interface that matters, it's the
low-level libraries. And Python is just that: a high-level "glue"
language.

> For an outsider it does not look like a solution to the GIL mess or a
> true breakthrough for performance are around the corner (even though
> there seem to be many different attempts at working around these
> problems or helping with parts). Am I wrong?

Yes you are.

We don't do CPU intensive work in "pure Python". We use Python to
control C and Fortran libraries. That gives us the opportunity to
multi-thread in C, release the GIL and multi-thread in Python, or
both.
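A small sketch of the "release the GIL and multi-thread in Python"
option (the array sizes are arbitrary; the parallelism relies on NumPy
dropping the GIL while BLAS runs, which it does for operations like
matrix multiplication):

```python
import threading
import numpy as np

a = np.random.rand(300, 300)
b = np.random.rand(300, 300)
results = [None, None]

def work(i):
    # NumPy hands the multiply to BLAS and releases the GIL while it
    # runs, so the two threads can occupy two cores concurrently
    results[i] = a @ b

threads = [threading.Thread(target=work, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[0].shape)
```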

From: sturlamolden on
On 9 Jul, 06:44, Stefan Behnel <stefan...(a)behnel.de> wrote:

> WRT a single machine, you should note that GPGPUs are a lot faster these
> days than even multi-core CPUs. And Python has pretty good support for
> GPUs, too.

With OpenCL, Python is better than C for heavy computing. The Python
or C/C++ program has to supply OpenCL code (structured text) to the
OpenCL driver, which does the real work on GPU or CPU. Python is much
better than C or C++ at processing text. There will soon be OpenCL
drivers for most processors on the market.
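To illustrate the point about text processing: an OpenCL kernel is
just a string that the host program hands to the driver, and Python is
well suited to generating such strings. The sketch below only builds
the kernel source (a binding such as PyOpenCL would compile and launch
it; that step is left out to keep the example self-contained):

```python
def make_saxpy_kernel(dtype="float"):
    # Python's string formatting makes it trivial to generate
    # type-specialized variants of the same kernel
    return f"""
__kernel void saxpy(const {dtype} a,
                    __global const {dtype} *x,
                    __global {dtype} *y)
{{
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}}
"""

src = make_saxpy_kernel("float")
print("saxpy" in src)
```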

But OpenCL drivers will not be pre-installed on Windows, as Microsoft
has a competing COM-based technology (DirectCompute, with an atrocious
API and syntax).

From: Felix on
On Jul 9, 1:16 am, sturlamolden <sturlamol...(a)yahoo.no> wrote:
> On 9 Jul, 05:39, Felix <schle...(a)cshl.edu> wrote:
> > For an outsider it does not look like a solution to the GIL mess or a
> > true breakthrough for performance are around the corner (even though
> > there seem to be many different attempts at working around these
> > problems or helping with parts). Am I wrong?
>
> Yes you are.
>
> We don't do CPU intensive work in "pure Python". We use Python to
> control C and Fortran libraries. That gives us the opportunity to
> multi-thread in C, release the GIL and multi-thread in Python, or
> both.

Yes, this setup works very well and is (as I said) probably the reason
Python is so widely used in scientific computing these days.
However, I find that I can almost never do everything with vector
operations; at some point I have to iterate over data structures. And
there the combination of CPython's slowness and the GIL means either
bad performance or having to write the loop in C (which Cython
fortunately helps with). If it were possible to write simple,
parallel, reasonably fast loops in (some subset of) Python directly,
that would certainly be a great advantage. Given the performance of
other JITs, it sounds like it should be possible, but maybe Python is
too complex to make this realistic.
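To make the trade-off concrete, here is the kind of loop I mean, next
to its vectorized equivalent (a toy computation; with vectorization
the iteration happens in C inside NumPy, while the pure-Python loop is
slow and holds the GIL the whole time):

```python
import numpy as np

data = np.arange(10, dtype=float)

# pure-Python loop: interpreted, one element at a time
out_loop = np.empty_like(data)
for i, v in enumerate(data):
    out_loop[i] = v * v + 1.0

# vectorized form: the same computation, done in C by NumPy
out_vec = data * data + 1.0

print(np.allclose(out_loop, out_vec))
```

Whenever the computation does not fit the vectorized form, one is
pushed back toward C or Cython.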

Felix

PS: No need to convince me that MATLAB is not the solution.