From: Philip Semanchuk on

On Mar 13, 2010, at 6:21 PM, np map wrote:

> I'd like to write an open source clustering toolkit (for computation
> and general use), with automation of configuration/deployment, in Python.
> Its main purpose is to be used in academic environments.
> It would be something like running numpy/simpy code (and other custom
> python code) on a set of machines in a distributed fashion (e.g.
> splitting tasks, doing certain bits on some machines, other sub-tasks
> on other machines, etc).

Hi np map,
Are you aware of IPython? My colleagues tell me that it contains some
code for executing processes on remote machines. It sounds like code
that you could build on or borrow from.

HTH
Philip



>
> The cluster could be used in at least two ways:
> - submit code/files via a web interface, monitor the task via the web
> interface and download the results from the master node (user<>web
> interface<>master)
> - run code directly from another machine on the cluster (as if it were
> a subprocess or something like this)
>
>
> Requirements (so far):
> - support the Ubuntu Linux distribution in the initial iteration
> - be easy to extend to other OS-es and package managers
> - try to be 3.x compatible where dual compatibility is possible (2.x
> and 3.x)
> - it will support Python 2.5-2.6
> - document required changes to the 2.x only code to make it work on
> 3.x
> - make it easy to submit code directly from python scripts to the
> cluster (with the right credentials)
> - support key based authentication for job submission
> - should talk to at least one type of RDBMS to store various types of
> data
> - the cluster should be able to kill a task on nodes automatically if
> it executes for too long or requires too much memory (configurable)
> - should be modular (use automation & configuration or just
> clustering)
>
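
One way to picture the "kill a task if it runs too long or needs too much memory" requirement is the rough sketch below. It is only an illustration under stated assumptions: a POSIX system (the `resource` module is not available on Windows), a modern Python 3 rather than the 2.5-2.6 mentioned above, and a hypothetical helper name `run_limited` that is not part of any existing toolkit.

```python
import resource
import subprocess
import sys


def run_limited(argv, timeout_s, max_mem_bytes):
    """Run argv as a child process with a wall-clock and address-space limit.

    Hypothetical helper for illustration only. Returns ("timeout", None)
    if the child was killed for running too long, otherwise
    ("finished", returncode).
    """
    def apply_limits():
        # Runs in the child just before exec (POSIX only): cap the
        # address space so oversized allocations fail inside the task.
        resource.setrlimit(resource.RLIMIT_AS, (max_mem_bytes, max_mem_bytes))

    try:
        done = subprocess.run(
            argv,
            preexec_fn=apply_limits,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # on expiry the child is killed
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("finished", done.returncode)


if __name__ == "__main__":
    # A task that sleeps far longer than its 1-second budget.
    print(run_limited([sys.executable, "-c", "import time; time.sleep(30)"],
                      timeout_s=1, max_mem_bytes=1 << 30))
```

Both limits are plain parameters here, so a real node could presumably read them from whatever per-task configuration the master provides.
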
>
> Therefore, I'd like to know a few things:
>
> Is there a clustering toolkit already available for python?
>
> What would the recommended architecture be?
>
> How should the "user" code interface with the clustering system's
> code?
>
> How should the results be stored (at the node and master level)?
>
> Should threading be supported in the tasks?
>
> How should the results be returned to the master node(s)? (polling,
> submitted by the nodes, etc.)
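
The two options named in that question (the master polling versus the nodes submitting results) can be compared with a small local sketch. It assumes worker threads standing in for remote nodes and a modern Python 3; `task`, `collect_by_polling` and `collect_by_push` are hypothetical names made up for this example.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """Stand-in for real work done on a node."""
    time.sleep(0.05)
    return n * 2


def collect_by_polling(pool, inputs, interval_s=0.01):
    """Master repeatedly asks whether each task has finished."""
    futures = [pool.submit(task, n) for n in inputs]
    while not all(f.done() for f in futures):
        time.sleep(interval_s)  # polling interval is a tunable trade-off
    return [f.result() for f in futures]


def collect_by_push(pool, inputs):
    """Each worker submits its own result to the master's inbox."""
    inbox = queue.Queue()
    for n in inputs:
        # n=n binds the current value for this particular job.
        pool.submit(lambda n=n: inbox.put((n, task(n))))
    # The master blocks until one message per task has arrived.
    return dict(inbox.get() for _ in inputs)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(collect_by_polling(pool, [1, 2, 3]))
        print(collect_by_push(pool, [1, 2, 3]))
```

Polling keeps the nodes simple but costs latency and repeated checks; pushing delivers results as soon as they exist but requires the master to accept incoming messages. Across real machines the queue would have to be replaced by some network transport, which this sketch does not attempt.
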
>
> What libraries should be used for this? (e.g. fabric as a library,
> pyro, etc)
>
> Any other suggestions and pieces of advice?
>
> Should Fabric be used in this clustering system for automation? If
> not, what else? Would simply using a wrapper written in python for the
> 'ssh' app be ok?
>
> Would the following architecture be ok?
> Master: splits tasks into sub-tasks, sends them to nodes (provided
> the node's load isn't greater than a certain percentage), collects and
> stores results, stores and provides configuration to nodes, etc.
> Node: runs code, applies configuration, submits the results to the
> master, etc.
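
The master/node split described above could be prototyped locally along these lines before any networking is involved. This is a minimal sketch, assuming a modern Python 3 and local worker threads in place of real nodes; `split`, `node_run` and `master_run` are hypothetical names, and the load check mentioned above is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal sub-tasks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def node_run(chunk):
    """Work done by one node: here, a sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def master_run(items, n_nodes):
    """Master: split the task, dispatch the sub-tasks, combine the results."""
    chunks = split(items, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_run, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(master_run(list(range(10)), n_nodes=3))
```

Keeping `split` and the combining step separate from `node_run` means the dispatch layer could later be swapped for a remote one without touching the user's code, which may help with the "how should user code interface with the system" question.
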
>
> If this system actually gets python-level code submission inside, how
> should it work?
>
> The reason I posted this set of questions and ideas is that I'd like
> this to be as flexible and usable as possible.