From: Richard Harter on


SHOULD IT BE TURTLES ALL THE WAY UP?

In the famous anecdote, the little old lady replies to the noted
philosopher, "It's turtles all the way down." When it comes to
writing software, many writers on software design and many
programming language creators seem to believe that it is turtles
all the way up.

What do I mean by "turtles all the way up"? By this I mean the
thesis that the techniques and programming language concepts that
are used in the small can be extended indefinitely to programming
in the large. In other words if we use language X for 100 line
programs and 1000 line programs we can also use it for 1000000
line programs. We may have to add some extensions and new
features along the way, but we can increase the size of the
programs we write indefinitely using the same ideas.

The antithesis is that it isn't turtles all the way up, or at
least it shouldn't be turtles all the way up. That is, the kinds
of languages and programming technologies that we use in
large scale programming should be quite different from the kind
of languages used for programming in the small.

There are many propositions as to what those large scale
technologies should be, and many such technologies in use. Here
I am going to look at data flow languages and data flow
structuring of programs.

WHAT ARE DATA FLOW LANGUAGES?

There are two Wikipedia articles that give useful answers:
    http://en.wikipedia.org/wiki/Dataflow_programming
    http://en.wikipedia.org/wiki/Flow-based_programming

The distinction Wikipedia makes between data flow programming
and flow-based programming is obscure. The following definition
is an edited version of the definitions used in the two articles.

    Data flow languages structure applications as networks of
    "black box" elements that exchange data across predefined
    connections by message passing. Elements execute when they
    get messages; they send messages asynchronously. Data flow
    based applications are inherently parallel.

There is a wide variety of data flow languages, ranging from
spreadsheets to LabVIEW to Erlang. Many are graphical;
programming is done by altering flow diagrams. One thing they
all have in common is that they have a run time system.

Traditional imperative programs are composed of routines that
call each other; i.e., when a call is made the caller constructs
a data packet (the calling sequence) and transfers control and
the data packet to the called routine. When the called routine
is done it constructs a data packet to pass back to the caller
and transfers control back to the caller.

In data flow programs the "routines" do not call each other.
Instead they are activated by the run time system when there is
input for them; when they create outputs the run time system
takes care of moving the output to the destination input areas.
When the "routines" are done they transfer control back to the
run time system.
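
As a rough illustration, here is a minimal sketch of that
activation model in Python. The names (Element, run_network, the
wiring table) are invented for this sketch; no particular data
flow system's API is implied.

    import queue

    class Element:
        """A "black box" with an input queue; it never calls other elements."""
        def __init__(self, name, action):
            self.name = name
            self.inbox = queue.Queue()
            self.action = action        # function: input value -> output value

    def run_network(elements, wiring):
        """The run time system: activate any element that has input and
        move its output to the destination element's input queue."""
        work = True
        while work:
            work = False
            for el in elements.values():
                while not el.inbox.empty():
                    out = el.action(el.inbox.get())
                    dest = wiring.get(el.name)   # connectivity is data, not code
                    if dest is not None:
                        elements[dest].inbox.put(out)
                    work = True

    # Two elements wired A -> B; neither element mentions the other.
    elements = {"A": Element("A", lambda x: x + 1),
                "B": Element("B", lambda x: print("B got", x))}
    wiring = {"A": "B"}
    elements["A"].inbox.put(41)
    run_network(elements, wiring)        # prints: B got 42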

One difference between traditional programs and data flow
programs is that traditional programs use LIFO semantics whereas
data flow programs use FIFO semantics. That is, a traditional
program puts data on a stack and gets data back on the same
stack. In data flow programs each element gets data from a queue
and puts data to other queues.
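
A toy contrast, with Python standing in for both styles
(hypothetical code, not any particular system):

    # LIFO: a traditional call; the caller waits on the stack for the result.
    def double(x):
        return 2 * x
    y = double(21)        # call frame pushed, result popped: y == 42

    # FIFO: a data flow handoff; producer and consumer are decoupled in time.
    import queue
    q = queue.Queue()
    q.put(21)             # element A emits and moves on
    y = 2 * q.get()       # element B consumes when it is activated: y == 42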

Another difference is that the connectivity of traditional
programs is deeply embedded in the code. To pass data from A to
B, A calls B. That is, the caller has to specify where the data
goes. In data flow programs the connectivity can be separate
from the code. A doesn't pass data directly to B; instead it
passes data to the run time system which in turn passes the data
to B. The caller does not have to specify where the data goes.
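
With the run time sketch given earlier this separation is visible
directly: rewiring the network means editing the wiring table, not
the element code. A hypothetical continuation of that sketch:

    # Reroute A's output to a new logging element C; A's code is untouched.
    elements["C"] = Element("C", lambda x: print("C logged", x))
    wiring = {"A": "C"}
    elements["A"].inbox.put(41)
    run_network(elements, wiring)    # prints: C logged 42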

As a result data flow programs can use different languages for
the internal implementation of the computational elements and
for the connectivity. In fact, it is common for data flow
languages to be graphical.

ADVANTAGES AND DISADVANTAGES

What are the advantages and disadvantages of data flow
programming?

Some significant advantages:

* Concurrency and parallelism are natural. Code can be
distributed across cores and even across networks. Many of the
problems associated with threads disappear. (See the sketch
after this list.)

* Data flow networks are natural for representing processes.
This is particularly true for transaction processing.

* Message passing gets rid of the problems associated with shared
memory and locks.

* Data flow programs are more extensible than traditional
programs. Elements can be readily composed into composite
elements from the bottom up rather than top down.
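
To make the first advantage concrete, here is a hedged sketch of
the same element idea with each element on its own thread,
communicating only through queues. The names are again invented;
this is not any real framework's API.

    import threading, queue

    def element(inbox, outbox, action):
        """Run one element on its own thread; no shared mutable state."""
        while True:
            msg = inbox.get()
            if msg is None:          # sentinel: shut the element down
                break
            outbox.put(action(msg))

    a_in, a_out = queue.Queue(), queue.Queue()
    t = threading.Thread(target=element,
                         args=(a_in, a_out, lambda x: x * x))
    t.start()
    a_in.put(7)
    print(a_out.get())               # 49, computed on the other thread
    a_in.put(None)                   # stop the element
    t.join()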

Some significant disadvantages:

* The mindset of data flow programming is unfamiliar to most
professional programmers. Most data flow programming languages
are niche languages used mainly by people who are not
professional programmers.

* The intervention of the run time system can be expensive. The
great advantage of LIFO semantics is that the implementing code
is immediate and cheap.

* Not using shared memory has costs. Either messages must be
copied or they must be immutable.

* Using data flow programming in the large pretty much requires
that it be used from the start. That is, converting traditional
programs into data flow programs is difficult because the
structuring is so different.

Richard Harter, cri(a)tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Kafka wasn't an author;
Kafka was a prophet!
From: user923005 on
On Oct 26, 3:22 pm, c...(a)tiac.net (Richard Harter) wrote:
> SHOULD IT BE TURTLES ALL THE WAY UP?
>
> In the famous anecdote, the little old lady replies to the noted
> philosopher, "It's turtles all the way down."  When it comes to
> writing software, many writers on software design and many
> programming language creators seem to believe that it is turtles
> all the way up.
>
> What do I mean by "turtles all the way up"?  By this I mean the
> thesis that the techniques and programming language concepts that
> are used in the small can be extended indefinitely to programming
> in the large.  In other words if we use language X for 100 line
> programs and 1000 line programs we can also use it for 1000000
> line programs.  We may have to add some extensions and new
> features along the way, but we can increase the size of the
> programs we write indefinitely using the same ideas.

I've never seen a convincing argument to show that this is wrong.

We can use a 26 letter alphabet to make little words.
We can use a 26 letter alphabet to make bigger words.
We can use a 26 letter alphabet to make little paragraphs.
We can use a 26 letter alphabet to make bigger paragraphs.
We can use a 26 letter alphabet to make little books.
We can use a 26 letter alphabet to make bigger books.
We can use a 26 letter alphabet to make entire libraries.

Why isn't the same thing true of programming languages?

Now, I admit that if our tiny building blocks are poorly made, the
bigger building blocks become more and more fragile.
But that is true no matter which programming language we use to build
the smallest blocks. And it is always a tragic mistake to try to
make one big giant block that does it all (the Forth metaphor is
not an exception, because the mega-word is built up from its
smaller words).

I also don't think it matters which direction we build the turtles, as
long as they make it from top to bottom.

[snip]

From: Mok-Kong Shen on
Richard Harter wrote:
[snip]
> Some significant advantages:
>
> * Concurrency and parallelism are natural. Code can be
> distributed between cores and even across networks. Many of the
> problems associated with threads disappear.
[snip]

To be able to deal naturally with concurrency and parallelism
is excellent. But wouldn't it be, on the other hand, somewhat
inconvenient to program processes that require sequentiality?
Should or could there be an adequate interface between data flow
and conventional programming languages?

M. K. Shen
From: Dmitry A. Kazakov on
On Mon, 26 Oct 2009 22:22:14 GMT, Richard Harter wrote:

> Some significant disadvantages:
>
> * The mindset of data flow programming is unfamiliar to most
> professional programmers. Most data flow programming languages
> are niche languages used mainly by people who are not
> professional programmers.
>
> * The intervention of the run time system can be expensive. The
> great advantage of LIFO semantics is that the implementing code
> is immediate and cheap.
>
> * Not using shared memory has costs. Either messages must be
> copied or they must be immutable.
>
> * Using data flow programming in the large pretty much requires
> that it be used from the start. That is, converting traditional
> programs into data flow programs is difficult because the
> structuring is so different.

* Low abstraction level; basically there is no abstraction at
all. Data flow works with primitive built-in atomic types forced
into value semantics.

* The already mentioned inefficiency due to value semantics
(i.e. either permanently marshaling inputs, or else blocking the
producers or the consumers when data are shared). Normally it is
the programmer who chooses how to protect state; with data flow
it must be the engine that decides. Basically it is a lack of
abstraction again. Working with *data* instead of *states*
summarizes it.

* The networks of wired blocks cannot be encapsulated into
reusable opaque primitives. You can group blocks, but that's it.
The usual methods of decomposition do not work, because there is
a firewall between "block" and "data". You cannot merge them (as
in OO) into a reusable entity (type = values + behavior).

* Unmaintainable code. Look at large data flow programs (e.g. in
DIAdem, Simulink, LabVIEW). It is impossible to apply reasonable
software processes to them. How do you compare two diagrams?
When are they semantically equivalent? How do you validate a
diagram? How do you search for anything in a diagram? Another
example of this problem is GUI libraries, which are mostly event
driven. Practically none of them can be used easily in a
multitasking environment. It is a horror to debug hangs or
runaway message generators when messages are routed from one
message loop to another. And the proposal is to build
*everything* this way. Shudder.

* Semantic problems. The barrier of a block that fires the
computation of the block's output is a very primitive model that
does not scale in reality. It almost instantly degenerates into
a mess of choices between deadlocks and race conditions. And, as
noted above, it practically cannot be debugged; except for
trivial cases (no feedback, all inputs synchronous, etc.) it
cannot be decomposed into simpler tasks...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Mok-Kong Shen on
Richard Harter wrote:

> Some significant advantages:
>
> * Concurrency and parallelism are natural.

I guess that such a language could profitably be used to simulate
the neuronal network of the brain. Could that be right?

M. K. Shen