From: George Neuner on
On Fri, 11 Dec 2009 07:24:53 -0500, Kenneth Tilton
<kentilton(a)gmail.com> wrote:

>George Neuner wrote:
>
>> ... I don't consider [Kenny's] favorite analogy - spreadsheets - to be
>> representative of dataflow.
>
>Actually Mark Guliano (sp?) the COSI presenter at LUGM 99 preferred the
>same analogy. Where do you see it breaking down? I try to avoid it
>because almost everyone misses that it is an analogy and says something
>like, "Hunh? I do not want to write Lotus 1-2-3!"

YMMV, but IMO the defining feature of dataflow is that it is a
feed-forward broadcast/multicast model - the goal of the original
hardware model was to implicitly sequence computations so as to
advance them as their inputs became ready rather than as explicitly
directed by the programmer, to enable efficient use of parallel
hardware and, in particular, to enable speculative execution whenever
possible. I don't believe any software model can really claim to be
dataflow unless it addresses these original goals.

I concede that it is reasonable to view dataflow as a network of
active observers rather than a network of active providers, but if you
take the position that all the players are active agents, I believe
that the observation model is somewhat less efficient when there are
many linkages and that it makes more sense to have agents be
activated/scheduled upon receipt of data. The applicable hardware
analogy is "asynchronous rendezvous logic" which is what the original
model was based on.
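
To make the rendezvous idea concrete, here is a toy sketch (in Lisp,
since that is the local dialect; the names and structure are mine, not
taken from any real dataflow machine). A node latches operands as they
arrive and fires only when every input slot is filled, pushing the
result forward to its consumers:

  (defstruct node
    fn         ; function applied to the latched inputs
    inputs     ; vector of latched operand values
    ready      ; vector of per-slot "operand arrived" flags
    consumers) ; list of (node . slot) conses fed by this node

  (defun send (node slot value)
    "Latch VALUE into SLOT of NODE; fire NODE once all slots are ready."
    (setf (aref (node-inputs node) slot) value
          (aref (node-ready node) slot) t)
    (when (every #'identity (node-ready node))
      (fill (node-ready node) nil) ; consume the operands
      (let ((result (apply (node-fn node)
                           (coerce (node-inputs node) 'list))))
        (loop for (consumer . cslot) in (node-consumers node)
              do (send consumer cslot result)))))

Nothing ever asks for a value; the arrival of the last operand is what
triggers the computation.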

As for my issue with spreadsheets - their workings have traditionally
been explained in terms of the observer model (though obviously they
could be implemented either way - a spreadsheet is not a model itself
but an example). In any event, most people understand them in terms
of observation and, because of that, I believe the analogy is, at
best, misleading because it does not (I think) adequately illustrate
the feed-forward nature of the computation.


>> I do consider that Cells itself is in the
>> spirit of dataflow although, when I tried it, my opinion was that the
>> programmer has to do too much manual fussing with the linkage web
>
>Really? There is not even a mechanism for manual fussing. Do you mean
>sometimes you created circularities and had to rethink your rules?
>
>One thing that gets crazy is doing a SETF to cell X in an observer on
>cell Y. It is not illegal, and it is useful in keeping big hairy
>applications efficient, but it does make for some harder thinking
>approaching "manual" in that one really needs to think about the
>dependency graph at hand. Still a net win, though.
>
>> (which is also a problem in other attempts at dataflow languages like
>> SISAL, Lucid, Oz, etc.). IMO, execution linkages should be determined
>> automatically by the compiler.
>
>No, linkages must be done at run-time because the dependency graph
>changes at run-time. And they /are/ determined automatically in Cells.

I'm not arguing "compile time" vs. "run time" - I'm saying that the
linkages should, IMO, be implicitly specified by usage rather than
directed by the programmer. I don't think there should be a need for
explicit rules on how to handle particular data - all the data should
be "compute on demand" or "compute as ready" determined by whichever
condition is reached first.

I know that Cells really was not an attempt at a cohesive dataflow
language - I make allowances for it being an extension to Lisp rather
than a replacement for it.


>Maybe you are thinking of some other package?

If I have confused Cells with something else, I apologize ... it has
been quite a while since I played with it.


George
From: Kenneth Tilton on
George Neuner wrote:
> On Fri, 11 Dec 2009 07:24:53 -0500, Kenneth Tilton
> <kentilton(a)gmail.com> wrote:
>
>> George Neuner wrote:
>>
>>> ... I don't consider [Kenny's] favorite analogy - spreadsheets - to be
>>> representative of dataflow.
>> Actually Mark Guliano (sp?) the COSI presenter at LUGM 99 preferred the
>> same analogy. Where do you see it breaking down? I try to avoid it
>> because almost everyone misses that it is an analogy and says something
>> like, "Hunh? I do not want to write Lotus 1-2-3!"
>
> YMMV, but IMO the defining feature of dataflow is that it is a
> feed-forward broadcast/multicast model - the goal of the original
> hardware model was to implicitly sequence computations so as to
> advance them as their inputs became ready rather than as explicitly
> directed by the programmer, to enable efficient use of parallel
> hardware and, in particular, to enable speculative execution whenever
> possible. I don't believe any software model can really claim to be
> dataflow unless it addresses these original goals.

That is how Cells works. One /programs/ a rule for X that happens to use
Y and Z (same as a spreadsheet) but the runtime action is more like
cause and effect: Y or Z changes, so X changes.

At first I did not think of it that way -- cells felt more like those
computed slots of C++ -- but after a while I came to see that Cells is
more about push (feed-forward) than it is about pull (observing).

Unless of course one uses only lazy cells. :)

By the way, there is no explicit direction (since v1 or 2). One simply
writes a nice functional rule that uses whatever other values one wants
to use and dependencies are identified and tracked automatically. It
matters that this happens at runtime because of branching code
(different dependencies over time for the same rule/instance) and
because the values accessed by the same rule may or may not be ones that
will change at runtime depending on what specific other things the rule
runs against. Knowing this allows a /big/ optimization.
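
To make that concrete, something like this (a minimal sketch in the
style of the Cells docs; the model and slot names are mine):

  (defmodel rectangle ()
    ((len   :accessor len   :initform (c-in 10)) ; input cell, settable
     (width :accessor width :initform (c? (floor (len self) 2)))
     (area  :accessor area
            :initform (c? (if (zerop (len self))
                              0 ; this branch never reads WIDTH
                              (* (len self) (width self)))))))

Reading (len self) inside a rule is what records the dependency, and
the IF shows why that must happen at runtime: on a pulse where LEN is
zero, the AREA rule never touches WIDTH and so has no dependency on
it. Then (setf (len r) 42) propagates forward, spreadsheet-style.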

>
> I concede that it is reasonable to view dataflow as a network of
> active observers rather than a network of active providers, but if you
> take the position that all the players are active agents, I believe
> that the observation model is somewhat less efficient when there are
> many linkages and that it makes more sense to have agents be
> activated/scheduled upon receipt of data. The applicable hardware
> analogy is "asynchronous rendezvous logic" which is what the original
> model was based on.
>
> As for my issue with spreadsheets - their workings have traditionally
> been explained in terms of the observer model (though obviously they
> could be implemented either way - a spreadsheet is not a model itself
> but an example). In any event, most people understand them in terms
> of observation and, because of that, I believe the analogy is, at
> best, misleading because it does not (I think) adequately illustrate
> the feed-forward nature of the computation.

Actually, a spreadsheet certainly does feed forward: I change one cell
holding the prime rate and the rest of the model is updated. It does not
feel like dataflow because the only manifestation is screen updates, but
I think I heard about spreadsheets getting triggers which would let them
kick off more interesting effects. Imagine lights flashing and bells
ringing as you play what-if with a financial model.
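
In Cells that is what observers are for -- if I recall the macrology
correctly, something along these lines:

  (defobserver area ((self rectangle) new-value old-value old-value-boundp)
    ;; side effects ride along with the feed-forward propagation:
    ;; ring the bells whenever AREA actually changes
    (when old-value-boundp
      (format t "~&area changed: ~a -> ~a~%" old-value new-value)))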

>
>
>>> I do consider that Cells itself is in the
>>> spirit of dataflow although, when I tried it, my opinion was that the
>>> programmer has to do too much manual fussing with the linkage web
>> Really? There is not even a mechanism for manual fussing. Do you mean
>> sometimes you created circularities and had to rethink your rules?
>>
>> One thing that gets crazy is doing a SETF to cell X in an observer on
>> cell Y. It is not illegal, and it is useful in keeping big hairy
>> applications efficient, but it does make for some harder thinking
>> approaching "manual" in that one really needs to think about the
>> dependency graph at hand. Still a net win, though.
>>
>>> (which is also a problem in other attempts at dataflow languages like
>>> SISAL, Lucid, Oz, etc.). IMO, execution linkages should be determined
>>> automatically by the compiler.
>> No, linkages must be done at run-time because the dependency graph
>> changes at run-time. And they /are/ determined automatically in Cells.
>
> I'm not arguing "compile time" vs. "run time" - I'm saying that the
> linkages should, IMO, be implicitly specified by usage rather than
> directed by the programmer.

And that is how Cells has worked since version 1 or 2. I just write my
code, the Cells engine notices when X uses Y and thereafter updates X as
needed when Y changes.

> I don't think there should be a need for
> explicit rules on how to handle particular data - all the data should
> be "compute on demand" or "compute as ready" determined by whichever
> condition is reached first.

And there should be a lazy option, so "on demand" is the only compute.

>
> I know that Cells really was not an attempt at a cohesive dataflow
> language - I make allowances for it being an extension to Lisp rather
> than a replacement for it.

Nope, it wasn't an attempt at anything. I was just trying to get my
application written. :)

kt

--
http://www.theoryyalgebra.com/index.html
From: George Neuner on
On Mon, 14 Dec 2009 14:06:07 -0500, Kenneth Tilton
<kentilton(a)gmail.com> wrote:

>George Neuner wrote:
>>>
>> YMMV, but IMO the defining feature of dataflow is that it is a
>> feed-forward broadcast/multicast model - the goal of the original
>> hardware model was to implicitly sequence computations so as to
>> advance them as their inputs became ready rather than as explicitly
>> directed by the programmer, to enable efficient use of parallel
>> hardware and, in particular, to enable speculative execution whenever
>> possible. I don't believe any software model can really claim to be
>> dataflow unless it addresses these original goals.
>
>That is how Cells works. One /programs/ a rule for X that happens to use
>Y and Z (same as a spreadsheet) but the runtime action is more like
>cause and effect: Y or Z changes, so X changes.
>
>At first I did not think of it that way -- cells felt more like those
>computed slots of C++ -- but after a while I came to see that Cells is
>more about push (feed-forward) than it is about pull (observing).
>
>Unless of course one uses only lazy cells. :)

Ok, I'll have to look at it again. I had the impression that it was
an observer model. Perhaps I played with an early version?


>By the way, there is no explicit direction (since v1 or 2). One simply
>writes a nice functional rule that uses whatever other values one wants
>to use and dependencies are identified and tracked automatically. It
>matters that this happens at runtime because of branching code
>(different dependencies over time for the same rule/instance) and
>because the values accessed by the same rule may or may not be ones that
>will change at runtime depending on what specific other things the rule
>runs against. Knowing this allows a /big/ optimization.

The direction does make a difference if you consider active agents. It
is much more efficient to push a value to a bunch of suspended agents
than for those agents to poll for it.


>> I concede that it is reasonable to view dataflow as a network of
>> active observers rather than a network of active providers, but if you
>> take the position that all the players are active agents, I believe
>> that the observation model is somewhat less efficient when there are
>> many linkages and that it makes more sense to have agents be
>> activated/scheduled upon receipt of data. The applicable hardware
>> analogy is "asynchronous rendezvous logic" which is what the original
>> model was based on.
>>
>> As for my issue with spreadsheets - their workings have traditionally
>> been explained in terms of the observer model (though obviously they
>> could be implemented either way - a spreadsheet is not a model itself
>> but an example). In any event, most people understand them in terms
>> of observation and, because of that, I believe the analogy is, at
>> best, misleading because it does not (I think) adequately illustrate
>> the feed-forward nature of the computation.
>
>Actually, a spreadsheet certainly does feed forward: I change one cell
>holding the prime rate and the rest of the model is updated. It does not
>feel like dataflow because the only manifestation is screen updates, but
>I think I heard about spreadsheets getting triggers which would let them
>kick off more interesting effects. Imagine lights flashing and bells
>ringing as you play what-if with a financial model.

I would say that depends. The formula cell pulls the value. You can
make a reasonable analogy to the dataflow hardware model wherein the
value cell is a latched register and the formula cell is triggered
logic, but to be valid, that presupposes that modifying the value cell
directly causes (re)evaluation of the formula cell. In an actual
spreadsheet implementation, that may or may not be the case.

Actual spreadsheet code I have seen over the years has uniformly used
backward chained evaluation: there's a list of formula cells that
refer to values. I have seen implementations keep forward links from
values to referencing formulas, but AFAIHS, those forward links were
only used as starting points for recomputing dependencies for a new
round of evaluation. The evaluation itself walked the dependency
ordered list of formulas. I've never encountered an implementation I
would definitely classify as forward chained.
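
In pseudo-Lisp, the scheme I keep seeing amounts to this (the names
are mine, not lifted from any actual spreadsheet): sort the formulas
so that nothing is evaluated before the cells it references, then walk
the sorted list, each formula pulling values that the ordering
guarantees are already current.

  (defstruct formula name refs fn) ; REFS: the names this formula reads

  (defun recalc (formulas values)
    "Evaluate FORMULAS in dependency order, updating the VALUES table."
    (let ((pending (copy-list formulas)))
      (loop while pending
            do (let ((ready (find-if
                             (lambda (f) ; no refs still awaiting evaluation?
                               (notany (lambda (r)
                                         (find r pending :key #'formula-name))
                                       (formula-refs f)))
                             pending)))
                 (unless ready (error "circular reference"))
                 (setf (gethash (formula-name ready) values)
                       (funcall (formula-fn ready) values))
                 (setf pending (remove ready pending))))
      values))

The evaluation itself is backward chained - each formula pulls from
VALUES - and the forward information is used only to order the walk.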


>> I don't think there should be a need for explicit rules on how to
>> handle particular data - all the data should be "compute on demand"
>> or "compute as ready" determined by whichever condition is reached
>> first.
>
>And there should be a lazy option, so "on demand" is the only compute.

There certainly are times when lazy operations and structures are
useful, but I'm not a fan of general lazy computation (ala Haskell).
Mainly I see laziness as a lame attempt to squeeze performance out of serial
code. As many programmers can't write correct threaded code to save
their own lives, I believe the only hope the average developer has of
effectively using lots of cores will come from direct compiler
intervention to exploit inherent parallelism and speculatively execute
code.

The general dearth of good language and runtime support for parallel
processing has bothered me for a long time. After too many years I've
learned enough about language implementation to finally do something
about it ... but my own opus isn't yet ready for the critics.

George
From: Kenneth Tilton on
George Neuner wrote:
> On Mon, 14 Dec 2009 14:06:07 -0500, Kenneth Tilton
> <kentilton(a)gmail.com> wrote:
>
>> George Neuner wrote:
>>> YMMV, but IMO the defining feature of dataflow is that it is a
>>> feed-forward broadcast/multicast model - the goal of the original
>>> hardware model was to implicitly sequence computations so as to
>>> advance them as their inputs became ready rather than as explicitly
>>> directed by the programmer, to enable efficient use of parallel
>>> hardware and, in particular, to enable speculative execution whenever
>>> possible. I don't believe any software model can really claim to be
>>> dataflow unless it addresses these original goals.
>> That is how Cells works. One /programs/ a rule for X that happens to use
>> Y and Z (same as a spreadsheet) but the runtime action is more like
>> cause and effect: Y or Z changes, so X changes.
>>
>> At first I did not think of it that way -- cells felt more like those
>> computed slots of C++ -- but after a while I came to see that Cells is
>> more about push (feed-forward) than it is about pull (observing).
>>
>> Unless of course one uses only lazy cells. :)
>
> Ok, I'll have to look at it again. I had the impression that it was
> an observer model. Perhaps I played with an early version?

No, it has been like this from the beginning. But you can see an
observer model if you want; that is just not how it works when Something
Happens. I'll say it again: one /codes/ declarative, functional rules,
but the engine reacts to change and propagates efficiently to dependents.

>
>
>> By the way, there is no explicit direction (since v1 or 2). One simply
>> writes a nice functional rule that uses whatever other values one wants
>> to use and dependencies are identified and tracked automatically. It
>> matters that this happens at runtime because of branching code
>> (different dependencies over time for the same rule/instance) and
>> because the values accessed by the same rule may or may not be ones that
>> will change at runtime depending on what specific other things the rule
>> runs against. Knowing this allows a /big/ optimization.
>
> The direction does make a difference if you consider active agents. It
> is much more efficient to push a value to a bunch of suspended agents
> than for those agents to poll for it.

No one is polling.

>
>
>>> I concede that it is reasonable to view dataflow as a network of
>>> active observers rather than a network of active providers, but if you
>>> take the position that all the players are active agents, I believe
>>> that the observation model is somewhat less efficient when there are
>>> many linkages and that it makes more sense to have agents be
>>> activated/scheduled upon receipt of data. The applicable hardware
>>> analogy is "asynchronous rendezvous logic" which is what the original
>>> model was based on.
>>>
>>> As for my issue with spreadsheets - their workings have traditionally
>>> been explained in terms of the observer model (though obviously they
>>> could be implemented either way - a spreadsheet is not a model itself
>>> but an example). In any event, most people understand them in terms
>>> of observation and, because of that, I believe the analogy is, at
>>> best, misleading because it does not (I think) adequately illustrate
>>> the feed-forward nature of the computation.
>> Actually, a spreadsheet certainly does feed forward: I change one cell
>> holding the prime rate and the rest of the model is updated. It does not
>> feel like dataflow because the only manifestation is screen updates, but
>> I think I heard about spreadsheets getting triggers which would let them
>> kick off more interesting effects. Imagine lights flashing and bells
>> ringing as you play what-if with a financial model.
>
> I would say that depends. The formula cell pulls the value.

No, that is the way it is coded. The engine then propagates a change to,
say, the prime rate throughout the model.

> You can
> make a reasonable analogy to the dataflow hardware model wherein the
> value cell is a latched register and the formula cell is triggered
> logic, but to be valid, that presupposes that modifying the value cell
> directly causes (re)evaluation of the formula cell. In an actual
> spreadsheet implementation, that may or may not be the case.

How Lotus 1-2-3 is programmed I do not know, but I doubt it does not
build a two-way dependency graph, meaning cells know what they use and
used cells know who uses them. It's pretty simple code, after all. IIRC
in the early days one could not use a cell higher up in the grid, or
maybe it was just that everything always got recalculated regardless.
But those early days did not last long.

>
> Actual spreadsheet code I have seen over the years has uniformly used
> backward chained evaluation: there's a list of formula cells that
> refer to values. I have seen implementations keep forward links from
> values to referencing formulas, but AFAIHS, those forward links were
> only used as starting points for recomputing dependencies for a new
> round of evaluation.

There is no way around a certain amount of "polling" because a rule can
be forced to run but then it cannot assume everyone else besides the
triggering value is current. Again, branching code makes it impossible
to know. Cells uses a generational counter to decide currency, so the
cost is negligible: we still recalculate only rules that are both out
of date and dependent on something that changed; otherwise we simply
sweep the dependency graph looking for out-of-date dependents of
changed Cells. The tree gets visited exactly once because a Cell
determined not to need recalculation gets its generation set to the
current one.
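
Schematically (my reconstruction, not the actual Cells source):

  (defvar *pulse* 0) ; bumped on every SETF of an input cell

  (defstruct cell value rule used (pulse 0) (changed-pulse 0))

  (defun ensure-current (c)
    ;; visit each cell at most once per pulse: recalculate only if a
    ;; cell it uses actually changed this pulse, and stamp it current
    ;; either way
    (when (< (cell-pulse c) *pulse*)
      (mapc #'ensure-current (cell-used c)) ; inputs current first
      (when (some (lambda (u) (= (cell-changed-pulse u) *pulse*))
                  (cell-used c))
        (let ((new (funcall (cell-rule c))))
          (unless (eql new (cell-value c))
            (setf (cell-value c) new
                  (cell-changed-pulse c) *pulse*))))
      (setf (cell-pulse c) *pulse*))) ; the two-integer compare

A SETF on an input bumps *PULSE*, stamps that input's CHANGED-PULSE,
and then sweeps ENSURE-CURRENT over the input's users, transitively.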

Other tools apparently run a rule hoping everyone is current and then
rerun it if it turns out someone it used got rerun. It could take some
fancy analysis to decide which is more efficient and when (I suspect the
answer varies depending on the dependency graph).


> The evaluation itself walked the dependency
> ordered list of formulas. I've never encountered an implementation I
> would definitely classify as forward chained.

The correct algorithm walks only the part of the dependency graph that
/might/ be affected, and the check is simply comparing two integers.
And again there is no way to avoid this -- well, the alternative is the
other model I mentioned: don't check, just run, and re-run if it turns
out you used an old value. That could be fast since, in my experience,
re-runs would rarely be needed; I just have not stopped to figure out
what the housekeeping would cost (the housekeeping needed to identify
rules that ran against obsolete values).


>
>
>>> I don't think there should be a need for explicit rules on how to
>>> handle particular data - all the data should be "compute on demand"
>>> or "compute as ready" determined by whichever condition is reached
>>> first.
>> And there should be a lazy option, so "on demand" is the only compute.
>
> There certainly are times when lazy operations and structures are
> useful, but I'm not a fan of general lazy computation (ala Haskell).

No, it just needs to be an option for particular oddball cases.

> Mainly I see laziness as a lame attempt to squeeze performance out of serial
> code. As many programmers can't write correct threaded code to save
> their own lives, I believe the only hope the average developer has of
> effectively using lots of cores will come from direct compiler
> intervention to exploit inherent parallelism and speculatively execute
> code.
>
> The general dearth of good language and runtime support for parallel
> processing has bothered me for a long time. After too many years I've
> learned enough about language implementation to finally do something
> about it ... but my own opus isn't yet ready for the critics.

I'll keep my eyes open for the ANNC.

kt


--

http://thelaughingstockatpngs.com/
http://www.facebook.com/pages/The-Laughingstock/115923141782?ref=nf
From: W. James on
Kenneth Tilton wrote:

> I was clever enough to stipulate "an intelligent question", meaning
> one demonstrating one has given some thought to what one read and
> worked through to a synthetic consequence evidencing their thought.

I was clever enough to stipulate "an intelligent question", meaning
one demonstrating he has given some thought to what he read and
worked through to a synthetic consequence evidencing his thought.


--
"A woman's place is in the home." --- Wise Old Saying