From: RG on
In article <ymid3tvvi3j.fsf(a)blackcat.isi.edu>,
tar(a)sevak.isi.edu (Thomas A. Russ) wrote:

> Pascal Costanza <pc(a)p-cos.net> writes:
>
> > On 05/08/2010 19:01, RG wrote:
> > > Do you also find it hard to believe than that anyone would ever use an
> > > AList or a Plist in production code? ASSOC and GETF are also O(n)
> > > operations, no?
> >
> > That's a bad comparison. An alternative to alists and plists is hash
> > tables, but hash tables are slower than alists or plists if there are
> > less than 50 (or so) mappings. In contrast, arrays are always faster for
> > random access than lists.
>
> And in fact in some of our code where performance was an issue we ended
> up writing an auto-switching data structure. We had particular lookup
> lists (dictionaries) where many were short but occasionally longer ones
> would pop up. This depended on some runtime actions so it wasn't
> predictable.
>
> So our solution was to build an abstract data structure that would use
> ALISTs until the (empirically determined) performance thresdhold was
> reached and then substituting a hash table internally to hold the data.
>
> I suppose that makes this tale an endorsement of the ADD-FOO and GET-FOO
> technique.

No. Both approaches are equivalent in this regard. See
http://www.flownet.com/ron/lisp/dictionary.lisp for an existence proof.

rg
From: RG on
In article <8c21hdFfp3U1(a)mid.individual.net>,
Pascal Costanza <pc(a)p-cos.net> wrote:

> On 05/08/2010 19:01, RG wrote:
> > In article<8c0750FaueU1(a)mid.individual.net>,
> > Pascal Costanza<pc(a)p-cos.net> wrote:
> >
> >> On 05/08/2010 18:11, RG wrote:
> >>> In article<8bvcqmFbo1U1(a)mid.individual.net>,
> >>> Pascal Costanza<pc(a)p-cos.net> wrote:
> >>>
> >>>> If they all require the same boilerplate, why didn't you write a macro
> >>>> to generate it? Then you would have greatly reduced the workload again...
> >>>
> >>> It's not my code. But such a macro would not have solved the problem in
> >>> this particular case. This is a system that manages financial data
> >>> feeds from around the world. It is live 24x7. Down time is somewhere
> >>> between extremely expensive and disastrous depending on when it occurs.
> >>> When you change representation you not only have to change the code, you
> >>> also have to convert the currently-live data structures. So a macro
> >>> that generated accessors would not by itself be enough. You would
> >>> either have to reboot the system after every change, or write additional
> >>> code to convert the currently-live data structures in situ. And this is
> >>> not an uncommon situation in high-availability applications.
> >>
> >> Are you seriously saying that in such an application, a random access
> >> operator (like elt) is used on lists, or that it would ever makes sense
> >> to use elt on lists in the future? An O(n) operation?
> >>
> >> I find that very hard to believe...
> >
> > Do you also find it hard to believe than that anyone would ever use an
> > AList or a Plist in production code? ASSOC and GETF are also O(n)
> > operations, no?
>
> That's a bad comparison. An alternative to alists and plists is hash
> tables, but hash tables are slower than alists or plists if there are
> less than 50 (or so) mappings. In contrast, arrays are always faster for
> random access than lists.

That's a straw man. Hash tables are not the only alternative to ALists
and PLists.

> > Or that it might make sense to use an O(n) operation if
> > it is in a part of the system that runs is rarely used and is not
> > mission-critical?
>
> That can happen, sure.
>
> > In this particular case I have no idea what's really going on under the
> > hood. Like I said, it's not my code. (In this case I'm a user, not a
> > developer.) But what difference does it make what the particular issue
> > happens to be? Do you seriously doubt that representations in
> > production applications often need to be changed after deployment to
> > meet unanticipated demands or to support new features? And that down
> > time can be expensive, and in some cases even catastrophic?
>
> I don't doubt that run-time changes are important, to the contrary. I
> still believe that the collections is the wrong level of abstraction to
> make such changes.

Yes, I get that. What I don't get is *why* you think that. But at this
point maybe we just ought to agree to disagree.

rg
From: Rob Warnock on
Thomas A. Russ <tar(a)sevak.isi.edu> wrote:
+---------------
| Pascal Costanza <pc(a)p-cos.net> writes:
| > On 05/08/2010 19:01, RG wrote:
| > > Do you also find it hard to believe than that anyone would ever use an
| > > AList or a Plist in production code? ASSOC and GETF are also O(n)
| > > operations, no?
| >
| > That's a bad comparison. An alternative to alists and plists is hash
| > tables, but hash tables are slower than alists or plists if there are
| > less than 50 (or so) mappings. In contrast, arrays are always faster for
| > random access than lists.
|
| And in fact in some of our code where performance was an issue we ended
| up writing an auto-switching data structure. We had particular lookup
| lists (dictionaries) where many were short but occasionally longer ones
| would pop up. This depended on some runtime actions so it wasn't
| predictable.
|
| So our solution was to build an abstract data structure that would use
| ALISTs until the (empirically determined) performance thresdhold was
| reached and then substituting a hash table internally to hold the data.
+---------------

Lua "tables" do something like that, but are instead always *both*
hash tables and arrays. Elements of a table whose keys are a
"dense-enough" set of natural numbers[1] are stored in the array
part, and all other elements are stored in the hash table part
[including those with numeric keys that outside the "dense enough"
range]:

http://www.lua.org/doc/hopl.pdf
The Evolution of Lua
...
Until Lua 4.0, tables were implemented as pure hash tables,
with all pairs stored explicitly. In Lua 5.0 we introduced a
hybrid representation for tables: every table contains a hash
part and an array part, and both parts can be empty. Lua detects
whether a table is being used as an array and automatically
stores the values associated to integer indices in the array
part, instead of adding them to the hash part[31]. This division
occurs only at a low implementation level; access to table fields
is transparent, even to the virtual machine. Tables automatically
adapt their two parts according to their contents.

Reference [31] explains further:

http://www.lua.org/doc/jucs05.pdf
The implementation of Lua 5.0
...
When a table needs to grow, Lua recomputes the sizes for its hash
part and its array part. Either part may be empty. The computed
size of the array part is the largest N such that at least half
the slots between 1 and N are in use (to avoid wasting space with
sparse arrays) and there is at least one used slot between N/2+1
and N (to avoid a size N when N/2 would do). After computing the
new sizes, Lua creates the new parts and re-inserts the elements
from the old parts into the new ones.
...
As a consequence, if a table is being used as an array, it performs
as an array, as long as its integer keys are dense. Moreover, no
memory or time penalty is paid for the hash part, because it does
not even exist. The converse holds: if the table is being used as
an associative array, and not as an array, then the array part is
likely to be empty.


-Rob

[1] Yes, Lua arrays are indexed starting at *1*!! (*sigh*)
To be precise, the number 0 may certainly be used as a key,
but its associated value will go into the hash-table part
of the table, not the array part.

-----
Rob Warnock <rpw3(a)rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607

From: Thomas A. Russ on
RG <rNOSPAMon(a)flownet.com> writes:

> In article <ymid3tvvi3j.fsf(a)blackcat.isi.edu>,
> tar(a)sevak.isi.edu (Thomas A. Russ) wrote:
> > So our solution was to build an abstract data structure that would use
> > ALISTs until the (empirically determined) performance thresdhold was
> > reached and then substituting a hash table internally to hold the data.
> >
> > I suppose that makes this tale an endorsement of the ADD-FOO and GET-FOO
> > technique.
>
> No. Both approaches are equivalent in this regard. See
> http://www.flownet.com/ron/lisp/dictionary.lisp for an existence proof.

Nice interesting abstraction.

I only looked at this briefly and saw that it supported changing the
underlying implementation of dictionaries dynamically. I didn't see any
place where this happened automatically, although I might have missed
it.

Unless it also supports automatic changes to the implementation, it
seems that one would still need to have the ADD-FOO, etc. interface on
top of the dictionary in order to trigger the change-over.

Although I guess it wouldn't be too hard to extend your dictionary code
to have an automatic switch over by defining an
auto-switching-dictionary class.


--
Thomas A. Russ, USC/Information Sciences Institute
From: RG on
In article <ymiwrs3tbn7.fsf(a)blackcat.isi.edu>,
tar(a)sevak.isi.edu (Thomas A. Russ) wrote:

> RG <rNOSPAMon(a)flownet.com> writes:
>
> > In article <ymid3tvvi3j.fsf(a)blackcat.isi.edu>,
> > tar(a)sevak.isi.edu (Thomas A. Russ) wrote:
> > > So our solution was to build an abstract data structure that would use
> > > ALISTs until the (empirically determined) performance thresdhold was
> > > reached and then substituting a hash table internally to hold the data.
> > >
> > > I suppose that makes this tale an endorsement of the ADD-FOO and GET-FOO
> > > technique.
> >
> > No. Both approaches are equivalent in this regard. See
> > http://www.flownet.com/ron/lisp/dictionary.lisp for an existence proof.
>
> Nice interesting abstraction.
>
> I only looked at this briefly and saw that it supported changing the
> underlying implementation of dictionaries dynamically. I didn't see any
> place where this happened automatically, although I might have missed
> it.

No, but that's a pretty straightforward extension, at least conceptually.

> Unless it also supports automatic changes to the implementation, it
> seems that one would still need to have the ADD-FOO, etc. interface on
> top of the dictionary in order to trigger the change-over.

Huh? Why?

> Although I guess it wouldn't be too hard to extend your dictionary code
> to have an automatic switch over by defining an
> auto-switching-dictionary class.

You don't even have to do that. All you have to do is have some
profiling code somewhere that triggers appropriate calls to
change-implementation. This is not a trivial thing to do, especially if
you want it to be thread-safe without bringing everything to a
screeching halt, but it's not rocket science either.

rg