3 questions about arrays [Lisp]

Prev: Checking for declarations in macros
Next: scripting in CL -- please comment

From: Kaz Kylheku on 4 Dec 2009 23:33

On 2009-12-04, Thomas A. Russ <tar(a)sevak.isi.edu> wrote:
> BTW, just for reference, here is the fixnum sizes for a number of lisps
> that I happen to have easy access to.
>
> Mac OS X:
> bits implementation
> ---- --------------
> 29 LispWorks Personal 5.1 (32-bit)
> 60 ClozureCL x8664 (64-bit)
> 24 CLISP 2.47

This can't be right. CLISP's most-positive-fixnum is 16.7 million, which
means that the mantissa has 24 bits.

But remember that fixnum is signed; CLISP also has a
most-negative-fixnum that is -16.7 million.

Might you be neglecting to count the sign bit?

From: Spiros Bousbouras on 4 Dec 2009 23:44

On Thu, 03 Dec 2009 20:49:51 -0500
Paul Wallich <pw(a)panix.com> wrote:
> Spiros Bousbouras wrote:
> >
> > fixnums are supposed to be efficient , that's the criterion for
> > choosing the size of fixnums not the maximum size of arrays. On the
> > other hand the maximum size of arrays should be what the operating
> > system can handle even if it means that part of the array would be in
> > swap memory.
>
> Nuh-uh.
>
> 1) That's already the case in plenty of CL implementations -- unless a
> lot of people have 60 bits of physical memory they're not telling anyone
> about.

It may be the case with plenty CL implementations but not all. For
example on the computer I'm using right now most-positive-fixnum of
SBCL is 536870911. This means that I can't create a bit vector with
more than 536870911 members which is ridiculous since I have 1 GB ram.

> (Even on 32-bit operating systems, a single fixnum-denominated
> array can already use up at least all of physical memory and quite
> possibly more, depending on what's in the array.)

Of course it can use all the memory if each element is large enough.
The problem is that it might miss a lot of memory when every member of
the array is small.

> 2) Now array-size limit depends not on the lisp but on the OS and the
> disks you have handy and how much someone allocated for /swap and what
> other programs are running. Good luck with that.

I'm not sure what you mean "not on the lisp". If you do your
programming in Lisp then array size will obviously depend on the Lisp.
A good Lisp implementation in turn will make optimal use of what the
hardware and operating system make available. Actually this is true for
every programming language ; luck has nothing to do with it.

The problem here is that the standard does not allow a Lisp
implementation to do this on the computer I'm using. It's a 32 bit
computer so a fast fixnum should be at most 32 bits. I assume SBCL uses
2 bits for internal book-keeping. I don't know anything about the
internal design of SBCL but if we assume that it was a good design
decision to take these 2 bits away from the range of values then the
conundrum becomes apparent : either use a fixnum greater than 32 bits
which would have a negative impact on the speed of all integer
arithmetic or arrive at a situation where the programmer cannot create
a bit vector of size greater than 2**29 despite the 1 GB ram available.

It doesn't contradict my point if there are Lisp implementations ,
possibly many , where a similar conundrum does not arise. If there is
even one Lisp implementation where the conundrum arises then the
standard made a poor choice in enforcing the fixnum restriction. If the
standard allowed (but not enforced) for the array indices to be bignums
then the implementations where fixnum is large enough anyway would be
unaffected. But an implementation like the one I'm using would be
allowed to offer larger arrays while keeping the fixnum size to what's
optimal for the architecture.

> Or do you just mean the
> maximum allocatable file size, in which case array-size limit varies
> dynamically with the contents of the array?

I don't know what "maximum allocatable file size" is. If by "array-size
limit" you mean not what the standard specifies but the actual
practical size as a programme is running then of course it depends on
the contents of the array.

> 3) Abandoning the fixnum limit, especially given adjustability, is
> potentially really bad for performance even for arrays that aren't
> bigger than that limit;

Every time you pass from fixnums to bignums it is "potentially really
bad for performance". This is something for the programmer to worry
about. The standard could offer a declaration by which the programmer
could inform the compiler that the total size of an array will be
a fixnum. That's how it works with everything else in Lisp : the
language allows flexibility and if you want to increase performance you
restrict your flexibility through the appropriate declarations. I don't
see why array access should be different.

It's worth considering also that in the case of array access you may
not even need a declaration at all. When you create the array or adjust
its size the compiler knows if the (new) size is so large that it needs
bignums or not. If not then it can automatically arrange for all access
calculations to use fixnum arithmetic.

> if you want really bad performance, you can roll it yourself.

I don't want really bad performance , I want arrays which can be larger
than most-positive-fixnum. How can I roll this myself ?

> 4) The problem has historically been with array-size-limit rather
> smaller than the fixnum limit.

Well things change. Memory and disks get bigger while the standard
stays the same.

--
MR. KUNSTLER: I will assume with all of the people I ask you about
they are very fine young men and so on. It will save
time.
http://www.law.umkc.edu/faculty/projects/ftrials/Chicago7/Daley.html

From: Spiros Bousbouras on 4 Dec 2009 23:44

On 4 Dec 2009 08:57:37 GMT
Tamas K Papp <tkpapp(a)gmail.com> wrote:
> On Fri, 04 Dec 2009 00:52:28 +0000, Spiros Bousbouras wrote:
>
> > fixnums are supposed to be efficient , that's the criterion for choosing
> > the size of fixnums not the maximum size of arrays. On the other hand
> > the maximum size of arrays should be what the operating system can
> > handle even if it means that part of the array would be in swap memory.
>
> Fixnums _can_ be made efficient, if the implementation allows that
> (but it is not obligated to do it). I would guess that the underlying
> motivation behind making array sizes fixnum is that one can address
> elements using row-major-aref and fixnum arithmetic, which is indeed
> a very efficient thing to do in most reasonable implementations.
>
> You are approaching this problem from the wrong perspective.

I'm approaching it from the perspective of what gives the programmer
the most choice and is in accord with the dynamic nature of Lisp. I
believe this is the right perspective.

> There is
> nothing to prevent fixnums from being very large, and you can use them
> to address your array, so there is no effective limitation.

On every architecture there is a natural size for fixnums which makes
arithmetic with them the fastest so a reasonable implementation will
probably choose this size for fixnums. It's not obligated by the
standard to do so but likely will. For an example of the unpleasant
dilemma which the standard creates for implementations see my reply to
Paul Wallich.

> >> Really, the reason is that Common Lisp is not purely a high level
> >> programming language, it is specified in such a way that implementation
> >> can easily be efficient. If you allowed bignums as index for arrays,
> >> each AREF would imply a bignum multiplication, which would be slow.
> >
> > Multiply what with what ?
>
> Aref converts indexes to a flat row major index. What is multiplied
> with what is an exercise left to the reader.

So the issue of multiplication doesn't even arise with vectors. For
arrays which are not vectors it could be handled with appropriate
declarations. Again , see my reply to Paul Wallich.

> > 5 , so ? In fact if the point for the limitation in the size of arrays
> > is to never have a bignum as an array subscript then each dimension
> > could be up to (1+ most-positive-fixnum) so the definition of
> > ARRAY-DIMENSION-LIMIT should be different in order to allow for such a
> > possibility.
>
> Again, that is not the point. The point is that row major indexes are
> fixnums. See array-row-major-index.

Right. So the standard could allow ARRAY-DIMENSION-LIMIT and
ARRAY-TOTAL-SIZE-LIMIT to be at most (1+ most-positive-fixnum) (which
means that both should be allowed to be bignums). This would still
ensure that ARRAY-ROW-MAJOR-INDEX can work with only fixnums while
giving the programmer 1 extra subscript to work with. Not that 1
subscript matters much of course , my main beef is with the decision to
only allow fixnums. Having made that decision I can kind of see why
they opted for simplicity and specified ARRAY-DIMENSION-LIMIT and
ARRAY-TOTAL-SIZE-LIMIT to also be fixnums at the cost of taking 1 extra
subscript away from the programmer.

> I am curious: why are you worried about this?

I'm not worried , I'm curious.

> Have you ran into a
> difficulty where, for example, you find fixnums on 64-bit SBCL
> limiting?

I haven't tried a 64-bit SBCL but with a 32-bit one yes it's limiting.
I'm almost certain that I could go around the limitation by either
using arrays of arrays or using C but it would be annoying.

--
The last scene is a killer, with the professor turning into the
protoplasm of life itself, and his wife turning into a glowing shell
of rock-like flesh, with her inner fires glowing through the crevices
(the effect is something like an overheated Spiderman).
Roger Ebert on "Altered states"

From: Spiros Bousbouras on 4 Dec 2009 23:44

On 4 Dec 2009 15:54:00 GMT
Tamas K Papp <tkpapp(a)gmail.com> wrote:
> On Fri, 04 Dec 2009 07:19:28 -0800, Pillsy wrote:
>
> > On Dec 4, 3:57 am, Tamas K Papp <tkp...(a)gmail.com> wrote: [...]
> >> I am curious: why are you worried about this? Have you ran into a
> >> difficulty where, for example, you find fixnums on 64-bit SBCL
> >> limiting?
> >
> > The fixnums on some popular implementations (like CLISP on my 32-bit XP
> > machine) have a dramatically smaller range. 17 million element arrays
> > really are too small for many applications.

Eww , that's pathetic.

> Yes, but I am under the impression that CLISP is not the
> implementation one would pick for computationally intensive
> applications, so I see no problem with it making this particular
> choice.

Wanting large arrays doesn't necessarily mean you want computationally
intensive applications. Having said that I have seen people here say
that CLISP is fast.

> I like the CL standard because it leaves a lot of choice to the
> implementor regarding these efficiency hacks, and at the same time the
> programmer can rely on certain things being true.

That's what declarations are for.

> Eg if you are
> writing some method which handles array indexes, you can always rely
> on them being fixnums, regardless of the implementation. And then the
> implementation is allowed to say what it considers a fixnum, and if
> you don't like that, just pick another implementation.

What if you want to use a 32-bit computer ?

--
Who's your mama ?

From: Spiros Bousbouras on 4 Dec 2009 23:44

On 04 Dec 2009 10:01:06 -0800
tar(a)sevak.isi.edu (Thomas A. Russ) wrote:
> Spiros Bousbouras <spibou(a)gmail.com> writes:
>
> > > > 2. Why the requirement that the dimensions of an array are fixnums ?
> > >
> > > Why not? This is not a big restriction, if an implementation wants
> > > bigger arrays, it only has to provide bigger fixnums.
> >
> > fixnums are supposed to be efficient , that's the criterion for
> > choosing the size of fixnums not the maximum size of arrays.
>
> Yes. And you want to make sure array access is efficient. Thus the
> limitation on only allowing fixnums as indicies

No , you want to make sure that the programmer has the freedom to
choose the trade-off which is most appropriate for his application. If
he is willing to sacrifice some speed in order to gain larger arrays he
should have that choice.

> > On the
> > other hand the maximum size of arrays should be what the operating
> > system can handle even if it means that part of the array would be in
> > swap memory.
>
> Um, the question of fixnum sizes and the physical/virtual memory sizes
> are completely orthogonal.

I 99.99% agree. However in the part of my post you're quoting I'm not
making a connection between size of fixnum and size of memory , I'm
making a connection between size of arrays and size of memory.

> As Pascal pointed out, most lisp systems can
> generally index their entire address space (at a word level) using their
> fixnums,

I'm not sure what the "at a word level" part means but I believe I
understand your overall gist. So what about those lisp systems which
can't index their entire address space using their fixnums ? From what
Pillsy said CLisp probably can't.

> so that means that fixnums would be sufficient to address all
> of the memory.

What if you want a large bit vector ? Are you supposed to use something
larger than a bit for the array element and then calculate bit
addresses and do your own bit fiddling ?

> Even if a lot of systems use 29 rather than 30 bits for
> fixnums, this still gives you most of memory, and some copying GCs need
> can only safely use half of the virtual memory in order to guarantee
> enough space to copy into....
>
> Besides, it would be ludicrous to expect that the fixnum size of a lisp
> implementation would depend on the actual amount of physical RAM
> installed in a computer, rather than the processor (and software)
> architecture.

I agree. On the other hand it makes sense to expect that the maximum
array size would depend on the actual amount of physical RAM installed
in a computer so the standard is at fault for creating a connection
between fixnum size and array size.

> > > Really, the reason is that Common Lisp is not purely a high level
> > > programming language, it is specified in such a way that
> > > implementation can easily be efficient. If you allowed bignums as
> > > index for arrays, each AREF would imply a bignum multiplication, which
> > > would be slow.
> >
> > Multiply what with what ?
>
> How do you address a multiple dimension array in RAM without
> multiplying?

If all the dimensions are powers of 2 then you can do it by ORing and
shifting. Perhaps there are some nice tricks available in other cases
too.

> > > Moreover, there would be no point, because in general
> > > the size of arrays slots is one word, and fixnum is all you need to
> > > index all the words you can store in memory. (eg. with a 32-bit
> > > processor, a fixnum of 30-bit is enough to index the whole memory.
> >
> > Is it ?
>
> Sure. Why do you think it isn't?

I'm thinking it *may* not be because you can cover a larger range of
values with 32 bits than with 30.

> > 5 , so ? In fact if the point for the limitation in the size of arrays
> > is to never have a bignum as an array subscript then each dimension
> > could be up to (1+ most-positive-fixnum) so the definition of
> > ARRAY-DIMENSION-LIMIT should be different in order to allow for such a
> > possibility.
>
> Um, not really. Because the ARRAY-DIMENSION-LIMIT has to apply to all
> arrays, including single-dimensional arrays that you want to displace to
> your multi-dimensional array.
>
> I also suspect that ROW-MAJOR-AREF also plays into this limit.

As I've said to my reply to Tamas "the standard could allow
ARRAY-DIMENSION-LIMIT and ARRAY-TOTAL-SIZE-LIMIT to be at most (1+
most-positive-fixnum) (which means that both should be allowed to be
bignums). This would still ensure that ARRAY-ROW-MAJOR-INDEX can work
with only fixnums while giving the programmer 1 extra subscript to work
with". Issue "ARRAY-DIMENSION-LIMIT-IMPLICATIONS" acknowledges this. It
says

Version 1 of this proposal had two parts:
1. to allow ARRAY-DIMENSION-LIMIT to be a bignum (the smallest
bignum) in implementations which wanted to permit the full
fixnum range for array size.

> There is a lot of interaction between various parts of the specification
> that you have to understand and consider when examining the design
> decisions. It isn't enough to focus only on one small part without
> considering how that impacts the language definition as a whole.

Feel free to point out any parts which you think are relevant. Which
makes me wonder is there an "official" rationale for the Lisp standard
somewhere ?

--
Our shoppers returned, and we raised a jaunty toast
To the solutions provider that delivers the most
As the ordeal did teach me, there's just no disputing
The performance and value of Sun network computing
http://mirror.optus.com.au/pub/flash/sun2003/SunHolidayPoem12.pdf

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Prev: Checking for declarations in macros
Next: scripting in CL -- please comment