From: robin on
"helvio" <helvio.vairinhos(a)googlemail.com> wrote in message
news:0a10cc09-880c-4c42-9b9d-21a499ec18ff(a)y12g2000vbr.googlegroups.com...
| Hi all,
|
| I have a huge F90 code composed of several modules, each with several
| module procedures, and a main program. No external procedures are
| used. I've come across this situation: some modules have very large
| arrays declared in the global scope (with sizes of order ~50000 each),
| but some of these arrays are only used conditionally. They might be
| used elsewhere, but there's also the possibility that they might not.
| The situation is the following, schematically:

If U is used conditionally, ALLOCATABLE is fine.
That's exactly the kind of thing it is intended for.

I notice that N is 50,000.
Is that some maximum value, or is it just a value that is larger
than anything you expect?
For instance, could N be read in?
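Both suggestions can be combined in a short sketch (the module and routine
names here are hypothetical, not taken from the original code): read N at
run time and allocate U only on the branch that actually needs it.

```fortran
module work_arrays
  implicit none
  real, allocatable :: u(:)   ! no memory is used until this is allocated
contains
  subroutine make_u(n)
    integer, intent(in) :: n
    if (.not. allocated(u)) allocate(u(n))
  end subroutine make_u

  subroutine kill_u()
    if (allocated(u)) deallocate(u)
  end subroutine kill_u
end module work_arrays

program demo
  use work_arrays
  implicit none
  integer :: n
  logical :: need_u

  read (*,*) n, need_u     ! N comes from input rather than a fixed bound
  if (need_u) then
    call make_u(n)         ! pay the memory cost only when U is wanted
    u = 0.0
    call kill_u()
  end if
end program demo
```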


From: Richard Maine on
helvio <helvio.vairinhos(a)googlemail.com> wrote:

> In sum, I think my doubts reduce to the question of whether the
> efficiency of accessing the physical memory depends on the size of the
> allocated memory, or if it is independent of it.

It should be independent of it, or anyway close enough that you won't be
able to measure the difference.
>
> I also have the following related question: can the ALLOCATION /
> DEALLOCATION statements slow down the program if they are called
> multiple times, as compared with a single static declaration of "U"?

Yes, definitely. Of course, as with many things, that's only going to
matter if it is in an inner enough loop to be called lots of times.
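When the allocation really is inside a hot loop, the usual remedy is to
hoist it: allocate once before the loop and reuse the storage. A minimal
sketch (the sizes and loop body are placeholders, not the poster's code):

```fortran
program hoist_allocation
  implicit none
  integer, parameter :: m = 1000, n = 50000
  real, allocatable :: u(:)
  integer :: i

  allocate(u(n))        ! one ALLOCATE instead of m of them
  do i = 1, m
    u = real(i)         ! stand-in for the real work on U
  end do
  deallocate(u)         ! one DEALLOCATE at the end
end program hoist_allocation
```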

It would seem that some of the classic advice on performance
optimization is in order. I don't feel like digging up the exact quotes;
there are some pretty well known ones. But roughly...

1. Worry about making the code right before you worry about making it
fast.

2. Improvements in algorithm are worth far more than code tweaks.

3. Even major improvements in code performance aren't going to matter
unless they are in time-critical portions of the program in the first
place. That one applies to your question above. Allocation and
deallocation do take time, but if much computation happens between the
allocation and deallocation, then their time usage is not likely to
matter relative to the computation.

4. When you do get to trying to tweak code to improve performance,
*MEASURE* the effects with your own code. Even experts can and regularly
do get surprised, and things do vary from code to code. That means you
should not just accept performance judgements that people might give you
here. Yes, "people" includes me.

--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
From: helvio on
On Jun 2, 4:35 pm, nos...(a)see.signature (Richard Maine) wrote:
> helvio <helvio.vairin...(a)googlemail.com> wrote:
> > In sum, I think my doubts reduce to the question of whether the
> > efficiency of accessing the physical memory depends on the size of the
> > allocated memory, or if it is independent of it.
>
> It should be independent of it, or anyway close enough that you won't be
> able to measure the difference.

Thank you! :)

> > I also have the following related question: can the ALLOCATION /
> > DEALLOCATION statements slow down the program if they are called
> > multiple times, as compared with a single static declaration of "U"?
>
> Yes, definitely. Of course, as with many things, that's only going to
> matter if it is in an inner enough loop to be called lots of times.

Yup! These statements are indeed called many times, in one of the most
time-consuming areas of my code. But the time spent on matrix
multiplications among rank-2 subarrays of the U's and V's between
ALLOCATE and DEALLOCATE will definitely overshadow the time taken
to allocate them.

My main worry was that the program could slow down if memory access
depended significantly on the size of the total allocated memory
(because I will have to access the memory locations of the V's many
times for matrix-multiplying their rank-2 subarrays). I always had
this idea in my head that physical memory access is one of the slowest
elementary operations; I just don't have a feeling for how slow it
really is.

But since you say that memory access is essentially independent of the
size of allocated memory, all I have to worry about is not to exceed
the available physical memory (N = 50000 was just an example). ;)

> It would seem that some of the classic advice on performance
> optimization is in order. I don't feel like digging up the exact quotes;
> there are some pretty well known ones.

Yup! This doesn't stop me at all from writing my code! I asked it
mostly as an academic question, to learn a little bit more.

> 1. Worry about making the code right before you worry about making it
> fast.

*thumbs up*

> 2. Improvements in algorithm are worth far more than code tweaks.

*thumbs up*

> 3. Even major improvements in code performance aren't going to matter
> unless they are in time-critical portions of the program in the first
> place. That one applies to your question above. Allocation and
> deallocation do take time, but if much computation happens between the
> allocation and deallocation, then their time usage is not likely to
> matter relative to the computation.

Yup! Not a problem.

> 4. When you do get to trying to tweak code to improve performance,
> *MEASURE* the effects with your own code. Even experts can and regularly
> do get surprised, and things do vary from code to code. That means you
> should not just accept performance judgements that people might give you
> here. Yes, "people" includes me.

It's not a major tweak, it's just about choosing between two
straightforward ways of declaring arrays, both of which work. I'll
stick to one of them until it's time to test my code. Only then will I
measure the difference between the two options. And if I witness any
significant effects, then I might come back to this post and make a
comment about it.

Thanks a lot to all! You're always very helpful!
--helvio
From: glen herrmannsfeldt on
helvio <helvio.vairinhos(a)googlemail.com> wrote:
> On Jun 2, 4:35 pm, nos...(a)see.signature (Richard Maine) wrote:
>> helvio <helvio.vairin...(a)googlemail.com> wrote:
>> > In sum, I think my doubts reduce to the question of whether the
>> > efficiency of accessing the physical memory depends on the size of the
>> > allocated memory, or if it is independent of it.

>> It should be independent of it, or anyway close enough that
>> you won't be able to measure the difference.

In some theoretical calculations log(n) is used, and as an
approximation that probably isn't so bad.

>> > I also have the following related question: can the ALLOCATION /
>> > DEALLOCATION statements slow down the program if they are called
>> > multiple times, as compared with a single static declaration of "U"?

>> Yes, definitely. Of course, as with many things, that's only going to
>> matter if it is in an inner enough loop to be called lots of times.

> Yup! These statements are indeed called many times, in one of the most
> time consuming areas of my code. But the amount of matrix
> multiplications among rank-2 subarrays of the U's and V's between
> ALLOCATE and DEALLOCATE will definitely overshadow the amount of time
> taken to allocate them.

It is mostly a problem in object-oriented programming. Objects
have to be allocated and deallocated, often many times. A matrix
usually won't be allocated in the inner loop, but two loops out
(for the two dimensions of the matrix).

> My main worry was that the program could slow down if memory access
> depended significantly on the size of the total allocated memory
> (because I will have to access the memory locations of the V's many
> times for matrix-multiplying their rank-2 subarrays). I always had
> this idea in my head that physical memory access is one of the slowest
> elementary processes, I just don't have a feeling for how
> significantly slow it is.

Well, it is, but the rules are more complicated. Consider these two
loop nests:

      DO I=1,N
        DO J=1,N
          A(I,J)=B(I,J)+C(I,J)
        ENDDO
      ENDDO

      DO J=1,N
        DO I=1,N
          A(I,J)=B(I,J)+C(I,J)
        ENDDO
      ENDDO

The number of memory accesses is the same for both, but the times
might be very different: Fortran stores arrays in column-major order,
so the second nest (with I innermost) sweeps through memory
contiguously, while the first jumps by a stride of N elements on
every access.
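The difference is easy to check empirically. A self-contained timing
sketch (results will vary with compiler, optimization level, and machine;
at high optimization the compiler may interchange the loops itself):

```fortran
program loop_order_timing
  implicit none
  integer, parameter :: n = 2000
  real :: a(n,n), b(n,n), c(n,n)
  integer :: i, j, t0, t1, rate

  call random_number(b)
  call random_number(c)
  call system_clock(count_rate=rate)

  call system_clock(t0)
  do i = 1, n            ! I outer: stride-n access, cache-unfriendly
    do j = 1, n
      a(i,j) = b(i,j) + c(i,j)
    end do
  end do
  call system_clock(t1)
  print *, 'I outer:', real(t1-t0)/rate, 's   (sum =', sum(a), ')'

  call system_clock(t0)
  do j = 1, n            ! J outer: contiguous column-major access
    do i = 1, n
      a(i,j) = b(i,j) + c(i,j)
    end do
  end do
  call system_clock(t1)
  print *, 'J outer:', real(t1-t0)/rate, 's   (sum =', sum(a), ')'
end program loop_order_timing
```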

> But since you say that memory access is essentially independent of the
> size of allocated memory, all I have to worry about is not to exceed
> the available physical memory (N = 50000 was just an example). ;)

Probably you should stay below about half physical memory.
The OS may be using some, and that can make a big difference.

>> It would seem that some of the classic advice on performance
>> optimization is in order. I don't feel like digging up the
>> exact quotes; there are some pretty well known ones.

> Yup! This doesn't stop me at all from writing my code! I asked it
> mostly as an academic question, to learn a little bit more.

>> 1. Worry about making the code right before you worry
>> about making it fast.

> *thumbs up*

>> 2. Improvements in algorithm are worth far more than code tweaks.

> *thumbs up*

>> 3. Even major improvements in code performance aren't going to matter
>> unless they are in time-critical portions of the program in the first
>> place. That one applies to your question above. Allocation and
>> deallocation do take time, but if much computation happens between the
>> allocation and deallocation, then their time usage is not likely to
>> matter relative to the computation.

Well, if you add a bunch of 2x2 matrices you might notice...

> Yup! Not a problem.

>> 4. When you do get to trying to tweak code to improve performance,
>> *MEASURE* the effects with your own code. Even experts can and regularly
>> do get surprised, and things do vary from code to code. That means you
>> should not just accept performance judgements that people might give you
>> here. Yes, "people" includes me.

Code to code, compiler to compiler, system to system.
Way too many variables to keep track of.

> It's not a major tweak, it's just about choosing between two
> straightforward ways of declaring arrays, both of which work. I'll
> stick to one of them until it's time to test my code. Only then I will
> measure the difference between the two options. And if I witness any
> significant effects, then I might come back to this post and make a
> comment about it.

For smaller arrays, one guess is that accessing an automatic array
takes one more memory access than a static one, and an allocatable
one more than an automatic. That overhead matters less as the arrays
get larger, though.

-- glen
From: robin on
"helvio" <helvio.vairinhos(a)googlemail.com> wrote in message
news:5a8433aa-8b46-4485-98b0-dab0b822b1d7(a)f14g2000vbn.googlegroups.com...

I also have the following related question: can the ALLOCATION /
DEALLOCATION statements slow down the program if they are called
multiple times, as compared with a single static declaration of "U"?
e.g. by introducing a loop in my example above:

do i=1,M
   call using_UV    ! U is allocated here
   call kill_U      ! U is deallocated here
end do

ALLOCATE and DEALLOCATE do take extra time,
but it is unlikely you could notice the time, let alone measure it.

CALLing a subroutine will take more time than ALLOCATE does.

You would have to ALLOCATE / DEALLOCATE a million
times before the time becomes significant,
and even then, the time taken by the remainder of the loop
will be far, far greater than the time taken by ALLOCATE / DEALLOCATE.

