From: Joseph M. Newcomer on
See below...
On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam> wrote:

>"Your choice of CArray is probably a Bad Idea."
>
>It IS turning out to be a bad idea. This was one of the things I said I
>would get around to asking. If I give a growby of 10000, the program runs at
>a reasonable speed. When I set it much higher, like 100,000,000, thinking
>that I'll have fewer reallocations, it runs really slowly. I would have
>expected the opposite until 1e8 primes were calculated, then a large delay
>during reallocation. But it's slow from the start.
***
Someone once complained that they could only add about 5000 elements/second to a CArray. I
tried the same experiment, and found that indeed that was the number I was getting, in
Debug mode. They were adding 1.5M elements to the CArray. I set the expansion quantum to
500,000 and measured in Release mode (which is much faster) and got 15M elements/second.
CArray without using SetSize to set the quantum has horrendous performance,
approximately quadratic in time with the number of elements. std::vector is very close
to linear (I did the measurements, but never published them).
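As a concrete sketch of that (the type and the 500,000 quantum are just the values from the
experiment above):

    #include <afxtempl.h>   // CArray

    CArray<DWORD, DWORD> primes;
    primes.SetSize(0, 500000);      // size stays 0; grow in 500,000-element chunks
    for (DWORD i = 0; i < 1500000; i++)
        primes.Add(i);              // reallocates only once per 500,000 Adds

With the default grow-by, each growth step copies the whole array, which is where the
quadratic behavior comes from.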
*****
>I did some
>PerformanceQueries and found that adding something to the CArray (when it
>grew by 10000) was ten times slower than doing a division as part of the prime
>check. I don't know if it's a paging issue or some inefficiency in CArray
>itself. I guess it's much worse with a large growby but I don't know why and
>I didn't measure to check.
****
Add 1 element: allocate a new array of size+1; copy size elements from the old array, add
the new element to the end. Quadratic performance: the time goes up as approximately the
square of the number of elements (do the arithmetic for the copy operation). Never
believe ANYTHING you measure in Debug mode; it is irrelevant and will always be horrible.
This is because there are bounds checks and new-storage initialization that take time.
Most paging performance is irrelevant unless your vector gets into the hundreds of
megabytes in size, at which point you definitely have the wrong representation.
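To make that arithmetic concrete: if every Add reallocates, reaching N elements copies
0 + 1 + 2 + ... + (N-1) = N(N-1)/2 elements in total, so 1,000,000 appends cost roughly
5 * 10^11 element copies, versus about N copies when a sensible grow-by quantum (or
std::vector's geometric growth) keeps reallocations rare.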
****
>
>Would the release version be much faster than the debug version (less
>checking for bounds and other stuff)?
****
Absolutely! See previous comment; you CANNOT rely on "Debug" measurements for true
performance guarantees.
****
>Is there a way (with task manager
>maybe) to tell if the CArray is staying in memory or being swapped out to
>disk? Is there a way to force the CArray to stay in memory and not swap out?
>Any guess why it might be so slow?
****
Watch for increases in page faults using the perfmon tool. A substantial increase in
paging frequency is a Bad Sign. Note that making your algorithm cache-aware can generally
improve performance by 10X, but you need to know the cache parameters for each computer
(which you can discover) to write code like this, and it doesn't always work due to the
nature of the particular algorithm involved.
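If you want the same counter from inside the program rather than from perfmon, one possible
sketch (plain Win32/PSAPI; error handling omitted):

    #include <windows.h>
    #include <psapi.h>      // link with psapi.lib
    #include <cstdio>

    void ReportPageFaults()
    {
        PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
            printf("page faults: %lu  working set: %lu KB\n",
                   pmc.PageFaultCount, (unsigned long)(pmc.WorkingSetSize / 1024));
    }

Call it before and after a run; a large jump in PageFaultCount while appending is the
Bad Sign described above.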
****
>
>"you have made a fundaemental failure in thinking here"
>
>Maybe so, but it turns out that the CList is much faster than the CArray, in
>my program at least. Too bad it takes four times the memory and I run out
>before I can reach 2^32. Also, it takes much longer to clear out the CList
>than the CArray, which is logical since it's made up of many tiny parts
>instead of a few large parts.
****
But the fundamental failure in thinking is thinking that the extra space actually matters.
It may not. Note that while adding to an array with SetSize(..., 0) (the default) is
quadratic in time, adding to a list is constant time, so burning the extra space would
not matter. Unless you exceed the working set size, at which point it becomes a paging
problem, or if the elements are widely scattered in memory, or the storage allocator
becomes the performance bottleneck (which it can). If you anticipate 1,000,000 elements,
do a SetSize(0, 1000000); when the array is empty; then the array append is constant time
up to 1,000,000 elements, and at 1,000,001 there is a blip in the performance as the
array is reallocated to size 2,000,000 and then time is constant again to 2,000,000. If
you are not doing the SetSize, then your assumptions are all broken and you will get
quadratic cost. So you cannot
(a) predict performance if you are in Debug mode
(b) predict performance if you have not forced preallocation (SetSize),
which is handled "by magic" in std::vector.
You have apparently committed both sins here, so your choices don't have a sound basis.
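The std::vector version of the same preallocation, as a sketch (1,000,000 is just the
example count from above):

    #include <vector>

    std::vector<unsigned> primes;
    primes.reserve(1000000);        // optional: one allocation up front
    for (unsigned n = 0; n < 1000000; n++)
        primes.push_back(n);        // amortized constant time either way,
                                    // because growth is geometric

The point is that std::vector gives you a sane growth policy even if you forget the
reserve(); CArray with the default grow-by does not.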
joe
****
>
>I found another issue with CList which I think I would consider a bug. When
>I Serialize back from disk into a CArray, anything in the array is wiped out
>and overwritten. When I do that into the CList, the disk data is appended to
>the end of what is in the list. I didn't see that described in the help but
>I also didn't find any mention of this "bug" in a quick Google search.
****
No, serialization is intended to replace the contents, so you are seeing the specified
behavior. Appending was never considered as a part of the design. In fact, it would have
been considered a bug by the designers if serialization appended.
****
>
>"I wouldn't touch MFC serialization as far as I could throw a mainframe."
>
>Ha ha, how far can you throw a mainframe? I am learning to agree with you on
>this. I don't think I ever used it before. I started to use it this time
>because it is much faster and uses less space than saving and loading in
>text format. But if it's not trustworthy then it's not worth it.
****
Given that I'm actually physically handicapped, and probably couldn't throw a mere laptop
very far, throwing a 1-ton mainframe isn't likely. Probably measured in nanometers. The
real problem is schema evolution, a problem I had to address when I designed what is a lot
like XML back in 1977 (except our design was far better). Today, I use XML for my data
representation.
*****
>
>"Note that most prime algorithms are variants of the Seive of Eastosthenes"
>
>Yes, I'm using that. My interest in playing with primes was triggered by
>trying to learn about encryption and modular arithmetic used in RSA. But my
>first program ever was a prime number generator running on some HP Basic
>computer back in the early 70's, so it's interesting to me to bring that up
>to date. My program also can run the old way by dividing by primes until the
>sqrt of the number under test. I think my original Basic program divided by
>all odd numbers because there wasn't enough memory to store the array of
>primes on that old computer.
****
Note that RSA wants to use massive prime numbers (which are almost impossible to factor).
I have been talking with a colleague who is an encryption guy, and we are talking about
developing a course in encryption where we would use RSA-8 (8-bit primes) so students could
work out the encryption with just a calculator or even pencil and paper, and then show why
short keys are vulnerable by asking them to work out encrypted data (since 8-bit primes
would be easy to crack), to give some real feel for the algorithms. He has taken a course
from Ron Rivest (the R of RSA) and is really into this stuff.
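As a calculator-scale illustration of that (these particular numbers are just an example):
take 8-bit primes p = 251 and q = 241, so n = 60491 and phi = 60000; with e = 7, d = 17143
works because 7 * 17143 = 120001 = 2 * 60000 + 1. A small sketch of the modular
exponentiation a student would be checking by hand:

    #include <cstdint>
    #include <cstdio>

    // square-and-multiply; intermediates fit easily in 64 bits for these sizes
    uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t result = 1;
        base %= mod;
        while (exp > 0) {
            if (exp & 1) result = (result * base) % mod;
            base = (base * base) % mod;
            exp >>= 1;
        }
        return result;
    }

    int main()
    {
        const uint64_t n = 251ULL * 241ULL;     // 60491
        const uint64_t e = 7, d = 17143;        // e*d == 1 (mod 60000)
        uint64_t m = 42;                        // toy plaintext
        uint64_t c = modpow(m, e, n);           // encrypt
        printf("cipher = %llu, decrypted = %llu\n",
               (unsigned long long)c, (unsigned long long)modpow(c, d, n));
        return 0;
    }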
****
>
>Actually the sieve technique should scale to large integers (if you mean the
>64 bit type and not arbitrarily large) once one has an array of primes to
>2^32 and if the sieving is done in segments.
****
It doesn't scale; note that modern encryption might use 1024, 2048 or 4096-bit primes, and
factoring these with modern computers and modern algorithms takes intervals such that the
Sun is likely to expand into a Red Giant and engulf the Earth before the algorithm
terminates.
joe
****
>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a(a)4ax.com...
>> See below...
>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:
>>
>>>I am working on a program to calculate primes. It's just for fun, not for
>>>work or school, if that matters here. I'm trying to optimize it and I'll
>>>have some more questions about that later.
>>>
>>>Today I changed the CArray used to hold the primes to a CList based on a
>>>compile time selection, figuring I would save the time of reallocating and
>>>copying the CArray every time I went over the maximum allocated size. I'm
>>>aware of the pros and cons of CArray and CList in general.
>> ****
>> Your choice of CArray is probably a Bad Idea. First, you should SetSize
>> and set an
>> explicit reallocation quantum (the second parameter of SetSize) to
>> minimize the amount of
>> reallocation that is done. Or use std::vector which does this implicitly,
>> and has far,
>> far better performance than CArray.
>> ****
>>>A disadvantage is
>>>that the CList of primes takes (about) four times as much memory as the
>>>CArray holding the same number of primes. The primes are 32 bit ints so I
>>>don't know why it's four times instead of three times (previous pointer,
>>>prime and next pointer). What else is in there?
>> ****
>> you have made a fundamental failure in thinking here. You think, for
>> example, that the
>> size of any element matters, which is the wrong way to think about the
>> problem. What
>> matters is the memory footprint, and the induced paging, which is
>> potentially far worse in
>> a CList. Note that most prime algorithms are variants of the Sieve of
>> Eratosthenes, which
>> means they don't scale to large integers (such as are used for encryption)
>> and you can find
>> enumerations of all the primes in the range of 0..2**32-1 on the Web and
>> in reference
>> books. Lists are nearly always less efficient than arrays due to
>> locality-of-reference
>> issues that affect cache behavior and paging.
>> *****
>>>
>>>I tried something that I wasn't sure would work but it did. I serialized a
>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>worked. It's not a total surprise, because it wouldn't make sense to store
>>>the next and previous CList pointers into a serialize archive. My question
>>>here is, does anyone know if these were made intentionally compatible and
>>>if
>>>I can count on that working in all cases?
>> ****
>> I wouldn't touch MFC serialization as far as I could throw a mainframe. I
>> would not use
>> it for anything I cared about.
>> joe
>> ****
>>>
>>>Thanks...
>>>
>> Joseph M. Newcomer [MVP]
>> email: newcomer(a)flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Hector Santos on
Bill Brehm wrote:

> "Your choice of CArray is probably a Bad Idea."
>
> It IS turning out to be a bad idea. This was one of the things I said I
> would get around to asking. If I give a growby of 10000, the program runs at
> a reasonable speed. When I set it much higher, like 100,000,000, thinking
> that i'll have fewer reallocations, it runs really slowly. I would have
> expected the opposite until 1e8 primes were calculated, then a large delay
> during reallocation. But it's slow from the start. I did some
> PerformanceQueries and found that adding something to the CArray (when it
> grew by 10000) was ten times slow than doing a division as part of the prime
> check. I don't know if it's a paging issue or some inefficiency in CArray
> itself. I guess it's much worse with a large growby but i don't know why and
> i didn't measure to check.


Bill, your solution is a memory map. Find yourself a good "CFileMap"
class that allows you to map an element type to it and handles it as an
array. We have our own that allows us to do this:

struct TArrayItems {
    int x;
    int y;
};

CFileMap<TArrayItems> myArray;

myArray.Open("some file name even temp");

myArray[6].x = 1;
myArray[6].y = 2;

....
myArray.Delete(10);

myArray.Close();

etc.

Such a class will use CArray or any other collection class as
a base class for file mapping.
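For anyone without such a class handy, a rough sketch of the underlying idea using the raw
Win32 file-mapping calls (this is only an illustration of the technique, not the CFileMap
class itself; error handling and on-the-fly growth are omitted):

    #include <windows.h>

    template <typename T>
    class FileBackedArray {
        HANDLE hFile = INVALID_HANDLE_VALUE;
        HANDLE hMap  = NULL;
        T*     view  = nullptr;
    public:
        bool Open(const wchar_t* path, size_t elements)
        {
            hFile = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                                OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
            if (hFile == INVALID_HANDLE_VALUE) return false;
            LARGE_INTEGER size;
            size.QuadPart = (LONGLONG)(elements * sizeof(T));
            hMap = CreateFileMappingW(hFile, NULL, PAGE_READWRITE,
                                      (DWORD)size.HighPart, size.LowPart, NULL);
            if (!hMap) return false;
            view = (T*)MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
            return view != nullptr;
        }
        T& operator[](size_t i) { return view[i]; }   // backed by the file's pages
        void Close()
        {
            if (view) UnmapViewOfFile(view);
            if (hMap) CloseHandle(hMap);
            if (hFile != INVALID_HANDLE_VALUE) CloseHandle(hFile);
        }
    };

The operating system then pages the data in and out of the file on demand, which is the
point of the suggestion for arrays that approach the 2GB address-space limit.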


--
HLS
From: "Bill Brehm" don't want on
"Someone one complained that they could only add about 5000 elements/second
to CArray"

I am well aware of the growby parameter of SetSize() and have had it in my
program from the start. Yet CArray has an odd behavior in my program that
doesn't follow what one would expect knowing how the growby parameter is
supposed to make CArray work.

I did some tests with my program using both debug mode and release mode.
Also I varied the growby parameter. Lastly, I can store the primes in a
CArray, a CList or a preallocated array of unsigned ints (needs just over
200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for all
timing results.

Results for debug mode:

int[] - around 43000 primes per second.
CList - around 39000 primes per second. The slightly lower speed is probably
due to run time allocation of the memory for new elements.
CArray, growby 100,000 - start at around 35000 pps then tapers down to 16000
pps after 400,000 primes have been calculated. Suddenly drops to 8000 pps at
around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
calculated. One might expect it to slow down as it must reallocate and copy
an ever growing array ten times during this test. I don't understand the
sudden drop at 500,000. I know that displaying the results on the GUI takes
a lot of time, so I only display every 1000th prime. The timing will include
the reallocation time but only once in every 100,000 primes generated and
therefore only once in every 100 display updates.
CArray, growby 100 - start at around 40000 pps then tapers down to 8000 pps
after 500,000 primes have been calculated then tapers down to 2700 pps
around 1,000,000 primes calculated. Not surprising that it's slower with
more reallocations.
Here's where it gets weird.
CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow to
wait for 1,000,000 primes to be generated to see the speed there. At 245,000
primes, the speed is the same. There is a longer delay at the start to
allocate the memory.

The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
primes in the first few passes. The sieve runs very quickly. The time being
measured at such low speeds is basically the time to copy data into the
CArray.

Results for release mode:

int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
start and still at 1,000,000 primes and even at 3,000,000 primes. CList
takes a long time to give back the memory when I remove all the elements
from it.
CArray with growby 100 - around 57,000 pps to start and tapers down to
13,000 at 1,000,000 primes. This is not surprising as there will be 10
reallocations for every time the timing update is displayed and so the
reallocation time is included in the timing calculation.

So I would conclude that in release mode, everything works logically. In
debug mode, I might have blamed page faults as a reason to slow down with a
large CArray growby, but then why doesn't it affect the int[], which is the
same size and allocated in one big chunk like the CArray is? I suspect there
is something funny (too much overhead) in CArray in debug mode for large
growby sizes.

"But tha fundamental failure in thinking is thinking that the extra space
actually matters. It may not."

It does in my case because I reach the 2GB limit before I reach the end of
the primes that I want to calculate up to 2^32. There are 200,000,000+
primes in that set and at 4 bytes per prime for CArray or int[], I can fit
in 800MB but with CList I would need 2.4GB at 12 bytes or 3.2GB at 16 bytes
per prime.

"It doesn't scale; note that modern encryption might use 1024, 2048 or
4096-bit primes"

So you were referring to arbitrarily large numbers. When you said 'large
integer', I thought you might be referring to LARGE_INTEGER which is only 64
bits. Using the primes up to 2^32, I could use the sieve to calculate all
the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
space (on my PC) to hold them because there are somewhere between 10^15 and
10^16 of them.

"No, serialization is intended to replace the contents, so you are seeing
the specified
behavior. Appending was never considered as a part of the design. In fact,
it would have
been considered a bug by the designers if serialization appended."

In my case it IS appending. If I start out with one, two or three elements
in the CList, after I serialize from disk, those original elements are still
in there. I don't know why but I think and agree with you that they should
not be.



"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:fofvp5tejd5101f2pu7974u9upb2qmim9t(a)4ax.com...
> See below...
> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>"Your choice of CArray is probably a Bad Idea."
>>
>>It IS turning out to be a bad idea. This was one of the things I said I
>>would get around to asking. If I give a growby of 10000, the program runs
>>at
>>a reasonable speed. When I set it much higher, like 100,000,000, thinking
>>that i'll have fewer reallocations, it runs really slowly. I would have
>>expected the opposite until 1e8 primes were calculated, then a large delay
>>during reallocation. But it's slow from the start.
> ***
> Someone one complained that they could only add about 5000 elements/second
> to CArray. I
> tried the same experiment, and found that indeed that was the number I was
> getting, in
> Debug mode. They were adding 1.5M elements to the CArray. I set the
> expansion quantum to
> 500,000 and measured in Release mode (which is much faster) and got 15M
> elements/second.
> CArray without using SetSize to modify the quantum is horrendous
> performance,
> approximately exponential in time with the number of elements.
> std::vector is very close
> to linear (I did the measurements, but never published them).
> *****
>>I did some
>>PerformanceQueries and found that adding something to the CArray (when it
>>grew by 10000) was ten times slow than doing a division as part of the
>>prime
>>check. I don't know if it's a paging issue or some inefficiency in CArray
>>itself. I guess it's much worse with a large growby but i don't know why
>>and
>>i didn't measure to check.
> ****
> Add 1 element: allocate a new array of size+1; copy size elements from the
> old array, add
> the new element to the end. Expoential peformance; the time goes up as
> approximately the
> square of the number of elements (do the arithmetic for the copy
> operation). Never
> believe ANYTHING you measure in Debug mode; it is irrelevant and will
> always be horrible.
> This is because there are bounds checks and new-storage-iniitailization
> that take time.
> Most paging performance is irrelevant unless your vector gets into the
> hundreds of
> megabytes in size, at which point you definitely have the wrong
> representation.
> ****
>>
>>Would the release version be much faster than the debug version (less
>>checking for bounds and other stuff)?
> ****
> Absolutely! See previous comment; you CANNOT rely on "Debug" measurements
> for true
> performance guarantees.
> ****
>>Is there a way (with task manager
>>maybe) to tell if the CArray is staying in memory or being swapped out to
>>disk? Is there a way to force the CArray to stay in memory and not swap
>>out?
>>Any guess why it might be so slow?
> ****
> Watch for increases in paging faults using the perfmon tool. A
> substantial increase in
> paging frequency is a Bad Sign. Note that mking your algorithm
> cache-aware can generally
> improve performance by 10X, but you need to know the cache parameters for
> each computer
> (which you can discover) to write code like this, and it doesn't always
> work due to the
> nature of the particular algorithm involved.
> ****
>>
>>"you have made a fundaemental failure in thinking here"
>>
>>Maybe so, but it turns out that the CList is much faster than the CArray,
>>in
>>my program at least. Too bad it takes four times the memory and I run out
>>before I can reach 2^32. Also, it takes much longer to clear out the CList
>>than the CArray, which is logical since it's made up of many tiny parts
>>instead of a few large parts.
> ****
> But tha fundamental failure in thinking is thinking that the extra space
> actually matters.
> It may not. Note that while adding to an array with SetSize(..., 0) (the
> default), is
> exponential in time, adding to a list is constant time, so burning the
> extra space would
> not matter. Unless you exceed the working set size, at which point, it
> becomes a paging
> problem, or if the elements are widely scattered in memory, or the storage
> allocator
> becomes the performance bottleneck (which it can). If you anticipate
> 1,000,000 elements,
> do a SetSize(0, 1000000); when the array is empty; then the array append
> is constant time
> up to 1,000,000 elements also, and at 1,000,001 there is a blip i n the
> performance as the
> array is reallocated to size 2,000,000 and then time is constant again to
> 2,000,000. IF
> you are not doing the SetSize, then your assumption are all broken and you
> will get
> exponential cost. So you cannot
> (a) predict performance if you are in Debug mode
> (b) predict performance if you have not forced preallocation (SetSize)
> which is handled "by magic" in std::vector
> You have apparently committed both sins here, so your choices don't have a
> sound basis.
> joe
> ****
>>
>>I found another issue with CList which I think I would consider a bug.
>>When
>>I Serialize back from disk into a CArray, anything in the array is wiped
>>out
>>and overwritten. When I do that into the CList, the disk data is appended
>>to
>>the end of what is in the list. I didn't see that described in the help
>>but
>>I also didn't find any mention of this "bug" in a quick Google search.
> ****
> No, serialization is intended to replace the contents, so you are seeing
> the specified
> behavior. Appending was never considered as a part of the design. In
> fact, it would have
> been considered a bug by the designers if serialization appended.
> ****
>>
>>"I wouldn't touch MFC serialization as far as I could throw a mainframe."
>>
>>Ha ha, how far can you throw a mainframe? I am learning to agree with you
>>on
>>this. I don't think I ever used it before. I started to use it this time
>>because it is much faster and uses less space than saving and loading in
>>text format. But if it's not trustworthy then it's not worth it.
> ****
> Given that I'm actually physically handicapped, and probably couldn't
> throw a mere laptop
> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
> nanometers. The
> real problem is schema evolution, a problem I had to address when I
> designed what is a lot
> like XML back in 1977 (except our design was far better). Today, I use
> XML for my data
> representation.
> *****
>>
>>"Note that most prime algorithms are variants of the Seive of
>>Eastosthenes"
>>
>>Yes, I'm using that. My interest in playing with primes was triggered by
>>trying to learn about encryption and modular arithmetic used in RSA. But
>>my
>>first program ever was a prime number generator running on some HP Basic
>>computer back in the early 70's, so it's interesting to me to bring that
>>up
>>to date. My program also can run the old way by dividing by primes until
>>the
>>sqrt of the number under test. I think my original Basic program divided
>>by
>>all odd numbers because there wasn't enough memory to store the array of
>>primes on that old computer.
> ****
> Note that RSA wants to use massive prime numbers (which are almost
> impossible to factor).
> I have been talking with a colleague who is an encryption guy and we are
> talking about
> developing a course in encryption when we would use RSA-8 (8-bit primes)
> so students could
> work out the encryption with just a calculator or even pencil and paper
> and then show why
> short key are vulnerable by asking them to work out encrypted data (since
> 8-bit primes
> would be easy to crack), to give some real feel for the algorithms. He
> has taken a course
> from Ron Rivest (the R of RSA) and is really into this stuff.
> ****
>>
>>Actually the sieve technique should scale to large integers (if you mean
>>the
>>64 bit type and not arbitrarily large) once one has an array of primes to
>>2^32 and if the sieving is done in segments.
> ****
> It doesn't scale; note that modern encryption might use 1024, 2048 or
> 4096-bit primes, and
> facroting these with moden computers and moden algorithms takes intervals
> such that the
> Sun is likely to expand into a Red Giant and engulf the Earth before the
> algorithm
> terminates.
> joe
> ****
>>
>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a(a)4ax.com...
>>> See below...
>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>for
>>>>work or school, if that matters here. I'm trying to optimize it and I'll
>>>>have some more questions about that later.
>>>>
>>>>Today I changed the CArray used to hold the primes to a CList based on a
>>>>compile time selection, figuring I would save the time of reallocating
>>>>and
>>>>copying the CArray every time I went over the maximum allocated size.
>>>>I'm
>>>>aware of the pros and cons of CArray and CList in general.
>>> ****
>>> Your choice of CArray is probably a Bad Idea. First, you should SetSize
>>> and set an
>>> explicit reallocation quantum (the second parameter of SetSize) to
>>> minimize the amount of
>>> reallocation that is done. Or use std::vector which does this
>>> implicitly,
>>> and had far,
>>> far better performance than CArray.
>>> ****
>>>>A disadvantage is
>>>>that the CList of primes takes (about) four time as much memory as the
>>>>CArray holding the same number of primes. The primes are 32 bit ints so
>>>>I
>>>>don't know why it's four times instead of three times (previous pointer,
>>>>prime and next pointer). What else is in there?
>>> ****
>>> you have made a fundaemental failure in thinking here. You think, for
>>> example, that the
>>> size of any elmeent matters, which is the wrong way to think about the
>>> problem. What
>>> matter is the memory footprint, and the induced paging, which is
>>> potentially far worse in
>>> a CLIst. Note that most prime algorithms are variants of the Seive of
>>> Eastosthenes, which
>>> means they don't scal to large integer (such as are using for
>>> encryption)
>>> and you can find
>>> enumerations of all the primes in the range of 0..2**32-1 on the Web and
>>> in reference
>>> books. LIss are nearly always less efficient than arrays due to
>>> locality-of-reference
>>> issues that affect cache behavior and paging.
>>> *****
>>>>
>>>>I tried something that I wasn't sure would work but it did. I serialized
>>>>a
>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>store
>>>>the next and previous CList pointers into a serialize archive. My
>>>>question
>>>>here is, does anyone know if these were made intentionally compatible
>>>>and
>>>>if
>>>>I can count on that working in all cases?
>>> ****
>>> I wouldn't touch MFC serialization as far as I could throw a mainframe.
>>> I
>>> would not use
>>> it for anything I cared about.
>>> joe
>>> ****
>>>>
>>>>Thanks...
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Joseph M. Newcomer on
Sorry, because I can't see clearly, I think I sent a reply by hitting the wrong keystroke
combination....

What I had said was that allocation requires extra overhead. What I was about to say was
that the most efficient representation would be a bit vector of bits that were 1 for a
prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes. If you further
apply RLE (run length encoding) to this bit vector, you get an even more compact
representation which can be optimized by treating the special case of a 1-bit surrounded
by a lot of 0 bits with a separate encoding. I would consider examining this over using a
vector of primes.
joe

More below...

On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:

>"Someone one complained that they could only add about 5000 elements/second
>to CArray"
>
>I am well aware of the growby parameter of SetSize() and have had it in my
>program from the start. Yet CArray has an odd behavior in my program that
>doesn't follow what one would expect knowing how the growby parameter is
>supposed to make CArray work.
>
>I did some tests with my program using both debug mode and release mode.
>Also I varied the growby parameter. Lastly, I can store the primes in a
>CArray, a CList or a preallocated array of unsigned ints (needs just over
>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for all
>timing results.
>
>Results for debug mode:
>
>int[] - around 43000 primes per second.
>CList - around 39000 primes per second. The slightly lower speed is probably
>due to run time allocation of the memory for new elements.
>CArray, growby 100,000 - start at around 35000 pps then tapers down to 16000
>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps at
>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>calculated. one might expect it to slow down as it must reallocate and copy
>an ever growing array ten times during this test. I don't understand the
>sudden drop at 500,000. I know that displaying the results on the GUI takes
>a lot of time, so I only display every 1000th prime. The timing will include
>the reallocation time but only once in every 100,000 primes generated and
>therefore only once in ever 100 display updates.
>CArray, growby 100 - start at around 40000 pps then tapers down to 8000 pps
>after 500,000 primes have been calculated then tapers down to 2700 pps
>around 1,000,000 primes calculated. Not surprising that it's slower with
>more reallocations.
>Here where it gets weird.
>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow to
>wait for 1,000,000 primes to be generated to see the speed there. At 245,000
>primes, the speed is the same. There is a longer delay at the start to
>allocate the memory.
***
In case the last message didn't make it, what I indicated here was that the first Add
involves allocating 10,000,000 elements, which is going to badly skew your first average. If
you add "1" or "3" outside the measurement interval to factor that startup transient out
of your measurements, you might get a more interesting (and less biased) result. It all
depends on what you are trying to measure.
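A sketch of that kind of measurement (the 10,000,000 grow-by is just the value under
discussion): do one Add before starting the clock so the big initial allocation is not
charged to the measured interval.

    CArray<DWORD, DWORD> primes;
    primes.SetSize(0, 10000000);    // huge grow-by, as in the test above

    primes.Add(2);                  // this first Add pays for the 10,000,000-element
                                    // allocation; keep it outside the timed region

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    for (DWORD i = 0; i < 1000000; i++)
        primes.Add(i);              // steady-state appends only
    QueryPerformanceCounter(&t1);
    double seconds = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);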
****
>
>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>primes in the first few passes. The sieve runs very quickly. The time being
>measured at such low speeds is basically the time to copy data into the
>CArray.
>
>Results for release mode:
>
>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>takes a long time to give back the memory when I remove all the elements
>from it.
>CArray with growby 100 - around 57,000 pps to start and tapers down to
>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>reallocations for every time the timing update is displayed and so the
>reallocation time is included in the timing calculation.
>
>So I would conclude that in release mode, everything works logically. In
>debug mode, I might have blamed page faults as a reason to slow down with a
>large CArray growby, but then why doesn't it affect the int[] which is the
>same size and allocated in one big chunk like the CArray is. I suspect there
>is something funny (too much overhead) in CArray in debug mode for large
>growby sizes.
****
There is a lot of initialization that goes on in debug mode that you don't get in release
mode. This can result in a lot of paging and in a lot of data copies of the
initialization values. A big blip on the radar. Measure the time it takes to add the
first element to that huge growby array and you might want to factor that out.
****
>
>"But tha fundamental failure in thinking is thinking that the extra space
>actually matters. It may not."
>
>It does in my case because I reach the 2GB limit before I reach the end of
>the primes that I want to calculate up to 2^32. There are 200,000,000+
>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16 bytes
>per prime.
****
With storage allocator overhead plus allocation quantum round-up, you may be allocating
between 32 and 72 bytes per storage element. Allocation in debug mode is much larger.
****
>
>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>4096-bit primes"
>
>So you were referring to arbitrarily large numbers. when you said 'large
>integer', I thought you might be referring to LARGE_INTEGER which is only 64
>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>space (on my PC) to hold them because there are somewhere between 10^15 and
>10^16 of them.
****
LARGE_INTEGER, which is only 64 bits, is too small for effective encryption. It is
possible to crack 64-bit encryption in hours to days, making it ineffective. 256-bit to
4096-bit encryption is considered state-of-the-art for PKI today. Really secure
encryption uses one-time pads generated by truly random processes (such as radioactive
decay).
****
>
>"No, serialization is intended to replace the contents, so you are seeing
>the specified
>behavior. Appending was never considered as a part of the design. In fact,
>it would have
>been considered a bug by the designers if serialization appended."
>
>In my case it IS appending. If I start out with one, two or three elements
>in the CList, after I serialize from disk, those original elements are still
>in there. I don't know why but I think and agree with you that they should
>not be.
*****
One of the reasons I don't trust serialization is that it has too many "unspecified"
features. I can write serialization code in seconds that does what I want, and I know the
specs.
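For example, a hand-rolled replacement with explicit, known behavior might look like this
sketch (assuming the primes live in a CArray<DWORD, DWORD> called m_primes inside some
CPrimeStore class; the names are just for illustration):

    void CPrimeStore::Serialize(CArchive& ar)
    {
        if (ar.IsStoring())
        {
            ar << (DWORD)1;                              // explicit format version
            ar << (DWORD)m_primes.GetSize();
            ar.Write(m_primes.GetData(),
                     (UINT)(m_primes.GetSize() * sizeof(DWORD)));
        }
        else
        {
            DWORD version, count;
            ar >> version;                               // room for schema evolution
            ar >> count;
            m_primes.SetSize(count);                     // replaces, never appends
            ar.Read(m_primes.GetData(), (UINT)(count * sizeof(DWORD)));
        }
    }

Every byte written and every policy (replace vs. append) is then spelled out in your own
code rather than in undocumented collection behavior.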
joe
*****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: "Bill" don't want more on
Hi Joe,

I had considered using the bit vector idea, but didn't proceed with it
because I was concerned about the amount of time it would take to find
the next prime compared to just indexing into an array. But the bit vector
needs only 500MB for primes up to 2^32 while the array of unsigned ints
needs ~800MB. Furthermore, the bit vector of boolean bits can serve as the
sieve itself too, meaning I don't need two separate memory structures and
don't need to copy the found primes from the sieve over to the store. What I
hadn't thought of is compressing with RLE. That's a good idea and I might
just play around with both of these later.
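A minimal sketch of that combined sieve-and-store layout (one bit per odd number, so the
bit array itself is the prime store; limit is a parameter, e.g. 2^32):

    #include <vector>
    #include <cstdint>

    // Bit i represents the odd number 2*i + 3; a set bit means composite.
    std::vector<uint8_t> sieve_odds(uint64_t limit)
    {
        uint64_t bits = (limit - 1) / 2;              // odd numbers 3, 5, ..., <= limit
        std::vector<uint8_t> composite((bits + 7) / 8, 0);
        for (uint64_t i = 0; 2 * i + 3 <= limit / (2 * i + 3); i++) {
            if (composite[i >> 3] & (1 << (i & 7))) continue;
            uint64_t p = 2 * i + 3;
            for (uint64_t j = (p * p - 3) / 2; j < bits; j += p)
                composite[j >> 3] |= (uint8_t)(1 << (j & 7));
        }
        return composite;   // the primes are 2 plus every odd n whose bit is still clear
    }

Finding "the next prime" is then a scan for the next clear bit, which is the indexing cost
weighed above; RLE would be applied on top of this bit array afterwards.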

I've achieved the goal of calculating the primes up to 2^32. What I started
out with, using division, using CArray and using debug mode, would have
taken months to complete the calculations. What I have now, using the sieve,
using unsigned int[] and, well, still using debug mode because it's fast
enough after the other two bottlenecks were found, can do the whole job in
an hour or so. Thanks for the tips, ideas, discussions and help in finding
those bottlenecks.

I think I'll move on to writing my own class to handle arbitrarily large
integers and see where I want to go from there. I won't be trying to
calculate all the primes up to any number anymore, but I can use it for
other prime, modulo and encryption experiments.

There was one more question related to speed that I haven't asked yet. When
the program is running full speed, the task manager shows it's only using
49 - 51 percent of the CPU. The idle process is using most of the rest. I
did have a YieldTime() call in the loop to peek and process Windows messages
to keep the GUI active. But when I removed that, it went to 39%
utilization. Setting the priority to high or realtime didn't
help. Do you know why it's not making full use of the CPU and how to force
that? My processor does have two CPUs, but the affinity is set to both and
all the CPU usage numbers add up to 100%, not 200%, so I believe the 50% is
not due to only running on one of the two processors.
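On the GUI point, one common alternative to a YieldTime() call inside the loop (just a
sketch, assuming an MFC dialog and a user-defined message) is to run the sieve in a worker
thread and post progress back to the window:

    #define WM_APP_PROGRESS (WM_APP + 1)   // assumed user-defined message

    UINT SieveThreadProc(LPVOID pParam)
    {
        HWND hwnd = (HWND)pParam;          // window to notify
        // ... run the sieve loop here; this thread never touches the GUI,
        // so the UI thread's own message pump keeps the window responsive ...
        ::PostMessage(hwnd, WM_APP_PROGRESS, 0, 0);   // report progress / done
        return 0;
    }

    // started from the UI thread, e.g. in a button handler of the dialog:
    AfxBeginThread(SieveThreadProc, (LPVOID)GetSafeHwnd());

That removes the need to pump messages from the calculation loop at all; whether it changes
the CPU percentage shown in Task Manager is a separate question.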

Thanks,

Bill


"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:vro4q5t923qa256ej597fbkl88fg0dot6t(a)4ax.com...
> Sorry, because I can't see clearly, I think I sent a reply by hitting the
> wrong keystroke
> combinattion....
>
> What I had said was that allocation requires extra overhead. What I was
> about to say was
> that the most efficient representation would be a bit vector of bits that
> were 1 for a
> prime and 0 for a non-prime; this would require 2^32 bits, or 2^29 bytes,
> If you further
> apply RLE (run length encoding) to this bit vector, you get an even more
> compact
> representation which can be optimized by treating the special case of a
> 1-bit surrounded
> by a lot of 0 bits with a separate encoding. I would consider examining
> this over using a
> vector of primes.
> joe
>
> More below...
>
> On Thu, 18 Mar 2010 09:21:28 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>"Someone one complained that they could only add about 5000
>>elements/second
>>to CArray"
>>
>>I am well aware of the growby parameter of SetSize() and have had it in my
>>program from the start. Yet CArray has an odd behavior in my program that
>>doesn't follow what one would expect knowing how the growby parameter is
>>supposed to make CArray work.
>>
>>I did some tests with my program using both debug mode and release mode.
>>Also I varied the growby parameter. Lastly, I can store the primes in a
>>CArray, a CList or a preallocated array of unsigned ints (needs just over
>>200,000,000 to fit the primes up to 2^32). I use the sieve algorithm for
>>all
>>timing results.
>>
>>Results for debug mode:
>>
>>int[] - around 43000 primes per second.
>>CList - around 39000 primes per second. The slightly lower speed is
>>probably
>>due to run time allocation of the memory for new elements.
>>CArray, growby 100,000 - start at around 35000 pps then tapers down to
>>16000
>>pps after 400,000 primes have been calculated. suddenly drops to 8000 pps
>>at
>>around 500,000 primes and tapers down to 4000 pps around 1,000,000 primes
>>calculated. one might expect it to slow down as it must reallocate and
>>copy
>>an ever growing array ten times during this test. I don't understand the
>>sudden drop at 500,000. I know that displaying the results on the GUI
>>takes
>>a lot of time, so I only display every 1000th prime. The timing will
>>include
>>the reallocation time but only once in every 100,000 primes generated and
>>therefore only once in ever 100 display updates.
>>CArray, growby 100 - start at around 40000 pps then tapers down to 8000
>>pps
>>after 500,000 primes have been calculated then tapers down to 2700 pps
>>around 1,000,000 primes calculated. Not surprising that it's slower with
>>more reallocations.
>>Here where it gets weird.
>>CArray, growby 10,000,000 - start at around 400 - 500 pps. It's too slow
>>to
>>wait for 1,000,000 primes to be generated to see the speed there. At
>>245,000
>>primes, the speed is the same. There is a longer delay at the start to
>>allocate the memory.
> ***
> In case the last message didn't make it, what I indicated here was the
> first element
> involves adding 10,000,000 elements, which is going to badly skew your
> first average. If
> you add "1" or "3" outside the measurement interval to factor that startup
> transient out
> of your measurements you might get a more interesting (and less biased)
> result. It all
> depends on what you are trying to measure.
> ****
>>
>>The sieve size is 1,000,000 odd numbers which generates 100,000 to 150,000
>>primes in the first few passes. The sieve runs very quickly. The time
>>being
>>measured at such low speeds is basically the time to copy data into the
>>CArray.
>>
>>Results for release mode:
>>
>>int[], CList and CArray with growby 10,000,000 - around 57,000 pps from
>>start and still at 1,000,000 primes and even at 3,000,000 primes. CList
>>takes a long time to give back the memory when I remove all the elements
>>from it.
>>CArray with growby 100 - around 57,000 pps to start and tapers down to
>>13,000 at 1,000,000 primes. This is not surprising as there will be 10
>>reallocations for every time the timing update is displayed and so the
>>reallocation time is included in the timing calculation.
>>
>>So I would conclude that in release mode, everything works logically. In
>>debug mode, I might have blamed page faults as a reason to slow down with
>>a
>>large CArray growby, but then why doesn't it affect the int[] which is the
>>same size and allocated in one big chunk like the CArray is. I suspect
>>there
>>is something funny (too much overhead) in CArray in debug mode for large
>>growby sizes.
> ****
> There is a lot of initialization that goes on in debug mode that you don't
> get in release
> mode. This can result in a lot of paging and in a lot of data copies of
> the
> initialization values. A big blip in the radar. Measure the time it
> takes to add the
> first element to that huge growby array and you might want to factor that
> out.
> ****
>>
>>"But tha fundamental failure in thinking is thinking that the extra space
>>actually matters. It may not."
>>
>>It does in my case because I reach the 2GB limit before I reach the end of
>>the primes that I want to calculate up to 2^32. There are 200,000,000+
>>primes in that set and at 4 bytes per prime for CArray or int[], I can fit
>>in 800MB but with CList I would need 2.4GB at 12 bytes or 3,2GB at 16
>>bytes
>>per prime.
> ****
> Plus storage allocator overhead plus allocation quantum roundup, you may
> be allocating
> between 32 and 72 bytes per storage element. ALlocation in debug mode is
> much larger.
> ****
>>
>>"It doesn't scale; note that modern encryption might use 1024, 2048 or
>>4096-bit primes"
>>
>>So you were referring to arbitrarily large numbers. when you said 'large
>>integer', I thought you might be referring to LARGE_INTEGER which is only
>>64
>>bits. Using the primes up to 2^32, I could use the sieve to calculate all
>>the primes up to LARGE_INTEGER. There just wouldn't be enough RAM or disk
>>space (on my PC) to hold them because there are somewhere between 10^15
>>and
>>10^16 of them.
> ****
> LARGE_INTEGER, which is only 64 bits, is too small for effective
> encryption. It is
> possible to crack 64-bit encryption in hours to days, making it
> ineffective. 256-bit to
> 4096-bit encryption is considered state-of-the-art for PKI today. Really
> secure
> encryption uses one-time pads generated by truly random processes (such as
> radioactive
> decay)
> ****
>>
>>"No, serialization is intended to replace the contents, so you are seeing
>>the specified
>>behavior. Appending was never considered as a part of the design. In
>>fact,
>>it would have
>>been considered a bug by the designers if serialization appended."
>>
>>In my case it IS appending. If I start out with one, two or three elements
>>in the CList, after I serialize from disk, those original elements are
>>still
>>in there. I don't know why but I think and agree with you that they should
>>not be.
> *****
> one of the reasons I don't trust serialization is that it has too many
> "unspecified"
> features. I can write serialization code in seconds that does what I
> want, and I know the
> specs.
> joe
> *****
>>
>>
>>
>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>news:fofvp5tejd5101f2pu7974u9upb2qmim9t(a)4ax.com...
>>> See below...
>>> On Tue, 16 Mar 2010 21:38:46 +0800, "Bill Brehm" <don't want spam>
>>> wrote:
>>>
>>>>"Your choice of CArray is probably a Bad Idea."
>>>>
>>>>It IS turning out to be a bad idea. This was one of the things I said I
>>>>would get around to asking. If I give a growby of 10000, the program
>>>>runs
>>>>at
>>>>a reasonable speed. When I set it much higher, like 100,000,000,
>>>>thinking
>>>>that i'll have fewer reallocations, it runs really slowly. I would have
>>>>expected the opposite until 1e8 primes were calculated, then a large
>>>>delay
>>>>during reallocation. But it's slow from the start.
>>> ***
>>> Someone one complained that they could only add about 5000
>>> elements/second
>>> to CArray. I
>>> tried the same experiment, and found that indeed that was the number I
>>> was
>>> getting, in
>>> Debug mode. They were adding 1.5M elements to the CArray. I set the
>>> expansion quantum to
>>> 500,000 and measured in Release mode (which is much faster) and got 15M
>>> elements/second.
>>> CArray without using SetSize to modify the quantum is horrendous
>>> performance,
>>> approximately exponential in time with the number of elements.
>>> std::vector is very close
>>> to linear (I did the measurements, but never published them).
>>> *****
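
The quantum change being described is a one-liner. A sketch (MFC build,
#include <afxtempl.h>; the 500,000 figure is just the value from the
experiment above, not a recommendation):

    CArray<DWORD, DWORD> primes;
    primes.SetSize(0, 500000);     // logical size stays 0, but each growth
                                   // step now allocates 500,000 slots at once
    for (DWORD n = 2; n < 1500000; ++n)
        primes.Add(n);             // only a handful of reallocations for 1.5M appends
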
>>>>I did some
>>>>PerformanceQueries and found that adding something to the CArray (when
>>>>it
>>>>grew by 10000) was ten times slower than doing a division as part of the
>>>>prime
>>>>check. I don't know if it's a paging issue or some inefficiency in
>>>>CArray
>>>>itself. I guess it's much worse with a large growby but I don't know why
>>>>and
>>>>I didn't measure to check.
>>> ****
>>> Add 1 element: allocate a new array of size+1; copy size elements from
>>> the
>>> old array, add
>>> the new element to the end. Exponential performance; the time goes up as
>>> approximately the
>>> square of the number of elements (do the arithmetic for the copy
>>> operation). Never
>>> believe ANYTHING you measure in Debug mode; it is irrelevant and will
>>> always be horrible.
>>> This is because there are bounds checks and new-storage-initialization
>>> that take time.
>>> Most paging performance is irrelevant unless your vector gets into the
>>> hundreds of
>>> megabytes in size, at which point you definitely have the wrong
>>> representation.
>>> ****
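
To make that copy count concrete: if the array grows by one element per
append, adding element k first copies the k-1 elements already present, so
filling the array to n elements costs

    0 + 1 + 2 + ... + (n-1) = n(n-1)/2 copies, i.e. on the order of n^2/2.

For the roughly 2x10^8 primes below 2^32 that is about 2x10^16 element
copies, which is why the growth policy rather than the primality test ends up
dominating the run time.
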
>>>>
>>>>Would the release version be much faster than the debug version (less
>>>>checking for bounds and other stuff)?
>>> ****
>>> Absolutely! See previous comment; you CANNOT rely on "Debug"
>>> measurements
>>> for true
>>> performance guarantees.
>>> ****
>>>>Is there a way (with task manager
>>>>maybe) to tell if the CArray is staying in memory or being swapped out
>>>>to
>>>>disk? Is there a way to force the CArray to stay in memory and not swap
>>>>out?
>>>>Any guess why it might be so slow?
>>> ****
>>> Watch for increases in paging faults using the perfmon tool. A
>>> substantial increase in
>>> paging frequency is a Bad Sign. Note that making your algorithm
>>> cache-aware can generally
>>> improve performance by 10X, but you need to know the cache parameters
>>> for
>>> each computer
>>> (which you can discover) to write code like this, and it doesn't always
>>> work due to the
>>> nature of the particular algorithm involved.
>>> ****
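
Besides perfmon, the process can read its own fault counter, which is handy
for before/after comparisons while the container grows. A rough sketch using
the Win32 psapi call (ReportPaging is just an illustrative helper name):

    #include <windows.h>
    #include <psapi.h>      // link with psapi.lib
    #include <cstdio>

    void ReportPaging(const char* label)
    {
        PROCESS_MEMORY_COUNTERS pmc;
        pmc.cb = sizeof(pmc);
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
            std::printf("%s: %lu page faults, %lu KB working set\n",
                        label,
                        (unsigned long)pmc.PageFaultCount,
                        (unsigned long)(pmc.WorkingSetSize / 1024));
    }

Call it before and after filling the array; a jump of orders of magnitude in
the fault count is the Bad Sign being described.
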
>>>>
>>>>"you have made a fundaemental failure in thinking here"
>>>>
>>>>Maybe so, but it turns out that the CList is much faster than the
>>>>CArray,
>>>>in
>>>>my program at least. Too bad it takes four times the memory and I run
>>>>out
>>>>before I can reach 2^32. Also, it takes much longer to clear out the
>>>>CList
>>>>than the CArray, which is logical since it's made up of many tiny parts
>>>>instead of a few large parts.
>>> ****
>>> But the fundamental failure in thinking is thinking that the extra space
>>> actually matters.
>>> It may not. Note that while adding to an array with SetSize(..., 0)
>>> (the
>>> default) is
>>> exponential in time, adding to a list is constant time, so burning the
>>> extra space would
>>> not matter. Unless you exceed the working set size, at which point, it
>>> becomes a paging
>>> problem, or if the elements are widely scattered in memory, or the
>>> storage
>>> allocator
>>> becomes the performance bottleneck (which it can). If you anticipate
>>> 1,000,000 elements,
>>> do a SetSize(0, 1000000) when the array is empty; then the array append
>>> is constant time
>>> up to 1,000,000 elements also, and at 1,000,001 there is a blip in the
>>> performance as the
>>> array is reallocated to size 2,000,000 and then time is constant again
>>> to
>>> 2,000,000. IF
>>> you are not doing the SetSize, then your assumptions are all broken and
>>> you
>>> will get
>>> exponential cost. So you cannot
>>> (a) predict performance if you are in Debug mode
>>> (b) predict performance if you have not forced preallocation (SetSize)
>>> which is handled "by magic" in std::vector
>>> You have apparently committed both sins here, so your choices don't have
>>> a
>>> sound basis.
>>> joe
>>> ****
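
For comparison, the std::vector version of the same idea, where the
preallocation is a single reserve() call and even without it the geometric
growth keeps appends amortized-constant (the 1,000,000 figure is only the
illustrative size used above):

    #include <vector>

    std::vector<unsigned int> primes;
    primes.reserve(1000000);              // optional: removes even the few
                                          // geometric-growth reallocations
    for (unsigned int p = 0; p < 1000000; ++p)
        primes.push_back(p);              // amortized constant time either way
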
>>>>
>>>>I found another issue with CList which I think I would consider a bug.
>>>>When
>>>>I Serialize back from disk into a CArray, anything in the array is wiped
>>>>out
>>>>and overwritten. When I do that into the CList, the disk data is
>>>>appended
>>>>to
>>>>the end of what is in the list. I didn't see that described in the help
>>>>but
>>>>I also didn't find any mention of this "bug" in a quick Google search.
>>> ****
>>> No, serialization is intended to replace the contents, so you are seeing
>>> the specified
>>> behavior. Appending was never considered as a part of the design. In
>>> fact, it would have
>>> been considered a bug by the designers if serialization appended.
>>> ****
>>>>
>>>>"I wouldn't touch MFC serialization as far as I could throw a
>>>>mainframe."
>>>>
>>>>Ha ha, how far can you throw a mainframe? I am learning to agree with
>>>>you
>>>>on
>>>>this. I don't think I ever used it before. I started to use it this time
>>>>because it is much faster and uses less space than saving and loading in
>>>>text format. But if it's not trustworthy then it's not worth it.
>>> ****
>>> Given that I'm actually physically handicapped, and probably couldn't
>>> throw a mere laptop
>>> very far, throwing a 1-ton mainframe isn't likely. Probably measured in
>>> nanometers. The
>>> real problem is schema evolution, a problem I had to address when I
>>> designed what is a lot
>>> like XML back in 1977 (except our design was far better). Today, I use
>>> XML for my data
>>> representation.
>>> *****
>>>>
>>>>"Note that most prime algorithms are variants of the Seive of
>>>>Eastosthenes"
>>>>
>>>>Yes, I'm using that. My interest in playing with primes was triggered by
>>>>trying to learn about encryption and modular arithmetic used in RSA. But
>>>>my
>>>>first program ever was a prime number generator running on some HP Basic
>>>>computer back in the early 70's, so it's interesting to me to bring that
>>>>up
>>>>to date. My program also can run the old way by dividing by primes until
>>>>the
>>>>sqrt of the number under test. I think my original Basic program divided
>>>>by
>>>>all odd numbers because there wasn't enough memory to store the array of
>>>>primes on that old computer.
>>> ****
>>> Note that RSA wants to use massive prime numbers (which are almost
>>> impossible to factor).
>>> I have been talking with a colleague who is an encryption guy and we are
>>> talking about
>>> developing a course in encryption where we would use RSA-8 (8-bit primes)
>>> so students could
>>> work out the encryption with just a calculator or even pencil and paper
>>> and then show why
>>> short keys are vulnerable by asking them to work out encrypted data
>>> (since
>>> 8-bit primes
>>> would be easy to crack), to give some real feel for the algorithms. He
>>> has taken a course
>>> from Ron Rivest (the R of RSA) and is really into this stuff.
>>> ****
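
A toy version of that exercise is small enough to sketch here; the primes,
exponents and message below are made-up example values (and deliberately,
hopelessly insecure, which is the point of the exercise):

    #include <cstdio>

    typedef unsigned long long u64;

    // Square-and-multiply modular exponentiation; plenty for toy key sizes.
    u64 ModPow(u64 b, u64 e, u64 m)
    {
        u64 r = 1;
        b %= m;
        while (e) {
            if (e & 1) r = (r * b) % m;
            b = (b * b) % m;
            e >>= 1;
        }
        return r;
    }

    int main()
    {
        const u64 p = 251, q = 241;         // two 8-bit primes
        const u64 n = p * q;                // modulus, 60491
        const u64 phi = (p - 1) * (q - 1);  // 60000
        const u64 e = 7;                    // public exponent, gcd(e, phi) == 1
        const u64 d = 17143;                // private exponent: 7*17143 = 2*60000 + 1
        const u64 msg = 42;                 // the "plaintext"
        u64 c = ModPow(msg, e, n);          // encrypt with (e, n)
        u64 back = ModPow(c, d, n);         // decrypt with (d, n); yields 42 again
        std::printf("m=%llu  c=%llu  decrypted=%llu\n", msg, c, back);
        return 0;
    }

With numbers this small, factoring n = 60491 back into 251 * 241 by trial
division is a calculator exercise, which is what makes the short-key lesson
work.
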
>>>>
>>>>Actually the sieve technique should scale to large integers (if you mean
>>>>the
>>>>64 bit type and not arbitrarily large) once one has an array of primes
>>>>to
>>>>2^32 and if the sieving is done in segments.
>>> ****
>>> It doesn't scale; note that modern encryption might use 1024, 2048 or
>>> 4096-bit primes, and
>>> factoring these with modern computers and modern algorithms takes
>>> intervals
>>> such that the
>>> Sun is likely to expand into a Red Giant and engulf the Earth before the
>>> algorithm
>>> terminates.
>>> joe
>>> ****
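
For reference, the segmented form being described looks roughly like this. A
sketch only (function names and window size are illustrative): it keeps
memory bounded by sieving one window at a time with base primes up to
sqrt(hi), but as noted it does nothing for the enormous primes used in real
encryption.

    #include <cmath>
    #include <vector>

    typedef unsigned int       u32;
    typedef unsigned long long u64;

    // Ordinary sieve for the small "base" primes up to n.
    std::vector<u32> SimpleSieve(u32 n)
    {
        std::vector<bool> composite(n + 1, false);
        std::vector<u32> primes;
        for (u32 i = 2; i <= n; ++i) {
            if (composite[i]) continue;
            primes.push_back(i);
            for (u64 j = (u64)i * i; j <= n; j += i)
                composite[(u32)j] = true;
        }
        return primes;
    }

    // Count the primes in [lo, hi), sieving one fixed-size window at a time.
    u64 CountPrimesSegmented(u64 lo, u64 hi, u32 window = 1u << 20)
    {
        std::vector<u32> base = SimpleSieve((u32)std::sqrt((double)hi) + 1);
        std::vector<bool> composite;
        u64 count = 0;
        for (u64 start = lo; start < hi; start += window) {
            u64 end = (start + window < hi) ? start + window : hi;
            composite.assign((size_t)(end - start), false);
            for (size_t k = 0; k < base.size(); ++k) {
                u64 p = base[k];
                if (p * p >= end) break;               // larger base primes mark nothing here
                u64 first = ((start + p - 1) / p) * p; // first multiple of p >= start
                if (first < p * p) first = p * p;      // smaller multiples handled by smaller primes
                for (u64 j = first; j < end; j += p)
                    composite[(size_t)(j - start)] = true;
            }
            for (u64 v = start; v < end; ++v)
                if (v >= 2 && !composite[(size_t)(v - start)]) ++count;
        }
        return count;
    }

With the primes below 2^32 as the base set, the same windowing keeps memory
at one window plus the base table no matter how far past 2^32 the scan runs.
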
>>>>
>>>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>>>>news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a(a)4ax.com...
>>>>> See below...
>>>>> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam>
>>>>> wrote:
>>>>>
>>>>>>I am working on a program to calculate primes. It's just for fun, not
>>>>>>for
>>>>>>work or school, if that matters here. I'm trying to optimize it and
>>>>>>I'll
>>>>>>have some more questions about that later.
>>>>>>
>>>>>>Today I changed the CArray used to hold the primes to a CList based on
>>>>>>a
>>>>>>compile time selection, figuring I would save the time of reallocating
>>>>>>and
>>>>>>copying the CArray every time I went over the maximum allocated size.
>>>>>>I'm
>>>>>>aware of the pros and cons of CArray and CList in general.
>>>>> ****
>>>>> Your choice of CArray is probably a Bad Idea. First, you should
>>>>> SetSize
>>>>> and set an
>>>>> explicit reallocation quantum (the second parameter of SetSize) to
>>>>> minimize the amount of
>>>>> reallocation that is done. Or use std::vector which does this
>>>>> implicitly,
>>>>> and has far,
>>>>> far better performance than CArray.
>>>>> ****
>>>>>>A disadvantage is
>>>>>>that the CList of primes takes (about) four times as much memory as the
>>>>>>CArray holding the same number of primes. The primes are 32 bit ints
>>>>>>so
>>>>>>I
>>>>>>don't know why it's four times instead of three times (previous
>>>>>>pointer,
>>>>>>prime and next pointer). What else is in there?
>>>>> ****
>>>>> you have made a fundamental failure in thinking here. You think, for
>>>>> example, that the
>>>>> size of any element matters, which is the wrong way to think about the
>>>>> problem. What
>>>>> matters is the memory footprint, and the induced paging, which is
>>>>> potentially far worse in
>>>>> a CList. Note that most prime algorithms are variants of the Sieve of
>>>>> Eratosthenes, which
>>>>> means they don't scale to large integers (such as are used for
>>>>> encryption)
>>>>> and you can find
>>>>> enumerations of all the primes in the range of 0..2**32-1 on the Web
>>>>> and
>>>>> in reference
>>>>> books. Lists are nearly always less efficient than arrays due to
>>>>> locality-of-reference
>>>>> issues that affect cache behavior and paging.
>>>>> *****
>>>>>>
>>>>>>I tried something that I wasn't sure would work but it did. I
>>>>>>serialized
>>>>>>a
>>>>>>CArray to disk and then later I serialized it from disk to a CList. It
>>>>>>worked. It's not a total surprise, because it wouldn't make sense to
>>>>>>store
>>>>>>the next and previous CList pointers into a serialize archive. My
>>>>>>question
>>>>>>here is, does anyone know if these were made intentionally compatible
>>>>>>and
>>>>>>if
>>>>>>I can count on that working in all cases?
>>>>> ****
>>>>> I wouldn't touch MFC serialization as far as I could throw a
>>>>> mainframe.
>>>>> I
>>>>> would not use
>>>>> it for anything I cared about.
>>>>> joe
>>>>> ****
>>>>>>
>>>>>>Thanks...
>>>>>>
>>>>> Joseph M. Newcomer [MVP]
>>>>> email: newcomer(a)flounder.com
>>>>> Web: http://www.flounder.com
>>>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>>>
>>> Joseph M. Newcomer [MVP]
>>> email: newcomer(a)flounder.com
>>> Web: http://www.flounder.com
>>> MVP Tips: http://www.flounder.com/mvp_tips.htm
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm