From: "Bill Brehm" don't want on
I am working on a program to calculate primes. It's just for fun, not for
work or school, if that matters here. I'm trying to optimize it and I'll
have some more questions about that later.

Today I changed the CArray used to hold the primes to a CList based on a
compile time selection, figuring I would save the time of reallocating and
copying the CArray every time I went over the maximum allocated size. I'm
aware of the pros and cons of CArray and CList in general. A disadvantage is
that the CList of primes takes (about) four time as much memory as the
CArray holding the same number of primes. The primes are 32 bit ints so I
don't know why it's four times instead of three times (previous pointer,
prime and next pointer). What else is in there?

I tried something that I wasn't sure would work but it did. I serialized a
CArray to disk and then later I serialized it from disk to a CList. It
worked. It's not a total surprise, because it wouldn't make sense to store
the next and previous CList pointers into a serialize archive. My question
here is, does anyone know if these were made intentionally compatible and if
I can count on that working in all cases?

Thanks...


From: Joseph M. Newcomer on
See below...
On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:

>I am working on a program to calculate primes. It's just for fun, not for
>work or school, if that matters here. I'm trying to optimize it and I'll
>have some more questions about that later.
>
>Today I changed the CArray used to hold the primes to a CList based on a
>compile time selection, figuring I would save the time of reallocating and
>copying the CArray every time I went over the maximum allocated size. I'm
>aware of the pros and cons of CArray and CList in general.
****
Your choice of CArray is probably a Bad Idea. First, you should SetSize and set an
explicit reallocation quantum (the second parameter of SetSize) to minimize the amount of
reallocation that is done. Or use std::vector which does this implicitly, and had far,
far better performance than CArray.
****
>A disadvantage is
>that the CList of primes takes (about) four time as much memory as the
>CArray holding the same number of primes. The primes are 32 bit ints so I
>don't know why it's four times instead of three times (previous pointer,
>prime and next pointer). What else is in there?
****
you have made a fundaemental failure in thinking here. You think, for example, that the
size of any elmeent matters, which is the wrong way to think about the problem. What
matter is the memory footprint, and the induced paging, which is potentially far worse in
a CLIst. Note that most prime algorithms are variants of the Seive of Eastosthenes, which
means they don't scal to large integer (such as are using for encryption) and you can find
enumerations of all the primes in the range of 0..2**32-1 on the Web and in reference
books. LIss are nearly always less efficient than arrays due to locality-of-reference
issues that affect cache behavior and paging.
*****
>
>I tried something that I wasn't sure would work but it did. I serialized a
>CArray to disk and then later I serialized it from disk to a CList. It
>worked. It's not a total surprise, because it wouldn't make sense to store
>the next and previous CList pointers into a serialize archive. My question
>here is, does anyone know if these were made intentionally compatible and if
>I can count on that working in all cases?
****
I wouldn't touch MFC serialization as far as I could throw a mainframe. I would not use
it for anything I cared about.
joe
****
>
>Thanks...
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Goran on
On Mar 15, 4:06 pm, "Bill Brehm" <don't want spam> wrote:
> I tried something that I wasn't sure would work but it did. I serialized a
> CArray to disk and then later I serialized it from disk to a CList. It
> worked. It's not a total surprise, because it wouldn't make sense to store
> the next and previous CList pointers into a serialize archive. My question
> here is, does anyone know if these were made intentionally compatible and if
> I can count on that working in all cases?

Yes, but I doubt that was intentional. The reason is really this: both
CList<> and CArray<> simply serialize the number of elements, then
each element, from first to last. If you think about it, what else
should they do? So this will work for lists and arrays, and I would
guess that you can also altercate between e.g. CArray<BYTE> and
CByteArray and such, __EXCEPT__ if you use "shift" operators and
serialize pointers to concrete class (that is, if you use ar <<
pToCByteArray, you will have trouble using ar >> pToTemplateByteArray
later on).

Goran.
From: "Bill Brehm" don't want on
"Your choice of CArray is probably a Bad Idea."

It IS turning out to be a bad idea. This was one of the things I said I
would get around to asking. If I give a growby of 10000, the program runs at
a reasonable speed. When I set it much higher, like 100,000,000, thinking
that i'll have fewer reallocations, it runs really slowly. I would have
expected the opposite until 1e8 primes were calculated, then a large delay
during reallocation. But it's slow from the start. I did some
PerformanceQueries and found that adding something to the CArray (when it
grew by 10000) was ten times slow than doing a division as part of the prime
check. I don't know if it's a paging issue or some inefficiency in CArray
itself. I guess it's much worse with a large growby but i don't know why and
i didn't measure to check.

Would the release version be much faster than the debug version (less
checking for bounds and other stuff)? Is there a way (with task manager
maybe) to tell if the CArray is staying in memory or being swapped out to
disk? Is there a way to force the CArray to stay in memory and not swap out?
Any guess why it might be so slow?

"you have made a fundaemental failure in thinking here"

Maybe so, but it turns out that the CList is much faster than the CArray, in
my program at least. Too bad it takes four times the memory and I run out
before I can reach 2^32. Also, it takes much longer to clear out the CList
than the CArray, which is logical since it's made up of many tiny parts
instead of a few large parts.

I found another issue with CList which I think I would consider a bug. When
I Serialize back from disk into a CArray, anything in the array is wiped out
and overwritten. When I do that into the CList, the disk data is appended to
the end of what is in the list. I didn't see that described in the help but
I also didn't find any mention of this "bug" in a quick Google search.

"I wouldn't touch MFC serialization as far as I could throw a mainframe."

Ha ha, how far can you throw a mainframe? I am learning to agree with you on
this. I don't think I ever used it before. I started to use it this time
because it is much faster and uses less space than saving and loading in
text format. But if it's not trustworthy then it's not worth it.

"Note that most prime algorithms are variants of the Seive of Eastosthenes"

Yes, I'm using that. My interest in playing with primes was triggered by
trying to learn about encryption and modular arithmetic used in RSA. But my
first program ever was a prime number generator running on some HP Basic
computer back in the early 70's, so it's interesting to me to bring that up
to date. My program also can run the old way by dividing by primes until the
sqrt of the number under test. I think my original Basic program divided by
all odd numbers because there wasn't enough memory to store the array of
primes on that old computer.

Actually the sieve technique should scale to large integers (if you mean the
64 bit type and not arbitrarily large) once one has an array of primes to
2^32 and if the sieving is done in segments.

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:b0htp5l4t2bi28po0b6gfqbfkctt27up6a(a)4ax.com...
> See below...
> On Mon, 15 Mar 2010 23:06:24 +0800, "Bill Brehm" <don't want spam> wrote:
>
>>I am working on a program to calculate primes. It's just for fun, not for
>>work or school, if that matters here. I'm trying to optimize it and I'll
>>have some more questions about that later.
>>
>>Today I changed the CArray used to hold the primes to a CList based on a
>>compile time selection, figuring I would save the time of reallocating and
>>copying the CArray every time I went over the maximum allocated size. I'm
>>aware of the pros and cons of CArray and CList in general.
> ****
> Your choice of CArray is probably a Bad Idea. First, you should SetSize
> and set an
> explicit reallocation quantum (the second parameter of SetSize) to
> minimize the amount of
> reallocation that is done. Or use std::vector which does this implicitly,
> and had far,
> far better performance than CArray.
> ****
>>A disadvantage is
>>that the CList of primes takes (about) four time as much memory as the
>>CArray holding the same number of primes. The primes are 32 bit ints so I
>>don't know why it's four times instead of three times (previous pointer,
>>prime and next pointer). What else is in there?
> ****
> you have made a fundaemental failure in thinking here. You think, for
> example, that the
> size of any elmeent matters, which is the wrong way to think about the
> problem. What
> matter is the memory footprint, and the induced paging, which is
> potentially far worse in
> a CLIst. Note that most prime algorithms are variants of the Seive of
> Eastosthenes, which
> means they don't scal to large integer (such as are using for encryption)
> and you can find
> enumerations of all the primes in the range of 0..2**32-1 on the Web and
> in reference
> books. LIss are nearly always less efficient than arrays due to
> locality-of-reference
> issues that affect cache behavior and paging.
> *****
>>
>>I tried something that I wasn't sure would work but it did. I serialized a
>>CArray to disk and then later I serialized it from disk to a CList. It
>>worked. It's not a total surprise, because it wouldn't make sense to store
>>the next and previous CList pointers into a serialize archive. My question
>>here is, does anyone know if these were made intentionally compatible and
>>if
>>I can count on that working in all cases?
> ****
> I wouldn't touch MFC serialization as far as I could throw a mainframe. I
> would not use
> it for anything I cared about.
> joe
> ****
>>
>>Thanks...
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Giovanni Dicanio on
"Bill Brehm" <don't want spam> ha scritto nel messaggio
news:e9or$3QxKHA.1796(a)TK2MSFTNGP02.phx.gbl...

> If I give a growby of 10000, the program runs at a reasonable speed. When
> I set it much higher, like 100,000,000, thinking that i'll have fewer
> reallocations, it runs really slowly. I would have expected the opposite
> until 1e8 primes were calculated, then a large delay during reallocation.

One of the problems of CArray is its reallocation policy. IIRC, CArray uses
an arithmetic growth policy, which is not smart.
Instead, std::vector uses a geometric growth policy (using a 1.5x factor for
increasing capacity, IIRC).

In any case, I think that if you set CArray initial size big enough, you
won't have speed problems.
Did you try something like SetSize(100000000) ?
Then you can use a separated integer index to identify the first free slot
in the array, to add new found primes.

Giovanni