From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uxSdCP6yKHA.5132(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>> void Process() {
>>    clock_t finish;
>>    clock_t start = clock();
>>    double duration;
>>    uint32 num = 0;
>>    for (uint32 N = 0; N < Max; N++)
>>       num = Data[num];
>>    finish = clock();
>>    duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>    printf("%4.2f Seconds\n", duration);
>> }
>>
>
>
What explains why I am only using 120 MB/sec of the 12 GB/sec
of RAM bandwidth?
Is there a way to stress test memory even more?
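One plausible explanation (an editorial reading, not stated in the
thread): each num = Data[num] load depends on the result of the
previous load, so the chase loop is bound by memory latency rather
than bandwidth; at roughly 100 ns per cache miss and 4 bytes per
load, a single dependent chain sustains only tens of MB per second.
A bandwidth test needs independent loads the CPU can overlap. A
minimal sketch under that assumption (illustration only, not code
from the thread):

#include <stdio.h>
#include <time.h>
#include <vector>

typedef unsigned int uint32;

int main() {
   const uint32 size = 100000000;         // ~400 MB of uint32
   std::vector<uint32> Data(size, 1);

   clock_t start = clock();
   unsigned long long sum = 0;
   for (uint32 N = 0; N < size; N++)
      sum += Data[N];                     // independent, sequential loads
   double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

   // printing sum keeps the optimizer from removing the loop
   printf("%.1f MB/sec (sum=%llu)\n", (size * 4.0 / 1e6) / secs, sum);
   return 0;
}

Sequential, independent loads let the hardware prefetcher and
multiple outstanding cache misses overlap, which is where figures
closer to the MemTest86 number would come from.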


From: Peter Olcott on

"Geoff" <geoff(a)invalid.invalid> wrote in message
news:qf5lq556kt4oh22dq12c73sp161ogbgqgr(a)4ax.com...
> On Wed, 24 Mar 2010 15:39:26 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:

>>Here is an interesting note that I don't understand. A
>>slight revision (to make it a little less CPU intensive,
>>thus more memory intensive) only actually achieves 21 MB
>>per second of the 12 GB / second maximum RAM speed.
>>(RAM speed reported by MemTest86).
>>
>>const uint32 size = 100000000;
>>std::vector<uint32> Data;
>>uint32 Max = 0x3fffffff;
>>
>>
>>void Process() {
>>   clock_t finish;
>>   clock_t start = clock();
>>   double duration;
>>   uint32 num;
>>   for (uint32 N = 0; N < Max; N++)
>>      num = Data[num];
>>   finish = clock();
>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>   printf("%4.2f Seconds\n", duration);
>>}
>>
>>Another thing that I don't understand is that it crashes
>>when
>> num = Data[num];
>>is replaced by
>> num = Data[N];
>>
>
> Two bugs exist.
>
> 1. You never initialize num, so executing num = Data[num] will
> access a garbage address in debug mode and will attempt to read
> Data[0] in release mode, since the OS will zero-fill num for you.
>

Right

> 2. You never resize the vector space "Data" before you attempt
> to access it.
>
> Data.resize(Max);

Wrong. (reserve() has almost the same effect as resize().)


int main() {
   printf("Size in bytes--->%d\n", size * 4);
   Data.reserve(size);
   for (uint32 N = 0; N < size; N++) {
      uint32 Random = rand() * rand();
      Random %= size;
      Data.push_back(Random);
   }

   char N;
   printf("Hit any key to Continue:");
   scanf("%c", &N);

   Process();

   return 0;
}
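For what it's worth, the reason this version works with reserve()
is that push_back() constructs each element; indexing a
reserved-but-empty vector is still undefined behavior, which is
where resize() differs. A minimal sketch of the distinction
(illustration only, not code from the thread):

#include <assert.h>
#include <vector>

int main() {
   std::vector<unsigned int> v;

   v.reserve(100);    // allocates capacity; size() is still 0
   assert(v.size() == 0 && v.capacity() >= 100);
   // v[0] = 1;       // undefined behavior: no element exists yet
   v.push_back(1);    // fine: constructs element 0

   v.resize(100);     // constructs elements; size() becomes 100
   v[99] = 7;         // fine: element 99 now exists
   assert(v.size() == 100);
   return 0;
}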


From: Hector Santos on
Geoff wrote:

>
> My version is Windows XP. Compiling in VC 6.0 in 32 bit just to check.
>
> BTW, num is invariant over his loop, therefore he was loading from the
> same location Max times, hardly a test of memory access speed,
> wouldn't you say? In release mode a fully optimized version ran in 0.0
> secs, the compiler deciding that it could obtain 100% core utilization
> by getting rid of the memory access altogether. :)

That's what I first thought, but he added randomness to the access.

What I had in my code was:

DWORD num;
for (DWORD r = 0; r < nRepeat; r++) {
   for (DWORD i = 0; i < size; i++) {
      DWORD j = i;                    // assume serial access
      if (bUseRandomIndex) {
         j = (rand()*rand()) % size;  // random access instead
         if (j >= size) continue;
      }
      num = data[j];
   }
}

So I can test serialized indexing of the huge data array and
compare it with random-index access.
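A self-contained version of that comparison might look like the
following sketch (the sizes and declarations are assumptions;
Hector's full program is not shown in the thread):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>

int main() {
   const unsigned size = 10000000;   // assumed test size (~40 MB)
   const unsigned nRepeat = 10;
   std::vector<unsigned> data(size, 1);

   for (int mode = 0; mode < 2; mode++) {
      bool bUseRandomIndex = (mode == 1);
      clock_t start = clock();
      unsigned num = 0;
      for (unsigned r = 0; r < nRepeat; r++) {
         for (unsigned i = 0; i < size; i++) {
            unsigned j = i;          // serial access by default
            if (bUseRandomIndex)     // unsigned math avoids overflow
               j = ((unsigned)rand() * (unsigned)rand()) % size;
            num += data[j];          // accumulate so the loop is not dead
         }
      }
      double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
      printf("%s: %4.2f Seconds (num=%u)\n",
             bUseRandomIndex ? "random" : "serial", secs, num);
   }
   return 0;
}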

He was basically doing the same thing by initializing the data array
with random index values. Then, starting at num = 0 (with the fix),
he would basically accomplish the same random indexing.

uint32 num = 0;
for (uint32 N = 0; N < Max; N++)
   num = Data[num];

Six of one, half a dozen of the other, I guess. But I also told him
that the random numbers can produce a scenario where:

loop 0: num = 0 ---> data[0] --> 5
loop 1: num = 5 ---> data[5] --> 0
loop 2: num = 0 ---> data[0] --> 5
loop 3: num = 5 ---> data[5] --> 0
loop 4: num = 0 ---> data[0] --> 5
loop 5: num = 5 ---> data[5] --> 0

etc. So in that regard, I think my method eliminates that.
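One standard way to rule out such short cycles in a pointer-chase
test (a sketch, not something posted in the thread) is to initialize
Data as a single full cycle using Sattolo's algorithm, so that
num = Data[num] visits every element exactly once per pass:

#include <stdlib.h>
#include <vector>

typedef unsigned int uint32;

// Sattolo's algorithm: a Fisher-Yates variant whose result is always
// a permutation consisting of one cycle through all the indices.
void BuildFullCycle(std::vector<uint32> &Data) {
   const uint32 size = (uint32)Data.size();
   if (size < 2) return;             // nothing to shuffle
   for (uint32 i = 0; i < size; i++)
      Data[i] = i;                   // start from the identity
   for (uint32 i = size - 1; i > 0; i--) {
      uint32 j = (uint32)rand() % i; // j in [0, i); j != i keeps one cycle
      uint32 tmp = Data[i];          // note: rand()'s limited range weakens
      Data[i] = Data[j];             // the shuffle for very large sizes
      Data[j] = tmp;
   }
}

With that initialization, Max iterations of the chase walk the whole
array over and over instead of ping-ponging between a couple of slots.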

Anyway, random jumping across the array spectrum does indeed seem to
increase the access time as opposed to serial access. For multiple
threads, it approaches a worst-case scenario. When serialized, one
thread benefits the other. I guess it would be the same idea as a
hard drive head jumping all over the place.

What I found interesting is how a pure array is much faster than
std::vector at smaller sizes, but at a certain size they show the
same times. I plan to explore why; probably the per-access overhead
of std::vector shows up at small sizes but factors out when the
arrays get larger.
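A quick way to explore that (a sketch, not Hector's actual test) is
to run the identical chase loop over a std::vector and over a raw
pointer to the same elements:

#include <stdio.h>
#include <time.h>
#include <vector>

typedef unsigned int uint32;

template <class Seq>
void Chase(const Seq &Data, uint32 Max) {
   clock_t start = clock();
   uint32 num = 0;
   for (uint32 N = 0; N < Max; N++)
      num = Data[num];               // identical access pattern both ways
   double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
   // printing num keeps the compiler from removing the loop
   printf("%4.2f Seconds (num=%u)\n", secs, num);
}

int main() {
   const uint32 size = 1000, Max = 100000000;
   std::vector<uint32> Vec(size);
   for (uint32 i = 0; i < size; i++)
      Vec[i] = (i + 1) % size;       // trivial full cycle: 0->1->...->0
   Chase(Vec, Max);                  // std::vector operator[]
   Chase(&Vec[0], Max);              // raw pointer, same memory
   return 0;
}

If the gap shows up only in unoptimized or checked builds, the
overhead is in vector's operator[] plumbing rather than in the
memory access itself.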

--
HLS
From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:uIE36l6yKHA.5288(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>
>> I learned this from an email from Ward Cunningham the
>> inventor of CRC cards. I asked him if he could only
>> choose a single criterion measure of code quality what
>> would it be. He said code size, eliminate redundant code.
>
> He probably meant reusability, Peter.

Yes, and code size (lines of code) quantifies the degree of
re-use.

>
> In programming, you can code for size or speed. Redundant
> code is faster because you reduce stack overhead. When you
> code for size, you are reusing code, which has stack
> overhead.

No, that is not quite it. Fewer lines of code are fewer lines
of code that you ever have to deal with. By maximizing re-use,
changes get propagated with fewer edits.

>
> But in today's world of super fast machines and bloated
> Windows, higher dependency on DLLs, proxies and p-code
> RTLs, and large generated code sizes, the code-vs-speed
> tradeoff is, IMO, a thing of the past.
>
> Cases in point:
>
> 1) .NET: reusability and higher stack overhead, but faster
> machines make it all feasible.
>
> 2) The evolution of templates. Once a way to code for speed
> at the expense of redundant code and bigger size, today it
> doesn't really matter and is more virtualized, with
> functional coding and interfacing.
>
> You do want speed, don't get me wrong, but you are not
> going to waste time by not creating reusable code. One
> thing you can do quickly with functions is to use the
> inline keyword. This is good for low-overhead black box
> functions:
>
> inline
> DWORD GetRandom(DWORD size)
> {
>    // return by value: a const DWORD& return here would bind
>    // to a temporary and dangle
>    return (rand()*rand()) % size;
> }
>
> This gives the smaller functional programming sizing, yet
> some speed consideration with reduced stack overhead.
>
>
> --
> HLS


From: Liviu on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote in message
news:0YmdnXNrsfoxMjfWnZ2dnUVZ_oadnZ2d(a)giganews.com...
> "Geoff" <geoff(a)invalid.invalid> wrote in message
> news:qf5lq556kt4oh22dq12c73sp161ogbgqgr(a)4ax.com...
>> On Wed, 24 Mar 2010 15:39:26 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>>Here is an interesting note that I don't understand. A
>>>slight revision (to make it a little less CPU intensive,
>>>thus more memory intensive) only actually achieves 21 MB per second
>>>of the 12 GB / second maximum RAM speed.
>>>(RAM speed reported by MemTest86).
>>>
>>>const uint32 size = 100000000;
>>>std::vector<uint32> Data;
>>>uint32 Max = 0x3fffffff;
>>>
>>>void Process() {
>>>   clock_t finish;
>>>   clock_t start = clock();
>>>   double duration;
>>>   uint32 num;
>>>   for (uint32 N = 0; N < Max; N++)
>>>      num = Data[num];
>>>   finish = clock();
>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>   printf("%4.2f Seconds\n", duration);
>>>}
>>>
>>>Another thing that I don't understand is that it crashes
>>>when
>>> num = Data[num];
>>>is replaced by
>>> num = Data[N];
>>
>> Two bugs exist.
>>
>> 1. You never initialize num [...]
>>
> Right

Which gives a C4700 compiler warning, btw. Assuming you somehow
overlooked, or chose to ignore that, the debugger would have stopped
on the offending line with all the necessary clues to figure out why.

You _did_ run it under the debugger before asking for help on usenet
and waiting many hours to learn the all too obvious answer... right?

>> 2. You never resize the vector space "Data" before you attempt to
>> access it.
>>
>> Data.resize(Max);
>
> Wrong. (reserve() has almost the same effect as resize() )
>
> int main() [...]

Geoff was replying to your earlier post, which showed no 'reserve'.
You only included the main() part and 'reserve' in the later followup.

Liviu