From: Liviu on
"Malachy Moses" <malachy.moses(a)gmail.com> wrote in message
news:7e0c77f2-76d7-4a83-bb21-0a5269260ece(a)c34g2000pri.googlegroups.com...
>On Mar 24, 1:39 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:
>>
>> [...]
>>
>> Another thing that I don't understand is that it crashes
>> when
>> num = Data[num];
>> is replaced by
>> num = Data[N];
>
> I suppose that this is an example of code you wrote that "mostly works
> the first time with only the most trivial logic errors", as gloatingly
> boasted by you in your post over here:
> http://groups.google.com/group/microsoft.public.vc.mfc/tree/browse_frm/thread/c84953c13066a4f3/4b4e98102ea69979?rnum=11&_done=%2Fgroup%2Fmicrosoft.public.vc.mfc%2Fbrowse_frm%2Fthread%2Fc84953c13066a4f3%3F#doc_dd77f6384b369bb2
>
> Quote from your post:
>
> "I really can't stand debugging my own code, that is why I
> take very extreme measures to prevent bugs from occurring in
> the first place. I carefully design every single line of
> code, and check the design many times before I even try to
> compile it. Most of the code that I write mostly works the
> first time with only the most trivial logic errors. "
>
> Really. Such modesty. I can see two errors immediately in only 14
> lines of code, for an error rate of 14.3%.
>
> I apologize for the ad hominem attack. Although it's unlike me, I am
> compelled by the vast quantities of fine resources (from the likes of
> Joe and Hector) that are being wasted on this stuff.

Ditto for the laudable patience and restraint of the main contributors.

As for Peter, and assuming he is the same one who stirred other mammoth
threads over the years on various newsgroups (e.g. "Can a regular TM
provide Write Only Memory?"), I believe he is uniquely skilled in his
own way, and an often entertaining, professional poster.

However, his latest waves here shifted towards Linux servers and STL,
so he may at some point figure that this msvc/mfc group is no longer the
best, fittest, most appropriate one to patronize ;-)

Liviu


From: Hector Santos on
Peter, once again you are looking into things that really aren't going
to help you in the long run.

You are looking at how the MACHINE performs when you FIRST really need
to see how your program is designed to run under WINDOWS with virtual
I/O and memory multi-threaded considerations.

If you are not going to explore your loading needs using the threaded
model, even using the updated version of YOUR CODE with the threading
logic added to it, then that's it for me here.

--

Peter Olcott wrote:

> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
> news:eSd4d07yKHA.2644(a)TK2MSFTNGP04.phx.gbl...
>> Peter Olcott wrote:
>>
>>> Why am I only getting 121 MB/ Sec of the 12 GB / Sec that
>>> MemTest86 reported that I have?
>> I don't know what data and what rates you are referring
>> to. What is it and how is it calculated? I don't know
>> what memtest86 is (and I don't wish to download it.)
>>
>> --
>> HLS
>
> MemTest86 is a world-renowned little utility that provides all
> of your memory specs and runs through many memory
> diagnostic tests. It basically will tell you if any part of
> your memory is not functioning correctly. It reported all of
> my cache sizes and speeds, and the RAM size and speed. The
> slowest speed that it reported was a RAM speed of 12 GB/sec
> (it actually reported 11,852 MB/sec; I rounded a little).
>
>



--
HLS
From: Peter Olcott on

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:erPYzN8yKHA.1236(a)TK2MSFTNGP06.phx.gbl...
> Peter, once again you are looking into things that really
> aren't going to help you in the long run.
>

Understanding the underlying performance details of memory
will help me in the immediate term and the long run. Maybe
this is a Joe question.

> You are looking at how the MACHINE performs when you FIRST
> really need to see how your program is designed to run
> under WINDOWS with virtual I/O and memory multi-threaded
> considerations.

Nope, not that. I am looking into how the essential process
will run across varied machine platforms and architectures.
To do this I must know the commonalities and variances.

>
> If you are not going to explore your loading needs using
> the threaded model, even using the updated version of YOUR
> CODE with the threading logic added to it, then that's it
> for me here.
>
>

I have already addressed that issue. If four processes can
run at essentially the same speed as one process, then there
is no need to look at four threads. That would be redundant,
wouldn't it?

> --
>
> Peter Olcott wrote:
>
>> "Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>> message news:eSd4d07yKHA.2644(a)TK2MSFTNGP04.phx.gbl...
>>> Peter Olcott wrote:
>>>
>>>> Why am I only getting 121 MB/ Sec of the 12 GB / Sec
>>>> that MemTest86 reported that I have?
>>> I don't know what data and what rates you are referring
>>> to. What is it and how is it calculated? I don't know
>>> what memtest86 is (and I don't wish to download it.)
>>>
>>> --
>>> HLS
>>
>> MemTest86 is a world-renowned little utility that provides
>> all of your memory specs and runs through many memory
>> diagnostic tests. It basically will tell you if any part
>> of your memory is not functioning correctly. It reported
>> all of my cache sizes and speeds, and the RAM size and
>> speed. The slowest speed that it reported was a RAM speed
>> of 12 GB/sec (it actually reported 11,852 MB/sec; I
>> rounded a little).
>
>
>
> --
> HLS


From: Joseph M. Newcomer on
See below...
On Wed, 24 Mar 2010 19:26:47 -0400, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Peter Olcott wrote:
>
>
>> I learned this from an email from Ward Cunningham the
>> inventor of CRC cards. I asked him if he could only choose a
>> single criterion measure of code quality what would it be.
>> He said code size, eliminate redundant code.
>
>He probably meant about reusability Peter.
>
>In programming, you can code for size or speed. Redundant code is
>faster because you reduce stack overhead. When you code for size, you
>are reusing code which has stack overhead.
****
Back in the days when I did optimizing compilers for a living, we worried about code size
because smaller code meant fewer instructions. But in modern programming, bigger code can
run faster (e.g., redundant code that improves cache hits; more complex code that
determines optimal cache management, etc.). Also, note that in the 64-bit compiler, all
the space you will ever need for function calls is preallocated on the stack before any of
the function code starts executing, and no parameter is EVER "pushed" onto the stack;
instead the one-and-only "master call frame" is preloaded with MOV instructions to put the
parameters in the right place for the CALL instruction. Turns out this is faster, because
stack manipulation breaks the prefetch pipe, which is doing speculative fetches of [ESP]
or [EBP]; the pipe breaks if the register changes after the speculative fetch. With the
64-bit code, because there are no stack manipulations EXCEPT when the function prolog is
executed, there can be no pipe breaking and you get maximum advantage of the speculative
reads in the prefetcher. A subtle point, but the code looks really crappy, and it isn't.

Similarly, the compiler is really poor at redundant store-load elimination, because it is
so fast in the hardware that you save very little if the sequence is
	MOV [ESP+220], EAX
	MOV EBX, [ESP+220]
In my day in optimizing compilers, people would point and laugh at such code, but with
speculative reads, reads from the write pipe, and dynamic register renaming, there is
nothing lost by this sequence of instructions. At least not enough that the compiler
should spend time eliminating it. OTOH, to form a value in the parameter register (since
the first four parameters are passed in registers in 64-bit code, always), the compiler
makes heavy use of something we invented at CMU, "register targeting", to develop the
result in the register where it is needed (this is part of the register allocation
algorithm in the compiler). The 64-bit compilers do pretty good register targeting.

So "optimization" relates not to lines of code, or instructions generated, but to
effectiveness. It is often the case that beginners think that fewer lines of code mean
better programs, but that is illusory; sometimes bigger programs run faster. The key is
to get the right abstractions so the "redundancy" of concept is minimized, but not
necessarily the number of lines of code. Most programming is taught by people who learned
on the PDP-11 and other machines where code size REALLY mattered A LOT, and where the
performance was so limited (0.3 MIPS, compared to the 100+ MIPS of modern x86s; 64K total
RAM vs. a 2-4GB virtual address space) that all the wrong things were "optimized"; and
they TEACH this as if it is still meaningful. Conceptual simplicity is nice, but not when
it compromises ease of coding, robustness, or performance.

Sometimes the role of "redundant code" is to implement abstractions which end up producing
more code than not using the abstraction; this reduces performance while increasing
conceptual simplicity and maintainability. Ultimately, there is no single "best"
approach, just a number of approaches which empirically have proven themselves to be
effective; in general, any approach which single-mindedly applies some metric is flawed.
I've seen coding fads come and go, and I've learned not to believe in any of them. While
the most effective is "Keep It Simple", that has its own drawbacks; it gets code up
quickly, and the code is robust, but it can have performance problems. One of my goals
with VS2010 is to use its measurement tools to determine why one of my large programs is
underperforming; it takes too long to do certain things, and I want it to run faster. So
I'm going to measure the hell out of it and figure out just where the time is going. If
it is in the automation interface, I can't do much, because it is what it is; but if it is
in my code, I'm going to do something about it. And the code is going to get more complex.
****
>
>But in the today's world of super fast machines and bloated windows,
>higher dependency on dlls, proxies and p-code RTL, and high code
>generated sizes, the code vs speed ideas is, IMO, a thing of the past.
****
"Code bloat" is one of those terms used by people who haven't written large systems.
Generally, that code is there for a reason, and I've never encountered the idea that
p-code (or MSIL) contributes to "code bloat" since it is far more compact than x86 machine
code, and the JIT compiler compiles it into code that is really pretty good.
****
>
>Cases in point:
>
>1) .NET, reusability, higher stack overhead, but faster machines makes
>it all feasible.
****
Can you explain what "higher stack overhead" means, really? And why .NET has it and other
code (such as C++) doesn't?
****
>
>2) The evolution of templates. Once coded for speed at the expense
>of redundant code and bigger size; today it doesn't really matter,
>and is more virtualized, with functional coding and interfacing.
****
Templates are a mechanism for encapsulating abstraction; no promises were ever made about
performance. OTOH, I used to say "If you can't write code six levels of abstraction above
the machine and get half an instruction from it, your compiler is defective". And in
those days, we had the BEST optimizing compilers in existence. Today, Microsoft C++ runs
circles around our efforts, and makes them look like amateur compilers by comparison. So
it is not clear why template programming has to produce inefficient code; with the quality
of optimization I've seen (including whole-program optimization and LTCG) you really can
work six levels of abstraction above the machine and generate half an instruction!
****
>
>You do want speed, don't get me wrong, but you are not going to waste
>time not creating reusable code. One thing you can do quickly with
>functions is to use the inline statement. This is good for low
>overhead black box functions:
****
And it is largely a waste of time, because the whole-program optimizing compiler inlines
stuff for you anyway, so inline directives are usually redundant and buy very little.

In fact, in writing certain kinds of programs, it takes __declspec(noinline) to prevent
the automatic inlining of code when I need an actual function to exist!
****
>
>inline
>DWORD GetRandom(DWORD size)
>{
>  return (DWORD(rand()) * rand()) % size;
>}
>
>This gives the smaller functional-programming sizing, yet some speed
>considerations with reduced stack overhead.
****
Stack overhead is free in the 64-bit compiler, so a lot of these "principles" of good
programming are no longer meaningful. It's like the people who say "function calls
should never be used because they are expensive". Rubbish! In a pipelined superscalar,
say a 2.8GHz machine, it can execute two instructions per clock cycle, and a clock cycle
is about 350ps. So a CALL takes 175ps (that's PICO-seconds, so we are talking
sub-nanosecond timings here!) and a RET takes 175ps, so the whole call/return sequence
takes about 1/3 of a NANOsecond; so don't tell me about calls being expensive! (And
putting parameters on the stack is cheap, also, since they only have to hit the L1 cache
to be available to the called function! And that doesn't count something called
"stack-top simulation" that goes on in the older chips, and isn't done in the 64-bit
chips because it isn't needed!) So calls are cheap even if you don't inline (which was
always the rationale for inlining). But look at real code, and you'll see that functions
which should be inlined are ALREADY inlined whether you add the __inline directive or
not! Optimized 64-bit code is completely incomprehensible!
joe
***
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
See below...

On Wed, 24 Mar 2010 20:07:32 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:uIE36l6yKHA.5288(a)TK2MSFTNGP05.phx.gbl...
>> Peter Olcott wrote:
>>
>>
>>> I learned this from an email from Ward Cunningham the
>>> inventor of CRC cards. I asked him if he could only
>>> choose a single criterion measure of code quality what
>>> would it be. He said code size, eliminate redundant code.
>>
>> He probably meant about reusability Peter.
>
>Yes, and code size (lines-of-code) quantifies the degree of
>re-use.
****
Back when I was at the SEI we studied metrics used to measure reuse, and lines of code was
the least useful and poorest predictor. I spent a couple years looking at this problem,
talking to some of the key designers and managers in the field, and generally they would
say "we use lines of code, but nobody trusts it" and give several cogent reasons it was
meaningless as a metric of reuse, productivity, or anything else.
****
>
>>
>> In programming, you can code for size or speed. Redundant
>> code is faster because you reduce stack overhead. When
>> you code for size, you are reusing code which has stack
>> overhead.
>
>No, that is not quite it. Fewer lines-of-code are fewer
>lines-of-code that you ever have to deal with. By maximizing
>re-use, changes get propagated with fewer edits.
****
The old "top-down" programming argument. It doesn't actually work, but Dijkstra made it
sound cool. It looks good until you try to use it in practice, then it crumbles to dust.
The reason is that decisions get bound top to bottom, and changing a top-level decision
ripples the whole way down the design tree; and if, at the lower level, you need to make a
change, it ripples upward. Actually "rips" more correctly describes the effect.

Parnas got it right with the notion of module interfaces; screw the lines of code. A
friend of mine got an award for software accomplishment when he changed the code review
process to review ONLY the "interface" files (in the true sense of INTERFACE, that is, no
private methods, no variables, period; only public methods are in an interface). He said
that it was every bit as productive, and the code reviews went faster. IBM thought so too,
and gave him a corporate recognition award for contributing to software productivity.

Lines of code don't matter. Interfaces are all that matter. Parnas said this in the late
1960s and early 1970s, and essentially forty years of history have proven him right.

Part of the role of a good compiler is to peek across interfaces and produce good code in
spite of the abstractions. The MS C++ compiler is perhaps the best optimizing compiler I
have ever seen, and I've seen a lot. I've heard the Intel compiler is pretty good, too,
but I can't afford it.

[disclosure: Dave Parnas was one of my professors at CMU, taught one of the toughest
courses I ever took, which was operating systems, and lectured in the hardware course. I
have a deep respect for him. He is one of the founders of the field of Software Safety,
and works at a completely different level of specification than mere mortals]
joe
****
>
>>
>> But in the today's world of super fast machines and
>> bloated windows, higher dependency on dlls, proxies and
>> p-code RTL, and high code generated sizes, the code vs
>> speed ideas is, IMO, a thing of the past.
>>
>> Cases in point:
>>
>> 1) .NET, reusability, higher stack overhead, but faster
>> machines makes it all feasible.
>>
>> 2) The evolution of templates. Once coded for speed at
>> the expense of redundant code and bigger size; today it
>> doesn't really matter, and is more virtualized, with
>> functional coding and interfacing.
>>
>> You do want speed, don't get me wrong, but you are not
>> going to waste time not creating reusable code. One
>> thing you can do quickly with functions is to use the
>> inline statement. This is good for low overhead black box
>> functions:
>>
>> inline
>> DWORD GetRandom(DWORD size)
>> {
>>   return (DWORD(rand()) * rand()) % size;
>> }
>>
>> This gives the smaller functional-programming sizing, yet
>> some speed considerations with reduced stack overhead.
>>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm