Can extra processing threads help in this case? [MFC]

From: Hector Santos on 22 Mar 2010 23:54

Peter Olcott wrote:

>> So go ahead and run a 2nd instance and see why Windows does
>> things that will defy your logic.
>
> I don't care if you are right or not any more.

No, NOT! I am right, but it's not a matter of being right; it is what it
is, and every Windows programmer with any sense knows this. And yes,
Microsoft says so:

http://support.microsoft.com/kb/555223

In modern operating systems, including Windows, application
programs and many system processes *ALWAYS* reference memory using
virtual memory addresses which are automatically translated to real
(RAM) addresses by the hardware. Only core parts of the operating
system kernel bypass this address translation and use real memory
addresses directly.

Virtual Memory is always in use, *EVEN* when the memory required
by all running processes does not exceed the amount of RAM
installed on the system.

and every link I provided says so, and finally, the simulator I
posted, which if you had any chips in your head, is your engineering
ticket for solving your performance and mongoose integration problem,
says so!

> It has already cost me too much time.

You should have listened from the beginning.

> I will implement it as a single thread that is
> connected to a FIFO queue.

And it will end up being a GIGO!

--
HLS

From: Joseph M. Newcomer on 23 Mar 2010 00:06

On Mon, 22 Mar 2010 20:19:32 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>

>
>I always boil everything down to its bare essence, and
>remove any extraneous details that do not specifically and
>directly completely pertain the precise point at hand. I use
>categorical thinking, not item by item detail by detail
>unless these details can be shown to be 100% completely
>relevant to the exact precise point at hand.
***
The minutiae, and details, are all that differentiate an asynchronous pipelined
superscalar architecture from a simple von Neumann architecture. Note that a modern Xeon
is thousands of times faster than an 8088 even though the clock speeds are not more than
1000 times faster. It's ALL in the details. So while you can pretend to understand the
x86 by reading the instruction set manual, the REAL TRUTH is vastly more complex. You
cannot ignore the details and expect to achieve the goals you want to achieve. Then you
host, on this complex architecture, a sophisticated modern operating system that runs with
virtual memory, and you have something that has no simple "essence" but in fact is a
complex composite of thousands of little details. Do you know what "dynamic register
renaming" is? Well, it buys you a LOT of performance.
****
>
>You seem to come at things from the opposite point of view
>carefully examining every little nuance of a detail just in
>case it might possibly be at least slightly relevant. There
>are cases where caching can not improve performance, so try
>to see if we can categorically eliminate the need to look at
>this before I proceed a micro step towards considering any
>of its details.
****
So go run the experiment and see what your data says!
****
>
>>
>> And all we are telling you is that you know so little of
>> what is going on at every level
>> of storage management that your flat statements about
>> performance have no basis, and that
>> you should run experiments to see if your guesses are
>> correct or not, and you keep telling
>> us that by sheer guesswork you can arrive at a conclusion
>> that highly experienced
>> performance people would never dare present without
>> substantiating data. Yet you claim
>> you MUST be right. Hector and I pretty much claim that we
>> want to see NUMBERS that prove
>> what is going on. Ever-so-slowly you produce one number
>> or another, whereas I don't think
>> either of us would have made ANY statements without a
>> WHOLE LOT MORE actual measurements
>> to prove or disprove our hypotheses. You keep saying
>> what MUST be so, and in only ONE
>> case (the page fault example) have you actually gone out
>> and gotten substantiating data to
>> prove you are right, that with an excess of RAM the page
>> faults drop to zero.
>>
>> My big objection is that you refuse to make measurements
>> because you are so convinced of
>> your correctness that it doesn't occur to you that it is
>> actually working to your
>> advantage to be wrong (if you're wrong, you are losing
>> performance you might have gotten).
>> And your one flawed experiment, two massive processes on
>> the same core, is not a valid
>> measure of anything except what happens if you run two
>> massive processes on the same core.
>
>How could I very quickly measure exactly how much of the
>total memory bandwidth that my process takes?
***
Why do you care? What you care about is end-to-end transaction time. And we've suggested
that you try an experiment to see what the effects of multithreading are, and you insist
that you will learn nothing from such an experiment because using your superficial
analysis methodology you have a full and complete understanding of what is going on. I
would not presume that I had a clue about what is going on, and I'd measure the hell out
of it, under a variety of scenarios, not the least of which is sending wildly different
images in a way that gets N threads running concurrently on an N-core architecture. Then
I'd KNOW what is going on, instead of trying to work it out from irrelevant experiments
and poor and superficial comprehension of what is going on.
****
>
>I am pretty sure that it takes most all of it, thus proving
>at least one of my points without the need for further
>investigation on this points. It would prove that adding
>another thread can't possibly help. How do I quickly and
>accurately measure my process's memory bandwidth usage?
***
That's my point. You're "pretty sure", but you have NOT A SINGLE FACT that substantiates
this. You have ONE completely irrelevant experiment. I'm clueless about what is going to
happen, and if I had a business model that required high performance, I'd start by getting
the data that would let me make technical decisions, or tell me which ones were right and
wrong. But you are dead certain that without ANY data, you KNOW what is going to happen.
You may be right, you may be wrong, but you don't actually KNOW. Instead, you boil the
computer facts down to their essence, "sequential von Neumann machine", and try to infer
behavior from that model. Yet this computer model has not been used for decades. At one
time, multilevel caching only happened in supercomputers; now it is on-chip. Asynchronous
pipelined architectures existed only in megadollar computer systems; now they cost a
couple hundred dollars and run on a few watts of power. The von Neumann model has been
dead as an implementation strategy for decades. I'll bet you don't know that the x86 is
really a RISC machine; in fact, an asynchronous, pipelined, superscalar RISC machine, and
the x86 instruction set is illusory. True. Look it up in any Intel manual!
****
>
>All of this will soon be moot anyway because my updated
>process will have substantially different memory access
>requirements.
****
And you are, of course, going to measure multithreaded performance so you KNOW what you
are talking about! If you show us REAL numbers from REAL and RELEVANT experiments, then
you can tell us all to go take a flying leap if we've been wrong. But you can't say this
without EVIDENCE. Maybe it is my interactions with the legal system (I have a Certificate
in Forensic Science And The Law from our local law school, and have been an expert witness
in numerous cases of software contract performance, patents and copyrights), but you can't
face up to a battery of highly-paid lawyers and say "I THINK this is true"; you had better
have a 30-page report, with graphs, to make your point. In one case (which we won), I
spent more than a week getting performance graphs to show why the software was defective.
And those graphs, when presented in court, basically killed off the assertions of the
defense, who simply said "We trusted our programmers who said the code was correct". So
my FACTS trumped their OPINION. My opinion, in your case, is that you will be surprised
by the performance numbers, and you say you won't be. So prove me wrong. Show me some
FACTS. As an expert witness, I am obliged to be able to say "In my OPINION, to a degree
of scientific accuracy, I believe this to be TRUE" [and you had better know about Frye and
Daubert jurisdictions]. If I walked into a deposition with the weak evidence you have,
I'd come out looking like a chunk of beef that had met an unfriendly meat grinder in a
back alley. Hell, I come out FEELING that way even when I have graphs, photos, a citation
list worthy of any dissertation, and a host of FACTS that I can PROVE beyond any shadow of
a doubt; in some cases these have to be obtained by experiments which THEY CAN REPLICATE
so they will get the same answers I got, proving I was right and they are wrong. I have
to lay out the experimental methodology and show WHY the experiment is valid. Only my
background in physics allows me to do this, because my physics professor would not accept
sloppy experimental methodology. He'd fail your analysis because it has neither
theoretical nor experimental validity. Since there is no theory that can apply, only
experimental evidence can be presented.
joe

***
>
>> You did not run two massive processes sharing a single
>> memory segment, or multiple
>> processes on multiple cores, or multiple processes sharing
>> a single memory segment on
>> multiple cores, or any of the other interesting variants
>> that should be measured. Yet you
>> have an entire business strategy which is predicated on
>> high performance, and you ignore
>> every suggestion that might lead to improved performance.
>>
>> Only government economists extrapolate from a single data
>> point.
>> joe
>> ****
>>>
>>>Now that I have a way to empirically validate your
>>>theories
>>>against mine (that I dreamed up last night while sleeping)
>>>I
>>>will do this.
>>>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
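
The experiment Joe keeps asking for in the post above (N threads running concurrently on an N-core machine, timed end to end) can be roughed out in a few lines of standard C++. What follows is only a sketch under stated assumptions: `TrialResult`, `walk`, and `run_trial` are hypothetical names, and the strided walk over a shared read-only vector is a stand-in for a memory-heavy recognition job, not Peter's actual OCR code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Result of one timing trial (hypothetical helper type).
struct TrialResult {
    double seconds;          // wall-clock time until every thread has finished
    std::uint64_t checksum;  // combined result, so the reads cannot be optimized away
};

// Assumed stand-in for one job: a strided walk over shared, read-only data
// that reads memory far more than it computes.
std::uint64_t walk(const std::vector<std::uint32_t>& data,
                   std::size_t start, std::size_t steps) {
    std::uint64_t sum = 0;
    std::size_t i = start % data.size();
    for (std::size_t n = 0; n < steps; ++n) {
        sum += data[i];
        i = (i + 4099) % data.size();  // large odd stride, so accesses are not sequential
    }
    return sum;
}

// Run `threads` concurrent walks of `steps` reads each and time the whole batch.
// The measured time includes thread creation and joining.
TrialResult run_trial(const std::vector<std::uint32_t>& data,
                      unsigned threads, std::size_t steps) {
    assert(threads > 0 && !data.empty());
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    const auto t0 = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        // Each thread writes only its own slot of `partial`, so there is no data race.
        pool.emplace_back([&data, &partial, t, steps] {
            partial[t] = walk(data, t * 7919u, steps);
        });
    }
    for (auto& th : pool) th.join();
    const auto t1 = std::chrono::steady_clock::now();
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return {std::chrono::duration<double>(t1 - t0).count(), total};
}
```

Calling `run_trial` with 1, 2, ... N threads on the same data gives the numbers the thread is arguing about. If the wall-clock time for N threads stays close to the one-thread time, the extra threads are buying throughput; if it grows roughly N-fold, something shared (possibly memory bandwidth) is the limit. Either way it is a measurement rather than a guess, and it should be repeated with the real workload before drawing conclusions.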

From: Hector Santos on 23 Mar 2010 00:12

Btw, Peter, the Microsoft link below,

http://support.microsoft.com/kb/555223

is a shorter version of the expanded article here:

http://members.shaw.ca/bsanders/WindowsGeneralWeb/RAMVirtualMemoryPageFileEtc.htm

It is probably the best technical summary I have personally seen for
this topic and I suggest bookmarking it (I did) to serve as a
refresher of how memory works under Windows.

--
HLS

From: Hector Santos on 23 Mar 2010 00:14

Peter Olcott wrote:

> Try running your process again using a
> std::vector<unsigned int>.
> Make sure that you initialize all of this to the subscript
> of the init loop.
> Make sure that the process monitor shows that the amount of
> memory you are allocating is the same amount that total
> memory is reduced by.
> Make sure that you only use 1/2 of total memory or less.
> Make a note of the page fault behavior.

>

> I will try the same thing.

You better! :)

I'll BE BACK!

--
HLS
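
The test Peter describes in the quoted text above can be sketched roughly as follows. This is a minimal illustration, not code posted by anyone in the thread: `make_touched_buffer` and `verify_buffer` are hypothetical names, and the page-fault and memory figures still have to be read from an external tool such as the process monitor while the program runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Allocate `count` elements and set each one to its own subscript, so that
// every page backing the vector has actually been written to.
std::vector<unsigned int> make_touched_buffer(std::size_t count) {
    std::vector<unsigned int> buffer(count);
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        buffer[i] = static_cast<unsigned int>(i);
    }
    return buffer;
}

// Re-read the whole buffer and check the contents. Watching the page-fault
// counter during this pass shows whether the data stayed resident.
bool verify_buffer(const std::vector<unsigned int>& buffer) {
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        if (buffer[i] != static_cast<unsigned int>(i)) return false;
    }
    return true;
}
```

Following Peter's own constraint, `count` would be chosen so that the buffer occupies no more than half of installed RAM. Running `verify_buffer` again after a long idle period, and then again with a second instance of the program running, is the comparison Hector is asking for.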

From: Joseph M. Newcomer on 23 Mar 2010 00:12

See below...
On Mon, 22 Mar 2010 21:07:35 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:l45gq55hlc3sn35e2q6vq1ur6dbvsqvqr5(a)4ax.com...
>> See below...
>>
>> On Mon, 22 Mar 2010 16:59:34 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>>message
>>>news:%23F2oLmgyKHA.5360(a)TK2MSFTNGP06.phx.gbl...
>>>> Peter Olcott wrote:
>>>>
>>>>> Joe kept insisting and continues to insist that my data
>>>>> is not resident in memory.
>>>>
>>>>
>>>> If you have a 32 bit Windows OS, you are limited to just
>>>> 2GB RAW ACCESS and 4GB of VIRTUAL MEMORY.
>>>
>>>Yes, and that is another thing. I kept saying that I have
>>>a
>>>64bit OS, and Joe kept forming his replies in terms of a
>>>32-bit OS.
>> ****
>> And how long did I keep saying "Unless you are running a
>> Win32 process in Win64" but you
>> did not clarify that you were running on Win64. So in the
>> absence of any explicit
>> statement I had to assume you were running in Win32.
>> ****
>>>
>>>>
>>>> If your process is loading 4GB, you are using virtual
>>>> memory.
>>>>
>>>>> After loading my data and waiting twelve hours the
>>>>> process monitor reports zero page faults, when I
>>>>> execute
>>>>> my process and run it to completion.
>>>>
>>>>
>>>> You're lying, you told me you have PAGE FAULTS but it
>>>> settles down to zero, which is NORMAL. But start a 2nd
>>>> process and you will get page faults.
>>>
>>>I only get the page faults until the data is loaded. After
>>>the data is loaded I get essentially no more page faults,
>>>even after waiting twelve hours before running my process
>>>to
>>>completion. After proving that my data is resident in RAM
>>>Joe continues to chide me for claiming that my data is
>>>resident in RAM.
>> ****
>> If you used a memory-mapped file correctly, you would have
>> very low-cost page faults
>> because you would be mapping to existing pages. But you
>> seem to not want to hear that
>> memory-mapped files will improve performance, particularly
>> in a multiple-process
>> environment.
>> joe
>> ****
>
>I don't want to hear about memory mapped files because I
>don't want to hear about optimizing virtual memory usage
>because I don't want to hear about virtual memory until it
>is proven beyond all possible doubt that my process does not
>(and can not be made to be) resident in actual RAM all the
>time.
****
"I don't want to hear about the best way to optimize my performance because I am clueless
about how virtual memory works and have my own belief about it, and I don't even want to
hear that memory-mapped files have the same performance characteristics as ordinary pages
and will be memory resident if there is nothing that forces them out, because I don't want
to listen to any suggestion that might actually work"

Since you don't understand virtual memory, and you CERTAINLY don't understand how
memory-mapped files work, your rationale of why you don't want to hear about them is, to
put it mildly, completely silly.
****
>
>Since a test showed that my process did remain in actual RAM
>for at least twelve hours, this is sufficient evidence to
>show that all of these lines of reasoning have at least for the
>moment become completely moot. The only thing that could
>make them less than completely moot would be proof that my
>process can not remain resident in RAM all the time.
***
And doesn't this suggest that trying the multithreaded experiment is worthwhile? And why
do you think memory-mapped files will not exhibit the SAME behavior? OH, never mind, you
don't want to know that there are alternative solutions that might be more effective than
what you are currently using, even one that can improve multiprocess behavior. So you are
saying "don't tell me the world can be made better, I don't want to make it better".
joe

****
>
>>>
>>>You guys just playing head games with me?
>> ****
>> We are trying to help you, in spite of your best efforts
>> to tell us we are wrong. You
>> insist that simplistic experiments which gave you a single
>> data point give you a basis for
>> extrapolating an entire family of performance information,
>> and we are saying "You don't
>> KNOW until you've MEASURED" and you insist that
>> measurement is not relevant because you
>> MUST be right. All I'm saying is that you MIGHT be right,
>> and once you do the
>> measurements, you might find out that you are completely
>> WRONG, which works to your
>> advantage. So run the damn experiment, already!
>> joe
>>
>> ****
>>>
>>>>
>>>> I also asked, now 5 times, to provide the MEMORY LOAD
>>>> percentage which I even provided with a simple C program
>>>> that you can compile, and you did not:
>>>>
>>>> // File: V:\bin\memload.cpp
>>>>
>>>> #include <stdio.h>
>>>> #include <windows.h>
>>>>
>>>> int main(void)
>>>> {
>>>>     // MEMORYSTATUSEX reports correctly on systems with more than 4GB.
>>>>     MEMORYSTATUSEX ms;
>>>>     ms.dwLength = sizeof(ms);  // must be set before the call
>>>>     if (!GlobalMemoryStatusEx(&ms)) return 1;
>>>>     printf("Memory Load: %lu%%\n", ms.dwMemoryLoad);
>>>>     return 0;
>>>> }
>>>>
>>>> Why can't you even do that?
>>>>
>>>>> How does this not prove Joe is wrong (At least in the
>>>>> specific instance of one execution of my process)?
>>>>> (1) The process monitor is lying.
>>>>> (2) Page faults do not measure virtual memory usage.
>>>>
>>>> There are now what, 4-5 participants in the thread who
>>>> are telling you that your thinking is wrong and that you
>>>> lack an understanding of Windows and Intel hardware.
>>>>
>>>> Let's get a few more, like this guy with a somewhat
>>>> layman's description:
>>>>
>>>> http://blogs.sepago.de/helge/2008/01/09/windows-x64-all-the-same-yet-very-different-part-1/
>>>>
>>>> and the #1 guy at Microsoft today!
>>>>
>>>> http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
>>>>
>>>> If you DEFY what Mark Russinovich is saying here, you
>>>> are
>>>> CRAZY!
>>>>
>>>> --
>>>> HLS
>>>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
