Can extra processing threads help in this case? [MFC]

From: Peter Olcott on 12 Apr 2010 15:51

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:OYR8rcm2KHA.5420(a)TK2MSFTNGP05.phx.gbl...
> Peter Olcott wrote:
>
>
>>> Who does the HTTP delegation? Or will you have FOUR
>>> different web servers with four different FORM pages?
>>> How do you control the cross domain security and
>>> business thread entry points where someone is trying to
>>> cheat you out of a few dollars?
>>
>> (1) Everyone authenticates
>
>
> How do does this take?

How do that do what?

>
>> One physical CPU with a single core and two hyperthreads.
>
>
> Is that possible?

PENTIUM 4 technology

>>> A single CORE does not have HYPERTHREADING!!! That is
>>> only possible with a multi-core CPU.
>>
>> Not according to Intel terminology. Hyperthreading was
>> what Intel came out with before they came out with
>> multiple cores. Now that they came out with multiple
>> cores, some of these multi-core CPUs also have two
>> hyperthreads per core, and some do not.
>
>
> Show a reference where it says 1 CPU with one CORE offers
> Hyperthreading. Maybe the mix up is what CORE means.

Look up the PENTIUM 4, the more recent ones came with
hyperthreading.
http://en.wikipedia.org/wiki/Pentium_4

>
> You can not have hyperthreading without at least TWO
> LOGICAL processors. These can be called cores.
>
> http://en.wikipedia.org/wiki/Hyper-threading
>
> Hyper-threading is an Intel-proprietary technology used
> to improve parallelization of computations (doing
> multiple tasks at once) performed on PC microprocessors.
> For each processor core that is physically present, the
> operating system addresses two virtual processors, and
> shares the

Yep, it's the [virtual processor] part that is the
hyperthread, and the PENTIUM 4 chip that only had a single
core also had hyperthreading.

> > I always design completely before I begin coding,
>
> and you do it so poorly you never get anything done. You
> would be totally lost without the help of people on the
> net.

Do you know that Ernest Hemingway said that the early
revisions of his own work were horribly awful and he did not
get them right until the 20th complete rewrite? It is the
same with good system design.

>> So then are there are specific dysfunctional aspects with
>> the simple time slicing that I proposed immediately above
>> that you can point out?
>
>
> First, you are not time slicing, you are yielding the rest
> of your quantum when you sleep. The OS is time slicing YOU.
> You need to learn how to use the terms and ideas
> correctly.

Whoops you already lost something there. There are TWO count
em TWO options. Only one of the options involves yielding,
the other one involves time slicing.

>
> Do you mean signaling like I TOLD YOU to use (you did not
> propose it)?

I assumed that you proposed a Windows thing that can not be
done on Linux, so I converted it into a Linux/Unix equivalent.

> It depends on what you believe is best performance wise.
> Are you attempting to throttle a request? Slow it down?
> Put incoming low priority request to "sleep" until the 1
> single fifo processor is complete with the current high
> priority request?

THIS DESIGN REQUIREMENT IS IMMUTABLE
The design goal is to somehow provide the means for the high
priority jobs to have absolute priority over the lower
priority jobs; these lower priority jobs all have equal
priority relative to one another. The proposed approach
must meet the above stated design goal as closely as
possible on a PENTIUM 4 computer.

I proposed two approaches: time slicing and yielding.
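
The stated goal (high priority jobs always go first, low priority jobs are equal among themselves) can be sketched as a single dispatch queue. This is only an illustration under assumptions: the names `JobQueue`, `HIGH` and `LOW` are hypothetical, and the sketch only orders which job is picked next. It does not preempt a low priority job that is already running, which is what the time slicing and yielding options are about.

```python
import heapq
import itertools

HIGH, LOW = 0, 1  # hypothetical priority classes; the smaller value is served first


class JobQueue:
    """Single-consumer queue: HIGH always before LOW, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter preserves FIFO order per class

    def put(self, priority, job):
        # The unique sequence number breaks ties, so jobs themselves are never compared.
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def get(self):
        _priority, _seq, job = heapq.heappop(self._heap)
        return job

    def __len__(self):
        return len(self._heap)


q = JobQueue()
q.put(LOW, "free-1")
q.put(LOW, "free-2")
q.put(HIGH, "paid-1")
q.put(LOW, "free-3")
q.put(HIGH, "paid-2")

order = [q.get() for _ in range(len(q))]
print(order)  # ['paid-1', 'paid-2', 'free-1', 'free-2', 'free-3']
```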

> If you want to block it to give even MORE quantum to the HP
> thread, then you can wait on a kernel object which will be
> a VERY efficient "SLEEP" until it is signaled to wake up.
>
> IMO, with my experience, it is better to leave your
> threads alone and only boost the high priority thread and
> see how that performs. If you see a lower thread hogging
> up your high priority, then you probably need to make it
> wait for HP completion or throttle it. But the whole point
> of all this is you don't really know until you see this
> all in action under LOAD - how your requests are coming in
> at various rates and frequencies.

So basically I need to see how each of the TWO options
performs under various combinations of heavy loads. I am
guessing that time slicing will be good enough, especially
if the database writes prove to be the bottleneck.

> I can give you one recent specific experience. POP3
> server. We tried to increase the priority on one thread
> for an important action and it terribly affected the rest
> of the server. By far, by letting the OS do its job with
> scheduling, we obtain the best performance across the
> board. But trying to give the HP thread full power, you
> might find the rest is not satisfactory, even if it's a
> low tier request item. Remember, first impressions count -
> if they see a poor slow trial, they might not even bother
> subscribing to your service.

There are 40 different levels of process priority under
Linux, I am confident that these would provide sufficient
prioritization. If they don't, testing will reveal this.
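
For reference, the 40 levels correspond to the conventional Linux nice range of -20 (most favorable) through 19 (least favorable). A minimal sketch, assuming a Unix-like system; the helper names are hypothetical. An unprivileged process can normally only raise its own nice value (lower its priority), so the sketch only reads the current value.

```python
import os

NICE_MIN, NICE_MAX = -20, 19  # conventional Linux nice range


def nice_levels():
    """All nice values a process can be assigned."""
    return list(range(NICE_MIN, NICE_MAX + 1))


def clamp_nice(value):
    """Keep a requested nice value inside the valid range."""
    return max(NICE_MIN, min(NICE_MAX, value))


# os.nice(0) adds 0 to the current niceness and returns it, so this only reads it.
current = os.nice(0)

print(len(nice_levels()))  # 40
print(NICE_MIN <= current <= NICE_MAX)
```

A low priority worker could call `os.nice(10)` on itself at startup; giving a process a negative nice value normally requires elevated privileges.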

>> PENTIUM 4, it might not even have hyperthreading.
>
>
> Ok, so you are going with a pure single processor. Hey, that
> might be better, but you don't know until you have tried a
> multi-core or multi-cpu with multiple threads. Remember
> what you will be saving, context switching.

The machine along with all support and internet bandwidth is
rented in perpetuity. One of the ways I manage my finances
is to absolutely minimize recurring costs. More cores cost
300% more money.

>> A single log, and additionally a separate log for each
>> paid job stored in the customer database with the output
>> data.
>
>
> So you realize these need syncing. The logging can use
> fast appending, but the customer database will need
> syncing, especially if using SQLITE3. For this, go MYSQL to
> avoid the full locks SQLITE3 provides.

Deducting a dime from the customer's account will require a
file lock under SQLite. It has occurred to me that even a
single disk seek per transaction may cut the number of
transactions down to 111 per second, regardless of SQL
provider. Even a fast SCSI would only slightly more than
double this.
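
The arithmetic behind the 111 figure, as a small sketch. It assumes the simplest possible model (each committed transaction costs a whole number of full seeks and nothing else), and the function name is hypothetical.

```python
def max_tps(seek_ms, seeks_per_transaction=1):
    """Upper bound on transactions per second when seek time is the only cost."""
    return 1000.0 / (seek_ms * seeks_per_transaction)


print(int(max_tps(9)))     # 111 with one 9 ms seek per transaction
print(int(max_tps(9, 2)))  # 55 if each transaction needs two seeks
```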

>> Not at all. The web server delegates the jobs to queues.
>
>
> That is what I said, and you will be load balancing;
> otherwise YOU will be hosed.

No load balancing. The web server merely sends the jobs to
their respective queues. Any load balancing is done on the
other end.

>>>> As long as the ROW lookup maps to the file byte offset
>>>> we are good.
>>>
>>> Then get another data manager because SQLITE will not
>>> give you this.
>>
>> And you know this how?
>
>
> Because we, many of our customers, and 3rd party developers
> use SQLITE3 in many of our products. It uses ISAM/BTREE
> behind the scenes but that doesn't mean you have access to
> it - you will be violating the integrity.

As long as my access to any table ROW is in constant time
that is seek time, then I have what I need. If it is either
not constant time (linear search) or some factor such as 2 X
seek time (indexed access) then I do not have what I need.
If I do not have what I need then the SQL engine becomes the
bottleneck that must be replaced for transaction logging. It
will still be required for user authentication.

>> Eventually it has to always be a file byte offset thing
>> in the underlying physical implementation because
>> Unix/Linux only knows about files in terms of a sequence
>> of bytes. If I tell the file manager that I want record
>> number 12463 and each record has exactly 12 bytes then it
>> must seek to byte offset (12463 - 1) * 12.
>
>
> First, you are giving LINUX/UNIX a bad reputation with
> this primitive thinking. Second, SQL is not a packed
> record concept. Sorry. You want an ISAM or even varying
> record concept. But not SQL. Wrong TOOL.
>
> Now, can you create a TABLE with a field that is a BLOB,
> where you save a fixed length binary field?

In that case you must save the variable length most likely
in a table with fixed length records. Access to this data is
still by using file byte offset. File byte offset is what
lies underneath every single database access.
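
The fixed-length record idea can be sketched against a plain flat file. This is an illustration only: the 12-byte layout (`RECORD`) is a hypothetical one, and nothing here implies that SQLite exposes its rows this way.

```python
import struct
import tempfile

RECORD_SIZE = 12               # bytes per record, as in the example above
RECORD = struct.Struct("<IQ")  # hypothetical layout: 4-byte id + 8-byte value = 12 bytes


def record_offset(record_number, record_size=RECORD_SIZE):
    """Byte offset of a 1-based record number in a file of fixed-length records."""
    return (record_number - 1) * record_size


def read_record(f, record_number):
    """One seek plus one read, independent of how many records the file holds."""
    f.seek(record_offset(record_number))
    return RECORD.unpack(f.read(RECORD.size))


with tempfile.TemporaryFile() as f:
    for n in range(1, 101):
        f.write(RECORD.pack(n, n * 10))
    f.flush()
    rec = read_record(f, 42)

print(record_offset(12463))  # 149544
print(rec)                   # (42, 420)
```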

>>> Since your HTTP request are not bulk, each one is
>>> handled separated. At the rate you expect for the TPS,
>>> you will be MURDERED.
>>
>> It will merely be reduced to the maximum number of
>> transactions that can be written to disk. Most of this
>> time would be disk drive seek time. My cheap drive has
>> about 9 ms seek time; that's 111 seeks per second. So
>> SQLite might be a bottleneck unless a single transaction
>> takes a single seek.
>
>
> Gawd, you are such a bad designer.

I am a superb designer. If you quit with your damn
condescension and instead explain why you think something is
wrong, I can correct your misconceptions. I ALWAYS formulate
all of my designs in terms of the very low level
implementation details. This provides much better designs.

The above analysis shows that disk drive seek time may very
well form the ultimate binding constraint of my TPS. In
order to have very reliable transactions and fault tolerance
I must explicitly give up any optimizations that would
prevent disk seek time from becoming the binding constraint.
Every single transaction must be committed to disk
immediately. If I could batch them together, I might combine
these into a single seek time. I can not combine these
together, thus one seek time each.
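
The trade-off described here, sketched with an append-only log. This is an assumption-laden illustration, not the actual design: the file name and record format are made up, and `os.fsync` only asks the OS to flush to the device (a drive's own write cache can still defer the write).

```python
import os
import tempfile


def append_durably(fd, record):
    """One transaction per commit: write, then force it to disk before returning."""
    os.write(fd, record)  # small records; a production version would check the byte count
    os.fsync(fd)


def append_batch_durably(fd, records):
    """Batched alternative: several transactions share a single flush."""
    os.write(fd, b"".join(records))
    os.fsync(fd)


path = os.path.join(tempfile.mkdtemp(), "transactions.log")  # hypothetical log file
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
try:
    for i in range(3):
        append_durably(fd, f"debit customer={i} amount=0.10\n".encode())
    append_batch_durably(fd, [b"debit customer=3 amount=0.10\n",
                              b"debit customer=4 amount=0.10\n"])
finally:
    os.close(fd)

with open(path, "rb") as f:
    lines = f.read().splitlines()

print(len(lines))  # 5
```

The first function pays one flush per transaction, which is the cost being estimated above; the second shows what is given up by refusing to batch.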

> Not with SQL or database systems that allow for SQL
> row/record/cursor level I/O access.

I just explained in much greater detail why disk drive seek
time may still be the binding constraint.

From: Jerry Coffin on 12 Apr 2010 15:52

In article <eOGBHzm2KHA.1708(a)TK2MSFTNGP05.phx.gbl>, sant9442
@nospam.gmail.com says...
>
> Jerry Coffin wrote:
>
> > The difference from what he's doing right now is that instead of
> > being restricted to running on one single-core processor,
>
>
> He's got a Quad with 8GB monster machine. :)

I thought I saw something like that at one point too, but the latest
round has it running on a single-core, hyperthreaded Pentium IV --
maybe the Computer History Museum has decided to raise some money by
moving into the ISP market?

--
Later,
Jerry.

From: Hector Santos on 12 Apr 2010 15:56

Joseph M. Newcomer wrote:

>>
>> The funny thing is you seriously think you are normal! I
>> realize we have gone beyond the call of duty ourselves to help you, but
>> YOU really think you are of a sound mind. You are the one that really
>> should be so lucky, these public groups are not moderated - you would
>> be the #1 person locked out. Maybe that is what happened in the linux
>> forums - people told you to go away - "go to the WINDOWS FORUMS and
>> cure them!"
> ***
> I think this was a typo, and you meant "curse"...
> ****

Good Catch!!

--
HLS

From: Joseph M. Newcomer on 12 Apr 2010 15:55

See below..
On Mon, 12 Apr 2010 09:31:48 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:%23aTn4cf2KHA.3568(a)TK2MSFTNGP04.phx.gbl...
>> Peter Olcott wrote:
>
>>> No, the latest analysis indicates that I am back up to
>>> 100 because the webserver and the OCR execute in
>>> parallel.
>>
>>
>> No, it shows EXACTLY what the simple equation
>>
>> TPS = N * 1000/ WORK LOAD
>>
>> and the charts I provided to you are SAYING: if you
>> want 100 TPS, with a 20 ms WORK LOAD, you need N=2
>> Handlers!
>>
>
>hyperthreading = two handlers
****
No, hyperthreading == N handlers, where N is the number of threads being used (no matter
how they are partitioned among processes). Note that hyperthreading does NOT give you 2x
the computing resource, more like 1.3x the computing resource.
****
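
The equation quoted above, worked through as a sketch. It assumes the idealized model in the formula (every request costs the same work time and handlers never wait on each other); the function names are hypothetical.

```python
import math


def tps(handlers, work_ms):
    """TPS = N * 1000 / WORK LOAD, with the work load in milliseconds per request."""
    return handlers * 1000.0 / work_ms


def handlers_needed(requests, window_ms, work_ms):
    """Smallest N that can finish `requests` within `window_ms`."""
    return math.ceil(requests * work_ms / window_ms)


print(tps(1, 20))                      # 50.0  one handler, 20 ms per request
print(tps(2, 20))                      # 100.0 two handlers
print(handlers_needed(100, 1000, 20))  # 2 when 100 requests are spread over one second
print(handlers_needed(100, 500, 20))   # 4 when the same 100 arrive in a 500 ms burst
```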
>
>> But again, this is an idealize equalized loading system -
>> a single queue with two handlers. One request coming in
>> at a time. That is not reality unless you synchronize the
>> incoming queuing and perform load balancing.
>
>A single queue with two handlers will not by itself provide
>the prioritization that I need.
****
So add more handlers. You seem to have missed the idea that "time slicing" allows you to
have an arbitrary number of handlers!
****
>
>> So what? How do you control the request that are coming
>> in. You said you want 100 TPS, but that load can come in
>> in 500 msecs! Now your simple equation is:
>>
>> 100 request/500 ms = N /20ms work load
>>
>> Solve for N and N = 4 handlers, threads, separate
>> processors, who cares how they are concurrently running
>> for that 500 ms time span, hyperthreaded or each on their
>> own CPU or machine - you need 4 handlers - period!
>
>Most of the wall clock time is in the web server,
>communicating with the client and the web server creates one
>thread per HTTP request. As far as the OCR processor is
>concerned it must finish its high priority jobs in 10 ms,
>one at a time in FIFO order. It is not even aware of the
>HTTP delays.
****
That's expensive; a well-done HTTP server would have a thread pool to optimize this time
and use a SQMS approach to queueing requests up for its thread pool. Then the dequeue
thread would be handling the scheduling to the worker threads (again, no matter how they
are partitioned among processes, but you have this hangup about processes and threads
being different; of course we ignore pthreads because that is an outdated library for
serious thread work)
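
A minimal sketch of the single-queue, multiple-server (SQMS) idea, using Python's standard thread pool as a stand-in. The pool size and the `handle_request` body are assumptions for illustration; a real server would size the pool by measurement.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # assumed value, not taken from the thread


def handle_request(request_id):
    """Stand-in for real request handling (hypothetical)."""
    return (request_id, threading.current_thread().name)


# The executor keeps one internal work queue that all worker threads pull from,
# so a fixed set of threads is reused instead of creating one thread per request.
with ThreadPoolExecutor(max_workers=POOL_SIZE, thread_name_prefix="worker") as pool:
    results = list(pool.map(handle_request, range(20)))

workers_used = {name for _, name in results}
print(len(results), len(workers_used) <= POOL_SIZE)
```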
>
>>> The only way this site is going to ever get too long of a
>>> queue is if too many free jobs are submitted. Do you
>>> really think that this site is ever going to be making
>>> $10.00 per second? If not then I really don't have to
>>> worry about queue length. In any case I will keep track
>>> of the average and peak loads.
>>
>>
>> Fine, if you are going to do thread delegation and load
>> balancing, fine. All I am pointing out in this lesson is
>> that your modeling is flawed for the work loading you
>> expect to get and will not work using this
>> Many Thread to 1 FIFO queuing framework.
>
>I still see four different queues as a better solution for a
>single core processor. It is both simpler and more
>efficient. One of the types of jobs will take 210,000 ms and
>this job absolutely positively can not screw up my maximum
>100 ms real time threshold for my high priority jobs. Joe's
>solution is simply broken in this case.
****
Try to find a bright 10-year-old to help you with the complex arithmetic involved here.

And it is NOT "simpler" because you keep postulating these inter-process "signaling"
mechanisms to put the other processes "to sleep", apparently ignoring one of the other
advanced secret concepts (the one that goes beyond "time slicing" called "thread
priority", but if I tell you about it, I will be in danger of violating my Vow Of Secrecy
I took back when I learned operating systems from Dave Parnas and later from Nico
Habermann, Edsger Dijkstra's best student).

How is it that having your 210,000 ms job lose a timeslice to your 10ms job "screws up"
anything? Duh! But I guess you never heard of "time slicing" so you can be forgiven.
****
>
>> 1) Get any web server with CGI or PHP script mapping
>> support.
>I am not going to learn a whole new computer language to do
>something that I can already do better and in less time
>otherwise.
****
Oh, so you can build this kludge with inter-process signaling in less time than you can
learn PHP (which is pretty simple, so simple that even non-programmers can learn it in a
day; a friend of mine teaches a 1-day PHP course to non-programmers so he knows this is
true)
joe
****
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 12 Apr 2010 16:04

See below...
On Fri, 9 Apr 2010 20:36:34 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:4lmur51ea3nju0dnl7ms6vcurv9f0q9nlc(a)4ax.com...
>> See below...
>> On Thu, 08 Apr 2010 22:16:14 -0400, Hector Santos
>> <sant9442(a)nospam.gmail.com> wrote:
>>
>>>> Some of the above have fixed queue lengths don't they?
>>>
>>>
>>>No, because the question doesn't apply and I doubt you
>>>understand it, because you have a very primitive
>>>understanding of queuing concepts. No matter what is
>>>stated, you don't seem to go beyond a basic layman
>>>abstract thinking - FIFO. And your idea of how this
>>>"simplicity" is applied is flawed because of the lack of
>>>basic understanding.
>> ***
>> Note that I agree absolutely with this! The concept that
>> a fixed-sized queue matters at
>> all shows a total cluelessness.
>
>Bullshit. At least one of the queuing models discards input
>when queue length exceeds some limit.
****
Prove it by citing an appropriate document. And if that document says that the API call
returns an error if the data is discarded, then if any data is lost, it is the fault of
your app. But you said TCP/IP lost data, which is one of your false assumptions. It is
based on no factual information whatsoever.
****
>
>>>There were plenty of links where people had issues - even
>>>for LINUS
>> ****
>> If you ignore the issue of what happens if either side of
>> the pipe fails, or the operating system crashes. But hey,
>> reliability is not NEARLY as important as having buffer
>> lengths that grow (if this is actually true of linux named
>> pipes). This is part of the Magic Morphing Requirements,
>> where "reliability" got replaced with "pipes that don't
>> have fixed buffer sizes".
>> ****
>
>The other issue is reentrancy. I remember from my MS-DOS
>(ISR) interrupt service routine development that some
>processes are occasionally in states that can not be
>interrupted. One of these states is file I/O. Now the whole
>issue of critical sections and other locking issues has to
>be dealt with. A simple FIFO made using a named pipe
>bypasses these issues.
****
ROTFL!

You have confused MS-DOS with a real operating system, for example; you think
interruptibility matters in the slightest, and you have lost the basic concept of how
locks work. Did you think a named pipe operation is not interrupted DOZENS of times while
it is in progress?

This is what I mean by clueless. You have made so many false assumptions here that it is
hard to see if there is any real knowledge around. Certainly, the above paragraph
demonstrates terminal cluelessness.
****
>
>>>
>>>For what you want to use it for, my engineering sense
>>>based on experience tells me you will have problems,
>>>especially YOU for this flawed design of yours. Now you
>>>have 4 Named Pipes that you have to manage. Is that under
>>>4 threads? But you are not designing for threads.
>
>That's right, I discarded threads in favor of processes a long
>time ago.
****
Although there was no "sound reasoning" for this decision. Only much later did you reveal
that you have confused the pseudo-thread library with real threads. And that is very
unsound reasoning.
****
>
>> the message yes, another no. Is the 1 OCR process going to
>>>handle all four pipes? Or 4 OCR processes? Does each
>>>OCR have their
>>>own Web Server? Did you work out how the listening
>>>servers will bind
>>>the IPs? Are you using virtual domains? sub-domains?
>>>Multi-home IP
>>>machine?
>
>One web server (by its own design) has multiple threads that
>communicate with four OCR processes (not threads) using
>some form of IPC, currently Unix/Linux named pipes.
****
And none of us understand why it needs to be this complex.
****
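
For readers unfamiliar with the mechanism under discussion, this is roughly what a Unix/Linux named pipe used as a FIFO job queue looks like. It is a sketch only, assuming a POSIX system; the path and job names are made up, and a thread stands in for the separate web server process.

```python
import os
import tempfile
import threading

fifo_path = os.path.join(tempfile.mkdtemp(), "ocr_jobs.fifo")  # hypothetical path
os.mkfifo(fifo_path)


def producer(path, jobs):
    """Plays the web server's role: writes one job per line into the pipe."""
    with open(path, "w") as w:  # open() blocks until a reader opens the other end
        for job in jobs:
            w.write(job + "\n")


jobs = ["job-1", "job-2", "job-3"]
t = threading.Thread(target=producer, args=(fifo_path, jobs))
t.start()

# Plays the OCR process's role: reads jobs in arrival order until the writer closes.
with open(fifo_path) as r:
    received = [line.rstrip("\n") for line in r]

t.join()
os.remove(fifo_path)
print(received)  # ['job-1', 'job-2', 'job-3']
```

Note that the pipe itself offers no durability: anything still buffered is lost if either end or the operating system crashes, which is the reliability concern raised earlier in this post.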
>
>> ****
>> The implementation proposals have so many holes in them
>> that they would be a putter's dream, or possibly a product
>> of Switzerland. This design guarantees maximum conflict for
>
>And yet you continue to fail to point out the nature of
>these holes using sound reasoning. I correctly address the
>possible reasoning, and you simply ignore what I say.
****
Let's see: TCP/IP loses data. Threads necessarily have interactions within a process. It
is necessary to invent an IPC signaling method to tell a process to "go to sleep" so a
higher-priority thread can run. How many more examples of unsound reasoning do I have to
identify before it is clear that you have serious holes in your reasoning?
*****
>
>> resources and maximum unused resources, but what does
>> maximum resource utilization and
>> minimum response time have to do with the design? It
>> guarantees priority inversion,
>
>Yeah it sure does when you critique your misconception of my
>design instead of the design itself. I use the term
>PROCESSES and you read the term THREADS. Please read what I
>actually say, not what you expect that I will say or have
>said.
****
The reason I don't distinguish between threads and processes is that they are the same,
for purposes of the scheduler. You later showed that you had confused REAL threads with
the unix pthreads (pseudo-threads) library, which explains YOUR confusion!
joe
****
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
