From: John Savard on 6 Sep 2005 06:46

On Sun, 4 Sep 2005 01:15:40 +0200, "Skybuck Flying" <nospam(a)hotmail.com> wrote, in part:

>To keep the system responsive windows
>xp would need to shorten each time slice and thereby creating more context
>switches thus leaving less processing power to do anything useful.

To avoid context switching, you don't need to have another whole core. Just use register renaming, and have a multi-threaded CPU.

First, make the fastest possible single core you can.

Then, make it multithreaded, so that the only context switches you have are the unavoidable ones due to procedure calls.

Then, if you still have die area, add cores.

John Savard
http://home.ecn.ab.ca/~jsavard/index.html
http://www.quadibloc.com/index.html
From: Anne & Lynn Wheeler on 6 Sep 2005 12:18

jsavard(a)excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
> To avoid context switching, you don't need to have another whole
> core. Just use register renaming, and have a multi-threaded CPU.
>
> First, make the fastest possible single core you can.
>
> Then, make it multithreaded, so that the only context switches you
> have are the unavoidable ones due to procedure calls.
>
> Then, if you still have die area, add cores.

that was sort of the 370/195 dual i-stream proposal from the early 70s. the issue was that most codes kept the pipeline only about half-full. having dual instruction counters and dual registers ... with the pipeline tagging registers and i-stream ... had a chance of maintaining close to aggregate, peak thruput of the pipeline.

amdahl in the early 80s ... had another variation on that. running a straight virtual machine hypervisor ... resulted in a context switch on privileged instructions and i/o interrupts (between the virtual machine and the virtual machine hypervisor), including saving registers, other state, etc (and then restoring).

starting with virtual machine assist on the 370/158 and continuing with ECPS on the 138/148 to the LPAR support on modern machines ... more and more of the virtual machine hypervisor was being implemented in the microcode of the real machine ... aka the real machine instruction implementation (for some instructions) would recognize whether it was in real machine state or virtual machine state and modify the instruction decode and execution appropriately.

one of the issues was that microcode tended to be a totally different beast, with little in the way of software support tools.
http://www.garlic.com/~lynn/subtopic.html#mcode

amdahl 370 clones implemented an intermediate layer called macrocode that effectively looked and tasted almost exactly like 370 ... but had its own independent machine state. this basically allowed moving some virtual machine hypervisor code almost unchanged to the macrocode level ... w/o the sometimes difficult translation issues ... while eliminating standard context switching overhead (register and other state save/restore).

it was also an opportunity to do some quick speed up. standard 370 architecture allows for self-modifying instructions ... before stuff like speculative execution (with rollback) support ... the checking to catch whether the previous instruction had modified the current (following) instruction being decoded & scheduled for execution ... frequently doubled 370 instruction elapsed time processing. macrocode architecture was specified as not supporting self-modifying 370 instruction streams.

i've frequently claimed that the 801/risc formulation in the mid-70s,
http://www.garlic.com/~lynn/subtopic.html#801

was an opportunity to do the exact opposite of other stuff in the period: a reaction to the combination of the failure of the future system project (extremely complex hardware architecture)
http://www.garlic.com/~lynn/subtopic.html#futuresys

and the heavy performance overhead paid by 370 architecture for supporting self-modifying instruction streams and very strong memory coherency in smp implementations.

separating instruction and data caches and providing for no coherency support between stores to the data cache and what was in the instruction cache ... precluded even worrying about self-modifying instruction operation. no support for any kind of cache coherency ... down to the very lowest of levels ... also precluded such support for multiprocessing operations.

i've sometimes observed that ibm/motorola/apple/etc somerset was sort of taking the rios risc core and marrying it with 88k cache coherency. recent, slightly related posting
http://www.garlic.com/~lynn/2005o.html#37 What ever happened to Tandem and NonStop OS?

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
From: robertwessel2@yahoo.com on 6 Sep 2005 19:00

Skybuck Flying wrote:
> <robertwessel2(a)yahoo.com> wrote in message
> news:1125731066.086811.73210(a)z14g2000cwz.googlegroups.com...
> >
> > The number of context switches is not typically reduced in any
> > significant way by going to a dual (or multi-) CPU system. The vast
> > majority of context switches occur because a thread blocks or calls on
> > another thread or context for a service (really the same thing), and
> > those are *not* impacted by the number of CPUs in the system. Context
> > switches due to time slice expirations *can* be reduced by a multiple
> > CPU configuration, but only if the number of threads tends to be close
> > to the number of CPUs. If there are many runnable threads, and the time
> > slice interval does not change, even that can be slower on the dual
> > processor system, since slices will occur after only half the number of
> > instructions. In any event, time slice triggered context switches are
> > *very* rare. At best a few tens to a few hundred per second.
>
> As far as I can tell windows xp doesn't care how many threads are running,
> it will simply give each thread the same amount of time slice thereby
> lagging the whole system.

Windows, like pretty much any other OS, will dole out CPU time in time-slice increments to threads that run CPU bound, but only rarely does a thread (in most systems) ever run out a time slice before blocking on some event (and thus threads are rarely CPU bound).

> That's why the number of context switches doesn't
> increase as the number of threads increase.

This is perhaps true for a collection of CPU bound threads, but the vast majority of threads in a system are (typically) not.

> Windows xp makes absolutely no
> attempt to keep the system responsive. To keep the system responsive windows
> xp would need to shorten each time slice and thereby creating more context
> switches thus leaving less processing power to do anything useful.

Different versions of Windows use different time-slice intervals. And there are some semi-documented ways that you can change that. Windows goes to considerable effort to keep the system responsive (CPU bound threads get a priority reduction, foreground applications get a boost, etc.), but in the presence of multiple CPU bound threads attached to message queues (which is an application design problem), there's not a whole lot you can do.

In short, time slice intervals have only a little to do with system responsiveness.
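To put rough numbers on the point above, here is a minimal sketch that counts context switches by cause. It is a toy model, not how Windows schedules threads: the function names, the 10 ms and 20 ms quanta, and the workload (95 short bursts that block after 1 ms, 5 CPU-bound bursts of 100 ms) are all assumptions chosen for illustration.

```python
def count_switches(bursts_ms, quantum_ms):
    """Count context switches by cause for a list of CPU bursts.

    Each burst is the CPU time a thread wants before it blocks on its own.
    A burst longer than the quantum is preempted at each quantum boundary.
    Returns (voluntary, preempted).
    """
    voluntary = 0
    preempted = 0
    for burst in bursts_ms:
        remaining = burst
        while remaining > quantum_ms:
            remaining -= quantum_ms
            preempted += 1      # time slice ran out
        voluntary += 1          # thread blocked at the end of its burst
    return voluntary, preempted


def max_preemptions_per_second(quantum_ms, cpus=1):
    """Upper bound on time-slice expirations per second.

    Each CPU can expire at most one slice per quantum, no matter how
    many threads are runnable.
    """
    return cpus * 1000 / quantum_ms


# Hypothetical workload: mostly short bursts that block, a few CPU hogs.
workload = [1] * 95 + [100] * 5

for quantum in (20, 10):
    voluntary, preempted = count_switches(workload, quantum)
    print(f"quantum={quantum} ms: voluntary={voluntary}, preempted={preempted}, "
          f"ceiling={max_preemptions_per_second(quantum):.0f}/s per CPU")
```

In this model, halving the quantum raises only the preemption count for the few CPU-bound bursts; the switches caused by blocking are untouched. The ceiling of 1000/quantum expirations per second per CPU is also independent of the number of threads, which is consistent with the "few tens to a few hundred per second" figure quoted above.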
From: Jan Vorbrüggen on 7 Sep 2005 11:17

> Different versions of Windows use different time-slice intervals. And
> there are some semi-documented ways that you can change that. Windows
> takes considerable effort to keep the system responsive (CPU bound
> threads get a priority reduction, foreground applications get a boost,
> etc.), but in the presence of multiple CPU bound threads attached to
> message queues (which is an application design problem), there's not a
> whole lot you can do.

I have never seen this work properly in W2K...once you have one compute-bound thread running, the GUI becomes totally unresponsive. Nextstep, now that is another matter.

Jan
From: Anne & Lynn Wheeler on 7 Sep 2005 13:58

"robertwessel2(a)yahoo.com" <robertwessel2(a)yahoo.com> writes:
> Windows, like pretty much any other OS, will dole out CPU time in
> time-slice increments to threads that run CPU bound, but only rarely
> does a thread (in most systems) ever run out a time slice before
> blocking on some event (and thus threads are rarely CPU bound).

i had done dynamic adaptive scheduling back in the 60s as an undergraduate. this was sometimes referred to as the fairshare (or wheeler) scheduler ... because the default scheduling policy was fairshare.

basically light-weight interactive tasks got shorter quanta than more heavy-weight, longer running tasks. an advisory deadline was calculated proportional to the size of the quanta and the recent resource utilization. quanta could span certain short i/os (disk, etc) ... so heavy utilization wasn't necessarily restricted to absolutely pure cpu bound (just a reasonably high rate of cpu consumption).

light-weight interactive tasks would tend to have more frequent quanta with shorter deadlines ... and heavy-weight tasks would have less frequent, larger quanta. interactive tasks would appear to be more responsive if they could complete within a few of the shorter quanta and/or if they hadn't been using a lot of resources recently. basically the objective was to uniformly control the overall rate of resource consumption (regardless of the quanta size).

the other issue was that this was done as modifications to the existing cp67 system, which had an extremely complex scheduler that wasn't directly controlling resource utilization ... just moving priorities up & down ... and itself could consume 10-15 percent of total cpu utilization. so a major objective of the scheduler change was to not only implement effective resource consumption control supporting a variety of scheduling policies ... but do it with as close as possible to zero pathlength.
http://www.garlic.com/~lynn/subtopic.html#fairshare

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
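The idea of short, frequent quanta for interactive tasks and long, infrequent quanta for heavy tasks, with the same overall consumption rate, can be sketched as follows. This is not the cp67 fairshare code, which the post does not show. It swaps in a simple stride-style, earliest-deadline-first model in which a task's deadline advances in proportion to the quantum it just consumed. The `Task` fields, the `simulate` function, the quantum sizes and the equal shares are all hypothetical, and the decayed "recent utilization" term the post mentions is left out.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    quantum: float        # CPU time granted per dispatch (assumed units)
    share: float = 1.0    # relative entitlement; equal shares assumed here
    cpu_used: float = 0.0
    dispatches: int = 0


def simulate(tasks, total_time):
    """Dispatch the task with the earliest deadline until total_time elapses.

    After each dispatch the task's deadline moves out by quantum / share,
    so a task with a large quantum waits proportionally longer for its
    next turn and every task consumes CPU at the same long-run rate.
    """
    # The index breaks ties so Task objects are never compared directly.
    queue = [(t.quantum / t.share, i, t) for i, t in enumerate(tasks)]
    heapq.heapify(queue)
    now = 0.0
    while now < total_time:
        deadline, i, task = heapq.heappop(queue)
        now += task.quantum
        task.cpu_used += task.quantum
        task.dispatches += 1
        heapq.heappush(queue, (deadline + task.quantum / task.share, i, task))
    return tasks


tasks = simulate([Task("interactive", 1.0), Task("batch", 8.0)], total_time=160.0)
for t in tasks:
    print(f"{t.name}: dispatches={t.dispatches}, cpu_used={t.cpu_used}")
```

With these assumed numbers the interactive task is dispatched eight times as often as the batch task, yet both end up with the same total CPU, which is the "uniform rate of consumption regardless of quanta size" behaviour described above. Picking the next task is a single heap operation, in keeping with the goal of a very short scheduler pathlength.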