From: Andy 'Krazy' Glew on
On 6/27/2010 10:11 AM, Andy 'Krazy' Glew wrote:
> On 6/27/2010 9:40 AM, Andy 'Krazy' Glew wrote:
>> On 6/27/2010 8:25 AM, Terje Mathisen wrote:
>>> MitchAlsup wrote:
>>>> On Jun 26, 1:51 pm, Andy 'Krazy' Glew<ag-n...(a)patten-glew.net>
>>>> wrote:
>>>>> https://semipublic.comp-arch.net/wiki/Picking_N-th_ready_element
>>>>>
>>>>> One common operation in schedulers is picking the 1st, or 1st and
>>>>> 2nd, or ... N-th ... ready elements to schedule.
>>
>>> For large N and M, would it be appropriate to look at probabilistic
>>> methods?
>>>
>>> I.e. a single counter which knows how many ready (R) elements there are
>>> at a given time, when asking for ready(M) we start at (approximately)
>>> element M*N/R.
>>>
>>> What I'm asking is if anyone have modeled probabilistic pickers?
>
>> Terje, I don't really know of any work in probabilistic pickers in
>> hardware. I encourage you to think/talk/work on them some more.
>
> Tweaking my earlier answer:

Tweaking further: I did not even remember my own work in this area. That's what you get for never publishing.

Pentium 4 / Willamette, as you know, was plagued by positive feedback issues that resulted in misbehaviors like replay
tornadoes. You would get into these patterns that you would never break out of, until the next interrupt or external
bus snoop. Limit cyckes.

Bob Colwell wanted me to look at this.

My reaction was that this was standard control theorty. Perhaps with a bit of chaos. Suboptimal limkt cycles as
(not-so-)strange attractors. The best way to break up limit cycles and get into a better neighborhood is to add a bit
of randomness.

I mainly considered randomness in terms of adding random delay to certain instructions, or in terms of the
aforementioned random arbitrators.

I did not consider making the pickers themselves random.

Such randomness solved the problem - at least insofar as I was able to reproduce the problem in my own simulator at
UWisc. Although strictly oldest frst scheduling also tended to solve the worst problems.

--

For Robert Meyers: I always though in terms of adding a bit of turbulence to promote fuel mixing.
From: Terje Mathisen "terje.mathisen at on
Andy 'Krazy' Glew wrote:
> On 6/27/2010 8:25 AM, Terje Mathisen wrote:
>> For large N and M, would it be appropriate to look at probabilistic
>> methods?
>>
>> I.e. a single counter which knows how many ready (R) elements there are
>> at a given time, when asking for ready(M) we start at (approximately)
>> element M*N/R.
>>
>> What I'm asking is if anyone have modeled probabilistic pickers?
>>
>> BTW, it is well-known that you can pick the Nth percentile out of an
>> unordered array in log(N) time, using a non-recursive half quick sort
>> which always picks the relevant partition to drill further into.
>>
>> I assume this will always to out of the question though, since it would
>> throw away nearly all the effort expended in partly sorting the array.
>
> Terje, I don't really know of any work in probailistic pickers in
> hardware. I encourage you to think/talk/work on them some more.
:-)
>
> A complication: your first suggestion, starting at element M*N/R, I
> think is making the implcit assumption that the array is sorted. This is

Yes, I did assume that it would be sorted in binary opcode order. This
makes it much simpler to pick the oldest ready instructions, right?

> often not true in hardware schedulers. I did in that first post talk
> about running the array ready bits through a permutation, that allows us
> to look at the ready bits in order even though the array itself is out
> of order, that really only works for small arrays.
>
> The quicksort idea is interesting. I wonder if we get into the usual
> heapsort vs. quicksort argument, deterministic vs. probabilistic, where

Assuming the instruction array is simply an unordered array of ready (1
bit) and an age field, we want to pick up the M oldest ready entries, right?

(We assume that the compiler (or asm programmer) have tried to schedule
the code so as to minimize total latency, in which case the oldest ready
instruction is always on the critical path.)

> deterministic corresponds to current synchronous clocking, and
> probailistic corresponds to asynchronous clocking? As I have discussed
> here, I have long suspected that asynchronous circuis are one road to
> the future. Forgetting that, most present OOO schedulers must be
> pipelineable, delivering back to back instructions in successive clocks.
> There has been little work that I am aware of in multicycle loop
> schedulers.

My idea was rather the opposite, i.e. use an approximate picker to make
it faster. :-)

> Of course I have been thinking about it. E.g. in my multicycle work.
> Some of the best aspects are figuring out how to make use of a big slow
> outer scheduler. Key: make the big slow outer scheduler work in units of
> multiple instructions, instruction groups of 8, 16, or more, where the
> latency through the execution group is related to, ideally at least, the
> latency of the scheduler. Make the outer scheduler approximate: don't
> schedule based on all inputs, only on one or two or a few: e.g. worst
> case a 16 instruction group might have 32 or even 48 inputs, but you
> might only provide 2 or 3. Don't stop filling an instruction group when
> the number of inputs it can handle is exceeded: instead just select a
> subset of the most important inputs, e.g. the last arriving. (Although
> there may be better choices still.) Don't restrict instruction groups to
> dependency chains.

Interesting...

Terje



--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: MitchAlsup on
On Jun 26, 8:35 pm, MitchAlsup <MitchAl...(a)aol.com> wrote:
> With proper access to true and complement signals from latching
> points, one can pick the first element of 4 in one gate delay, 9 in 2
> gate delays, and 24 in 3 gate delays. None of this violates 4-input
> fan-in rules, but does apply some heavy loads to some inverter fan-
> outs. I have the schematics if anyone is interested.

I looked up my schematics. The NOR gates only have 3 inputs, the NAND
gates can have 4 inputs. Fan-outs are no worse than about 2/3rds those
found in a carry select adder carry chain fan-out. In any event, for
the Athlon/Opteron speed machines, one can easily pick 4 from up to 24
optimally in age order.

That 1:9 picker can be a 1:10 picker in the same gate delay.
In addition, all these pickers have an output that signals if a pick
has been found with no additional delay from the pick element itself.

Only if the number of elements in the sub-window is increased, or the
number of picks increases beyond 4 per cycle, or the cycle time drops
below 14 gates is a more exotic picker needed.

Mitch

From: Robert Myers on
Andy 'Krazy' Glew wrote:

>
> For Robert Meyers: I always though in terms of adding a bit of
> turbulence to promote fuel mixing.

Some turbulent flows; e.g. a turbulent aircraft vortex wake, can be
stubbornly persistent. It seems like your limit cycles would be quite
brittle; all you have to do is to mess up the repeating cycle, like
pushing a stuck phonograph needle to the next groove. But maybe what
you are describing is nastier than I imagine, as if the phonograph
needle would always find its way back into the stuck groove.

Robert.
From: Andy 'Krazy' Glew on
On 6/27/2010 6:43 PM, Robert Myers wrote:
> Andy 'Krazy' Glew wrote:
>
>>
>> For Robert Meyers: I always though in terms of adding a bit of
>> turbulence to promote fuel mixing.
>
> Some turbulent flows; e.g. a turbulent aircraft vortex wake, can be
> stubbornly persistent. It seems like your limit cycles would be quite
> brittle; all you have to do is to mess up the repeating cycle, like
> pushing a stuck phonograph needle to the next groove. But maybe what you
> are describing is nastier than I imagine, as if the phonograph needle
> would always find its way back into the stuck groove.

Actually, I was thinking in terms of fuel-air mixing, where you want turbulence to promote mixing.

Most of the epicyclic bad state sequences I saw were fragile: a little bit of permutation would push them into a better
state.