From: nmm1 on
In article <n7dj27-n7n.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>
>> On a machine with a LOT of cores, you could update it directly.
>> On one without, you would want a special loop which would take the
>> hardware clock and the constants maintained by the NTP-like code,
>> and update the clock field in memory once every microsecond. That
>> would behave exactly like a separate core. And, because updating
>> the memory field is a kernel operation, the implementation could be
>> changed transparently.
>
>It could not:
>
>Anything that updates a real memory location every us is a performance bug!

Not in this context! We are clearly at cross purposes. The actual
memory location need not be a DIMM, but could be a logical DIMM
actually stored in a CPU's SRAM (as you describe below). My point
is to use the standard memory distribution system, and not necessarily
real, physical memory.

Yes, I agree that doing that is STILL a problem on current systems,
but my other point is that they are going to HAVE to tackle the same
issue for ordinary memory to make the currently favoured shared memory
programming designs work on a large number of cores.
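
For concreteness, the update loop described in the quoted text might look
roughly like the sketch below. Everything in it is an assumption made for
illustration: the names ClockParams, ticks_to_ns and ClockField, and the
32.32 fixed-point format, are hypothetical and not taken from any real
kernel. It shows only the arithmetic, not how the field gets distributed
to the cores.

```python
from dataclasses import dataclass


@dataclass
class ClockParams:
    """Constants maintained by the NTP-like code (hypothetical layout)."""
    base_ticks: int        # hardware counter value at the last calibration
    base_ns: int           # wall-clock time in ns at that same instant
    ns_per_tick_q32: int   # tick period in ns, as a 32.32 fixed-point value


def ticks_to_ns(raw_ticks: int, p: ClockParams) -> int:
    """Convert a raw hardware counter reading into nanoseconds."""
    elapsed = raw_ticks - p.base_ticks
    return p.base_ns + ((elapsed * p.ns_per_tick_q32) >> 32)


class ClockField:
    """Stands in for the single memory word that every reader loads."""

    def __init__(self) -> None:
        self.now_ns = 0

    def update(self, raw_ticks: int, p: ClockParams) -> None:
        # The kernel-side loop would do this once per update period;
        # readers just load now_ns and never touch the hardware clock.
        self.now_ns = ticks_to_ns(raw_ticks, p)
```

Because readers only ever see the field, what sits behind update() (a
spare core, a timer interrupt, or dedicated hardware) can change without
them noticing.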

>If you instead use a memory-mapped timer chip register, then you've
>still got the cost of a real bus transaction instead of a couple of
>core-local instructions.

Eh? But how are you going to keep a thousand cores synchronised?
You can't do THAT with a couple of core-local instructions!


Regards,
Nick Maclaren.
From: Anne & Lynn Wheeler on

Terje Mathisen <"terje.mathisen at tmsw.no"> writes:
> Anything that updates a real memory location every us is a performance bug!
>
> If you instead use a memory-mapped timer chip register, then you've
> still got the cost of a real bus transaction instead of a couple of
> core-local instructions.

one of the justifications for the 370 timer facilities. 360s had location
"80" timer in low-store. lower-end 360 models updated in millisecond
range ... higher end 360s updated low order bit every 13+ microseconds.

for compatibility, 370s did provide support for location 80 timer but at
the millisecond range.

univ. where i was undergraduate had 360/67 (that had "high-speed"
location 80 timer). I had been doing a bunch of enhancements to (virtual
machine) cp67 ... one of which was adding tty/ascii terminal support to
cp67. part of this was I attempted to do something with the 2702
terminal controller that it couldn't quite do (but should). somewhat as
a result, the univ. started a clone controller project ... using an
interdata/3, reverse engineer the 360 channel interface, build channel
interface board for the interdata/3, program the interdata/3 to emulate
2702 controller with some additional function (later four of us got
written up for being responsible for mainframe clone controller
business).

some early controller tests resulted in bringing down the 360/67
(hardware "red-light"). the issue was the memory bus was shared between
processor, the location 80 timer, and i/o channels (and these were
non-cache machines). the location 80 timer had some leeway if the bus
was in use when timer tic'ed ... but if the timer tic'ed again ... and
there was previous timer memory update still pending ... the machine
would stop/red-light.

had to go back and redo the controller channel board to make sure that
it periodically told the channel to release the memory bus (in middle of
transfers) so that any pending timer tic update could occur.
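
a toy model of that failure, as a sketch only: it assumes a round 13
microsecond tick, that a timer update stays pending for as long as the
channel holds the bus, and that a second tic while one is still pending
is fatal. the function name and the (start, length) representation of
bus holds are made up for illustration; the real hardware leeway was
more subtle than this.

```python
import math
from typing import List, Tuple


def red_lights(tick_us: float, holds: List[Tuple[float, float]]) -> bool:
    """Return True if any bus hold (start_us, length_us) spans two timer tics."""
    for start, length in holds:
        end = start + length
        # First timer tic at or after the start of this hold.
        first = math.ceil(start / tick_us) * tick_us
        # Its memory update stays pending until the bus is released at
        # `end`; a second tic arriving before then stops the machine.
        if first + tick_us < end:
            return True
    return False
```

in this model one unbroken 40 microsecond hold fails, while the same
transfer split into 10 microsecond bursts with the bus released in
between does not, which is the effect of the channel board fix.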

misc. past posts mentioning clone controller effort
http://www.garlic.com/~lynn/subtopic.html#360pcm

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: Terje Mathisen on
nmm1(a)cam.ac.uk wrote:
> In article<n7dj27-n7n.ln1(a)ntp.tmsw.no>,
> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>> If you instead use a memory-mapped timer chip register, then you've
>> still got the cost of a real bus transaction instead of a couple of
>> core-local instructions.
>
> Eh? But how are you going to keep a thousand cores synchronised?
> You can't do THAT with a couple of core-local instructions!

You and I have both written NTP-type code, so as I wrote in another
message: Separate motherboards should use NTP to stay in sync, with or
without hw assists like ethernet timing hw and/or a global PPS source.
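
For readers who have not written such code, the core of an NTP-style
exchange is the standard four-timestamp calculation. The sketch below
shows only that arithmetic; the function name is hypothetical and no
filtering, clock discipline or hw assist is modelled.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    """
    # Offset of the server clock relative to the client clock, assuming
    # the outbound and return paths take equally long.
    offset = ((t2 - t1) + (t3 - t4)) / 2
    # Round-trip time on the wire, excluding server processing time.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay
```

The symmetric-path assumption is where the error comes from, which is
why timing hw close to the wire or a shared PPS source helps.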

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: nmm1 on
In article <b6gj27-5bn.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>nmm1(a)cam.ac.uk wrote:
>> In article<n7dj27-n7n.ln1(a)ntp.tmsw.no>,
>> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>>> If you instead use a memory-mapped timer chip register, then you've
>>> still got the cost of a real bus transaction instead of a couple of
>>> core-local instructions.
>>
>> Eh? But how are you going to keep a thousand cores synchronised?
>> You can't do THAT with a couple of core-local instructions!
>
>You and I have both written NTP-type code, so as I wrote in another
>message: Separate motherboards should use NTP to stay in sync, with or
>without hw assists like ethernet timing hw and/or a global PPS source.

Yes, but I was thinking of a motherboard with a thousand cores on it.
While it could use NTP-like protocols between cores, with each core
maintaining its own clock, that's a fairly crazy approach.

All right, realistically, it would be 64 groups of 16 cores, or
whatever, but the point stands. Having to use TWO separate
protocols on a single board isn't nice.


Regards,
Nick Maclaren.
From: Bernd Paysan on
nmm1(a)cam.ac.uk wrote:
> The other is maintaining global uniqueness and monotonicity while
> increasing the precision to nanoseconds and the number of cores
> to thousands. All are needed, but it is probably infeasible to
> deliver all of them, simultaneously :-(

It's not as bad as you think. As long as your uncertainty of time is
smaller than the communication delay between the nodes, you are fine, i.e.
your values are unique - you only have to make sure that the adjustments
propagate through the shortest path. For monotonicity, just make sure your
corrections for NTP don't step back. The NTP implementations I know adjust
clocks by slowing them down or speeding them up.
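
A minimal sketch of that slewing idea, under stated assumptions: the
class name SlewedClock and the 500 ppm default limit are chosen for
illustration and are not taken from any particular NTP implementation.
Corrections are spread over time by running the clock slightly fast or
slow, so the reported time never steps back.

```python
class SlewedClock:
    """Clock that absorbs corrections by changing rate, never by stepping."""

    def __init__(self, now_ns: int, max_slew_ppm: int = 500) -> None:
        self.now_ns = now_ns
        self.pending_ns = 0              # signed correction still to apply
        self.max_slew_ppm = max_slew_ppm

    def adjust(self, correction_ns: int) -> None:
        """Queue a correction (negative means the clock is ahead)."""
        self.pending_ns += correction_ns

    def advance(self, elapsed_ns: int) -> int:
        """Advance by elapsed_ns of raw time and return the new reading."""
        # Largest correction that may be applied during this interval.
        limit = elapsed_ns * self.max_slew_ppm // 1_000_000
        step = max(-limit, min(limit, self.pending_ns))
        self.pending_ns -= step
        # |step| <= limit <= elapsed_ns, so the increment is never negative.
        self.now_ns += elapsed_ns + step
        return self.now_ns
```

With a limit below 1,000,000 ppm the increment is at most slowed, never
reversed, so successive readings on one node stay monotonic.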

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/