From: nmm1 on
In article <0db80478-326d-4b55-b6bd-33d75a811166(a)36g2000yqu.googlegroups.com>,
robertwessel2(a)yahoo.com <robertwessel2(a)yahoo.com> wrote:
>>
>> The other is maintaining global uniqueness and monotonicity while
>> increasing the precision to nanoseconds and the number of cores
>> to thousands. All are needed, but it is probably infeasible to
>> deliver all of them, simultaneously :-(
>
>You only need to keep the clocks well enough synchronized that threads
>running on separate cores can't tell that the order of time values
>stored is actually slightly out of sync across the machine or
>cluster. Basically this is approximately the physical propagation
>delay between nodes, and synchronizing to less than that is relatively
>straight-forward.

Grrk. Not really. To get from one corner of a board to another and
back is (say) 5 nanoseconds, and that's just the speed of light. But
let's say that you can synchronise to 1 nanosecond. The killer is
that two of the most close-coupled cores can often communicate
faster than that, so you can end up with visible discrepancies.
To solve that, you either have to constrain how often each core
can get timestamps - or ensure that the closer two cores are, the
better synchronised their clocks are.
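
To make that constraint concrete, here is a minimal sketch in Python (an illustration only, not taken from any real product). It assumes hypothetical per-core clock offsets and pairwise minimum one-way message latencies, both in picoseconds, and flags every pair of cores whose skew is at least as large as the fastest message between them. Those are the pairs that could see timestamps out of order. The names visible_skew_pairs, offset_ps and latency_ps are invented for this example.

```python
from itertools import combinations

def visible_skew_pairs(offset_ps, latency_ps):
    """Return the core pairs (a, b) whose clock skew could be observed.

    offset_ps:  dict core -> clock error relative to an ideal clock (ps)
    latency_ps: dict frozenset({a, b}) -> minimum one-way message delay (ps)
    """
    bad = []
    for a, b in combinations(sorted(offset_ps), 2):
        skew = abs(offset_ps[a] - offset_ps[b])
        # If the skew is not smaller than the fastest message, a stamp taken
        # after receiving the message may not be later than the sender's stamp.
        if skew >= latency_ps[frozenset((a, b))]:
            bad.append((a, b))
    return bad

# Hypothetical board: cores 0 and 1 share a chip, core 2 is at the far corner.
offset_ps = {0: 0, 1: 800, 2: -200}
latency_ps = {
    frozenset((0, 1)): 500,
    frozenset((0, 2)): 5000,
    frozenset((1, 2)): 5000,
}
print(visible_skew_pairs(offset_ps, latency_ps))  # [(0, 1)]
```

With these made-up numbers every clock is within 1 ns of the others, which is ample for the cross-board pairs, yet the two close-coupled cores can still see the discrepancy.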

I am still doubtful that you can deliver the global properties that
are wanted (essentially sequential consistency, to a higher precision
than any other communication mechanism). Perhaps it can be done, but
I can't see how, and every real product I have seen has added some
constraints.

>Then making sure the values are unique just requires an extension at
>the low end of the time value, and a fixed value per-core to be stored
>there. So effectively core number 13 always stores time values of the
>form "nnnnnnnn.nnnnnnnnn013" and two actually simultaneous stores have
>an artificial difference inserted at the low end. And so long as the
>prior condition (about time/event visibility) is met, you're covered
>here too.

Yes, you are right. It's too long since I worked in this area, and
was forgetting!
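
A minimal sketch of the per-core suffix idea quoted above, in Python. It uses a binary field below the clock's least significant bit, where the quoted example uses three decimal digits. The field width CORE_BITS and the function names are assumptions made for illustration, not anything a real machine is known to use.

```python
CORE_BITS = 10  # assumption: room for up to 1024 cores

def make_stamp(time_ns, core_id):
    """Append a fixed per-core identifier below the clock's lowest bit."""
    if not 0 <= core_id < (1 << CORE_BITS):
        raise ValueError("core_id does not fit in the suffix field")
    return (time_ns << CORE_BITS) | core_id

def split_stamp(stamp):
    """Recover (time_ns, core_id) from a stamp built by make_stamp."""
    return stamp >> CORE_BITS, stamp & ((1 << CORE_BITS) - 1)
```

Two cores that read the same clock value still produce different stamps, and ordering by clock value is preserved because the suffix sits below the time field. This only separates different cores. Two reads on the same core within one clock tick would still collide, so the sketch relies on the clock ticking at least once between reads on any one core.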


Regards,
Nick Maclaren.
From: Terje Mathisen on
nmm1(a)cam.ac.uk wrote:
> In article<b6gj27-5bn.ln1(a)ntp.tmsw.no>,
> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>> You and I have both written NTP-type code, so as I wrote in another
>> message: Separate motherboards should use NTP to stay in sync, with or
>> without hw assists like ethernet timing hw and/or a global PPS source.
>
> Yes, but I was thinking of a motherboard with a thousand cores on it.
> While it could use NTP-like protocols between cores, and for each
> core to maintain its own clock, that's a fairly crazy approach.
>
> All right, realistically, it would be 64 groups of 16 cores, or
> whatever, but the point stands. Having to use TWO separate
> protocols on a single board isn't nice.

I agree.

Anything located on a single board should be able to share a common
timing reference, i.e. core crystal.

That only leaves the OS with the task of syncing up the base counter
values during startup.
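
As a rough sketch of what that startup step could look like (Python, purely illustrative): if every core's counter is driven by the same crystal, the difference between any two counters never changes, so the OS only has to record one offset per core. The function names are hypothetical. The sketch also assumes the boot samples were taken at the same instant, which in practice is the hard part.

```python
def compute_offsets(boot_samples, reference_core=0):
    """Per-core correction so that every core agrees with the reference core.

    boot_samples: dict core -> raw counter value, assumed to have been
                  sampled on all cores at the same instant during startup.
    """
    ref = boot_samples[reference_core]
    return {core: ref - raw for core, raw in boot_samples.items()}

def synced_read(raw_counter, core, offsets):
    """Translate a core's raw counter value into the shared time base."""
    return raw_counter + offsets[core]
```

Because the counters advance in lockstep, the offsets computed once at boot remain valid for as long as the common clock keeps running.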

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on
nmm1(a)cam.ac.uk wrote:
> In article<1531844.zBA62FjkXi(a)elfi.zetex.de>,
> Bernd Paysan<bernd.paysan(a)gmx.de> wrote:
>> It's not so bad as you think. As long as your uncertainty of time is
>> smaller than the communication delay between the nodes, you are fine, i.e.
>> your values are unique - you only have to make sure that the adjustments
>> propagate through the shortest path.
>
> Er, no. How do you stop two threads delivering the same timestamp
> if they execute a 'call' at the same time without having a single
> time server? Ensuring global uniqueness is the problem.

No!

Global uniqueness is a separate, but also quite important problem.

It is NOT fair to saddle every single timestamp call with the overhead
required for a globally unique value!

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: robertwessel2 on
On Jan 20, 2:47 pm, n...(a)cam.ac.uk wrote:
> In article <0db80478-326d-4b55-b6bd-33d75a811...(a)36g2000yqu.googlegroups.com>,
>
> robertwess...(a)yahoo.com <robertwess...(a)yahoo.com> wrote:
>
> >> The other is maintaining global uniqueness and monotonicity while
> >> increasing the precision to nanoseconds and the number of cores
> >> to thousands. All are needed, but it is probably infeasible to
> >> deliver all of them, simultaneously :-(
>
> >You only need to keep the clocks well enough synchronized that threads
> >running on separate cores can't tell that the order of time values
> >stored is actually slightly out of sync across the machine or
> >cluster.  Basically this is approximately the physical propagation
> >delay between nodes, and synchronizing to less than that is relatively
> >straight-forward.
>
> Grrk.  Not really.  To get from one corner of a board to another and
> back is (say) 5 nanoseconds, and that's just the speed of light.  But
> let's say that you can synchronise to 1 nanosecond.  The killer is
> that two of the most close-coupled cores can often communicate
> faster than that, so you can end up with visible discrepancies.
> To solve that, you either have to constrain how often each core
> can get timestamps - or ensure that the closer two cores are, the
> better synchronised their clocks are.


I should have been clearer, but that's exactly right: the degree of
synchronization required varies with the distance between nodes,
and it has to be such that no given pair of nodes can see the slop.
In fact, zSeries clusters do just that - the degree of "real"
synchronization within a single machine is substantially higher than
between machines in a cluster.

I don't know if there is a TOD clock synchronization hierarchy
internal to a single zSeries machine, but it's possible - zSeries
machines are built out of 1-4 "books," each of which contains five
quad core chips, which provides a natural hierarchy. On the flip
side, these are approximately 1000 MIPS cores, and managing even 1ns
synchronization across a meter or two of distance isn't that hard.
But such a thing would clearly be possible - cores in separate books
are clearly at least several tens of ns apart while two cores on a
chip are much closer. But such a hierarchy of synchronization levels
would almost have to naturally match the hardware architecture of a
big system.


> I am still doubtful that you can deliver the global properties that
> are wanted (essentially sequential consistency, to a higher precision
> than any other communication mechanism).  Perhaps it can be done, but
> I can't see how, and every real product I have seen has added some
> constraints.


I think the cluster wide TOD clock on zSeries clusters (Sysplex) comes
pretty close, at least.
From: Stephen Fuld on
On 1/20/2010 2:20 PM, Terje Mathisen wrote:
> nmm1(a)cam.ac.uk wrote:
>> In article<1531844.zBA62FjkXi(a)elfi.zetex.de>,
>> Bernd Paysan<bernd.paysan(a)gmx.de> wrote:
>>> It's not so bad as you think. As long as your uncertainty of time is
>>> smaller than the communication delay between the nodes, you are fine,
>>> i.e.
>>> your values are unique - you only have to make sure that the adjustments
>>> propagate through the shortest path.
>>
>> Er, no. How do you stop two threads delivering the same timestamp
>> if they execute a 'call' at the same time without having a single
>> time server? Ensuring global uniqueness is the problem.
>
> No!
>
> Global uniqueness is a separate, but also quite important problem.
>
> It is NOT fair to saddle every single timestamp call with the overhead
> required for a globally unique value!

There is a simple solution to this problem. Assume that the time stamp
is updated every microsecond, and that it is a hardware register within
the chip. Further assume that the timer field has enough bits to allow
for, say, nanoseconds, but that these bits are not guaranteed to be accurate.
Then the hardware can use those bits as a "request counter". That is,
the value is incremented once every request and reset to zero every time
the clock increments its least significant accurate bit (i.e., microseconds in our
example.) This guarantees uniqueness with a trivial amount of hardware
and no additional overhead. Of course, you have to pick the sizes to
allow for future implementations, etc. but this isn't hard.
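
A small software model of that register, as a sketch only (Python, with a hypothetical class name and an assumed 10-bit low-order field). The real thing would be hardware. This just shows the counter-and-reset behaviour described above.

```python
class RequestCountedClock:
    SUB_BITS = 10  # assumption: low-order field holds up to 1024 requests per tick

    def __init__(self):
        self.tick = 0      # coarse time, in microseconds
        self.requests = 0  # requests served during the current tick

    def advance(self, ticks=1):
        """Model a clock edge: bump the microsecond count, clear the counter."""
        self.tick += ticks
        self.requests = 0

    def read(self):
        """Return a stamp that is unique among reads of this register."""
        if self.requests >= (1 << self.SUB_BITS):
            raise OverflowError("more requests in one tick than the field holds")
        stamp = (self.tick << self.SUB_BITS) | self.requests
        self.requests += 1
        return stamp
```

Successive reads within one microsecond differ in the low bits, and any read after a tick is larger than every read before it. The overflow check is where the "pick the sizes" caveat shows up: the field has to be wide enough for the fastest request rate a future implementation could sustain. The model covers a single register only and says nothing about uniqueness across chips.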

--
- Stephen Fuld
(e-mail address disguised to prevent spam)