From: nmm1 on
In article <1531844.zBA62FjkXi(a)elfi.zetex.de>,
Bernd Paysan <bernd.paysan(a)gmx.de> wrote:
>
>> The other is maintaining global uniqueness and monotonicity while
>> increasing the precision to nanoseconds and the number of cores
>> to thousands. All are needed, but it is probably infeasible to
>> deliver all of them, simultaneously :-(
>
>It's not so bad as you think. As long as your uncertainty of time is
>smaller than the communication delay between the nodes, you are fine, i.e.
>your values are unique - you only have to make sure that the adjustments
>propagate through the shortest path.

Er, no. How do you stop two threads delivering the same timestamp
if they execute a 'call' at the same time without having a single
time server? Ensuring global uniqueness is the problem.

>propagate through the shortest path. For monotonicity, just make sure your
>corrections for NTP don't step back. The NTP implementations I know adjust
>clocks by slowing them down or speeding them up.

Don't bet on it :-( They do when all goes well, but many of them
will behave very weirdly indeed (including jumping both forward and
back) when they get confused. xntpd certainly does, and that's
perhaps the most common implementation. Or at least it did, to be
strictly accurate.


Regards,
Nick Maclaren.
From: Anne & Lynn Wheeler on
nmm1(a)cam.ac.uk writes:
> Er, no. How do you stop two threads delivering the same timestamp
> if they execute a 'call' at the same time without having a single
> time server? Ensuring global uniqueness is the problem.

one of the requirements was to correctly order dbms transaction log
records after a failure (for recovery). a standard dbms speed-up is to
allow a transaction to be considered committed after the corresponding
log record has been written to disk ... but the altered record in
buffer memory may not yet have been pushed out to its dbms location
(lazy writes to the DBMS disk location).

recovery (after a failure) requires using the log to sequentially
"rerun" the transactions ... eventually bringing the dbms image on
disk to a consistent state.

a cluster dbms implementation used to force the record to disk before
allowing it to migrate into a DBMS buffer on a different processor. to
speed things up, it would be possible to allow a modified record to be
transmitted (over a high-speed link) between dbms buffers (in
different processors in the cluster). the problem then is that there
could be multiple committed transaction changes ... recorded in
different dbms logs ... but not reflected in the DBMS record on disk.

as part of supporting direct buffer-to-buffer copies (w/o having to
force out to disk) ... a mechanism was needed (for recovery) to merge
transaction logs from different systems so that they preserve the
original global temporal ordering. The requirement isn't actually to
have an exact time value for each transaction ... but to have the
multiple logs merged so that entries occurred in the original
sequence. unique accurate time works ... but so would nearly any
unique monotonically increasing number (say a transaction version
number ... which could be supported as part of the operation of the
dbms cluster distributed lock manager ... which also piggy-backs
buffer-to-buffer record copies as part of lock traffic).
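
A minimal sketch of that recovery-time merge step, in C. The record
layout and field names here are hypothetical (the source only says a
unique monotonically increasing number per record is needed); the point
is that the merge keys on the sequence number, not on wall-clock time:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record: what matters for recovery is the globally
   unique, monotonically increasing sequence number (e.g. a transaction
   version number from the distributed lock manager). */
struct log_rec {
    unsigned long seq;   /* globally unique, monotonically increasing */
    int node;            /* which system's log this record came from */
};

/* Merge two per-node logs (each already in seq order) into one stream
   that reproduces the original global ordering. Returns the number of
   records written to 'out'. */
static size_t merge_logs(const struct log_rec *a, size_t na,
                         const struct log_rec *b, size_t nb,
                         struct log_rec *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i].seq < b[j].seq) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* drain whichever log remains */
    while (j < nb) out[k++] = b[j++];
    return k;
}
```

With more than two nodes the same idea generalizes to a k-way merge
(e.g. a small heap keyed on the head record of each log).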

--
40+yrs virtualization experience (since Jan68), online at home since Mar1970
From: EricP on
Andy "Krazy" Glew wrote:
> EricP wrote:
>> Andy "Krazy" Glew wrote:
>>> Note that x86 eventually got around to adding READ_EIP instruction.
>>
>> Where is that? I find no reference to such an instruction.
>
> Perhaps I should have said x86-64. And perhaps I have slipped a bit,
> wishful thinking and all that, but does not LEA with a RIP-relative
> addressing mode do what you want?

Ok, yes. I thought you were referring to legacy mode.

In x64 mode there is also SysCall and SysRet.
They seem to be more in line with my requirements as
SysRet re-enables interrupts as it returns to user mode.

SysCall does not load the kernel stack pointer though.
It just disables interrupts and loads the kernel RIP,
and that entry code must load the kernel stack pointer itself.

> I must admit that I have slightly mixed feelings about PIC. Sure, it's
> a good idea to be able to relocate code. But that is PIC code
> addressing. I am not so sure that it is a good idea to encourage data
> to be at a fixed offset from the code. Perhaps for constants.

Yeah, I think I was just time tripping.

It's not worth adding support for it to an operating system
and image file format, as hardware support is so unlikely.
You are better off focusing on standard image relocation
techniques and automatically reusing code pages where possible.

> Also, as somebody who has had to deal with security issues: PIC is a
> gift to malware. After all, one of the basic characteristics of binary
> code injections via buffer overflows is that they are at an unknown
> address. PIC makes it easier to write viruses. Although at the same
> time it makes it easier to randomize the address space, and thereby make
> it harder to write viruses. Fortunately, x86-64 has other good features
> that, when correctly employed, can hinder malware. And, fortunately,
> x86-64 breaks the need for legacy compatibility, affording the
> opportunity to

I think you are blaming PIC for bad software.
There is an exploit technique called return-oriented programming,
whereby returning into just the right places in a library you can
accomplish almost anything. Should we also eliminate RET instructions?

Most of the C run time library is like running with scissors anyway.
Is there any language other than C/C++ that suffers from buffer
overflows? As long as the language allows and the rtl encourages
buffer overflows then bad stuff is going to happen.

I don't use the silliest of the C rtl routines, the ones without
buffer length args, and I have never had this problem (afaik :-).

> I suspect that it is better overall to use one or more base registers
> for data addresses. Rather than relying on RIP, the instruction
> pointer, as a free base register. But then that requires at least one
> dedicated register, and even with REX x86-64 doesn't really have enough
> registers.
>
> I sometimes think that we should have RIP-relative branching and control
> flow, and RIP-relative loading of constants. But that we should
> discourage writing to RIP relative data locations. E.g. by disallowing
> it in the store addressing modes. So long as you can do a RIP relative
> LEA, you can always get RIP relative stores if you want.
>

RIP-relative addressing makes DLL code a lot easier
as it eliminates all that GOT table stuff. The OS just maps
all the static data areas right after the code pages.

Eric
From: robertwessel2 on
On Jan 20, 4:52 am, Terje Mathisen <"terje.mathisen at tmsw.no">
wrote:
> robertwess...(a)yahoo.com wrote:
> > Other attributes of the TOD clock are that the values are global,
> > unique, and monotonically increasing as viewed by *all* CPUs in the
> > system.  That allows timing to happen across CPUs, things to be given
> > globally unique timestamps, etc.  The TOD clock also provides the
> > basis for timer interrupts on the CPU.
>
> > It's very handy.
>
> It is also the "Right Stuff", i.e. as I wrote earlier the correct way to
> handle this particular problem.
>
> The only real remaining problem is related to NTP, i.e. when you want to
> sync this system-global TOD clock to UTC/TAI.
>
> Afair IBM does have a (very expensive!) hw solution for this, instead of
> the trivial sw needed for a RDTSC-based clock which I outlined earlier.


IBM's problem is that they need to keep TOD clocks synchronized
cluster-wide. With a single machine, there have been ways to sync to
an external clock, in some cases involving third party software (and
hardware). Not necessarily ideal, but possible. In many cases the
problem was more political than technical.

Anyway, for a cluster this used to be accomplished by an external
device known as a Sysplex Timer, and each machine in the cluster would
connect to the Timer (and there would usually be at least two for
redundancy). Sysplex Timers had an option for accessing an external
time source, and would drift the hardware clocks as necessary to stay
in sync with that.

These days it's somewhat cheaper and more straight-forward, since the
clock synchronization hardware is now built into current machines, and
a dedicated external Sysplex Timer is not required (I don't remember
if the current z10s still support a Sysplex Timer or not - at least
some generations of hardware did so that you could have a cluster with
both older and newer boxes). With the new synchronization support,
there are architected functions for performing fine adjustments to the
clock stepping rate, and it's no longer dedicated hardware doing the
adjustment, but rather OS code.
From: robertwessel2 on
On Jan 20, 5:32 am, n...(a)cam.ac.uk wrote:
> In article <ds3j27-tsm....(a)ntp.tmsw.no>,
> Terje Mathisen  <"terje.mathisen at tmsw.no"> wrote:
>
> >robertwess...(a)yahoo.com wrote:
> >> Other attributes of the TOD clock are that the values are global,
> >> unique, and monotonically increasing as viewed by *all* CPUs in the
> >> system.  That allows timing to happen across CPUs, things to be given
> >> globally unique timestamps, etc.  The TOD clock also provides the
> >> basis for timer interrupts on the CPU.
>
> >> It's very handy.
>
> >It is also the "Right Stuff", i.e. as I wrote earlier the correct way to
> >handle this particular problem.
>
> Yes.  But see below.
>
> >The only real remaining problem is related to NTP, i.e. when you want to
> >sync this system-global TOD clock to UTC/TAI.
>
> No, not at all.  There are two problems.  That's one.
>
> The other is maintaining global uniqueness and monotonicity while
> increasing the precision to nanoseconds and the number of cores
> to thousands.  All are needed, but it is probably infeasible to
> deliver all of them, simultaneously :-(


You only need to keep the clocks well enough synchronized that threads
running on separate cores can't tell that the order of time values
stored is actually slightly out of sync across the machine or
cluster. Basically this is approximately the physical propagation
delay between nodes, and synchronizing to less than that is relatively
straight-forward.

Then making sure the values are unique just requires an extension at
the low end of the time value, and a fixed value per-core to be stored
there. So effectively core number 13 always stores time values of the
form "nnnnnnnn.nnnnnnnnn013" and two actually simultaneous stores have
an artificial difference inserted at the low end. And so long as the
prior condition (about time/event visibility) is met, you're covered
here too.
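
The low-end extension described above, as a small C sketch. The field
widths are assumptions (the text's decimal "013" example is shown here
as a binary field; 12 bits allows up to 4096 cores):

```c
#include <assert.h>
#include <stdint.h>

/* Extend the time value at the low end with a fixed per-core field.
   Two cores reading the same tick still produce distinct values, and
   values produced by any one core remain monotonic. */
#define CORE_BITS 12   /* assumed width: up to 4096 cores */

static uint64_t unique_stamp(uint64_t tick, unsigned core_id)
{
    return (tick << CORE_BITS) | (core_id & ((1u << CORE_BITS) - 1));
}
```

So long as the clocks are synchronized to within the inter-node
propagation delay, no observer can catch two of these values out of
order.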

You can artificially reduce the timer frequency requirement (and ease
your synchronization problems) by imposing a minimum time between
clock reads. And frankly a "real" timer rate an order of magnitude
slower than the instruction rate probably doesn't cost any real
utility. A few old S/370s did that (with a 1MHz timer on a 3 MIPS
machine, two consecutive STCKs would take at least a microsecond, for
example). And modern boxes do it for the old 64-bit timer format,
which is now effectively out of resolution (so if you use the fully
synchronized version of "store clock 64," there's an artificial
maximum rate imposed on those instructions).
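
A software analogue of that rate limit, sketched in C (this is an
illustration of the idea, not how any particular hardware does it): if
the underlying clock has not advanced past the last value handed out,
bump the result by one low-order unit, which both keeps reads strictly
increasing and imposes an artificial maximum read rate.

```c
#include <assert.h>
#include <stdint.h>

/* Last value returned; single-threaded sketch. A real multi-core
   implementation would use an atomic compare-and-swap loop on this. */
static uint64_t last_value;

static uint64_t read_clock_unique(uint64_t raw_tick)
{
    /* Never return a value <= the previous one, even if the raw
       clock has not ticked (or has stepped backward). */
    uint64_t v = raw_tick > last_value ? raw_tick : last_value + 1;
    last_value = v;
    return v;
}
```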