From: Bernd Paysan on
nmm1(a)cam.ac.uk wrote:
> The key is to have a clean system design, so the amount of sanity
> checking and the size of a standard prelude are minimal. For example,
> a high proportion of system calls in many applications can be very
> simple, 'unprivileged' ones like reading the clock or debugger hooks.

Actually, with a clean system design, many of those unprivileged ones can
be simple unprivileged library calls. rdtsc is unprivileged; all you need is
a factor (clocks per second) and a global offset, and then you can do your
gettimeofday() completely in userland (AFAIK, people have already done
that).
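
A minimal C sketch of the arithmetic described above, assuming the kernel
publishes a calibration record (the factor plus the offset) somewhere the
process can read. The struct tsc_calib layout and the function names are
hypothetical, made up for illustration. The sketch also ignores counters
that differ between cores or change frequency, which a real implementation
would have to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical calibration record; not a real kernel ABI. */
struct tsc_calib {
    uint64_t hz;        /* counter ticks per second (the "factor") */
    uint64_t base_tsc;  /* counter value at the moment base_usec was sampled */
    uint64_t base_usec; /* wall-clock microseconds at base_tsc (the "offset") */
};

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Unprivileged read of the time stamp counter on x86. */
static inline uint64_t read_counter(void) { return __rdtsc(); }
#endif

/* Convert a raw counter reading to microseconds, entirely in user space. */
uint64_t tsc_to_usec(const struct tsc_calib *c, uint64_t tsc)
{
    assert(c->hz != 0);
    uint64_t delta = tsc - c->base_tsc;
    uint64_t secs  = delta / c->hz;   /* whole seconds since the base sample */
    uint64_t rem   = delta % c->hz;   /* leftover ticks, always below hz */
    /* Splitting into secs and rem keeps the multiplication from overflowing
       for realistic clock rates. */
    return c->base_usec + secs * 1000000u + rem * 1000000u / c->hz;
}
```

With hz = 1,000,000,000 and a base of 5,000,000 us at counter value 100, a
reading of 100 + 2,500,000,000 ticks converts to 7,500,000 us.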

This can go a lot further. In effect, you can do most system stuff in
userland, including even reading and writing file data, and schedule file
metadata for changes ("schedule" means that finally, when committed, the
data is sanity checked by the kernel - but that doesn't need to happen too
often). All the system needs to do for you is to map those parts of
the disk which you can read or write into your memory map - read only stuff
read-only, read-write data on the disk read-write. The actual reads
and writes from and to the disk still happen in kernel land, but as long as
the program works from cache, no OS intervention is necessary.
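
The closest thing available on today's POSIX systems is mmap() on an
ordinary file, so here is a hedged sketch using that as a stand-in. It is an
analogue of the idea, not the design described above: the kernel still owns
the metadata, and msync() plays the role of the "commit" step. The function
names patch_via_mapping and demo_roundtrip are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `len` bytes of `path` read-write and overwrite one byte
   through the mapping. Returns 0 on success, -1 on failure. */
int patch_via_mapping(const char *path, size_t len, size_t off, char value)
{
    assert(len > 0);
    if (off >= len)
        return -1;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    p[off] = value;                   /* plain store, no system call */
    int rc = msync(p, len, MS_SYNC);  /* "commit": ask for write-back */
    munmap(p, len);
    return rc;
}

/* Create a temporary file holding "hello", patch it through a mapping,
   then read it back with pread() to confirm the store reached the file. */
int demo_roundtrip(void)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int rc = (write(fd, "hello", 5) == 5) ? 0 : -1;
    if (rc == 0)
        rc = patch_via_mapping(path, 5, 0, 'j');
    char buf[5] = {0};
    if (rc == 0 && pread(fd, buf, 5, 0) != 5)
        rc = -1;
    close(fd);
    unlink(path);
    return (rc == 0 && memcmp(buf, "jello", 5) == 0) ? 0 : -1;
}
```

Once the page is resident, the store in patch_via_mapping touches only the
page cache; the kernel gets involved again at msync() and munmap().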

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
From: Noob on
Terje Mathisen wrote:

> Anton Ertl wrote:
>
>> Andy Glew wrote:
>>
>>> I still think that both Intel and AMD missed a big opportunity, to make
>>> system calls truly as fast as function calls. Chicken and egg.
>>> Nobody wants to make the investment in hardware without a proven
>>> software benefit, but existing software is optimized to avoid expensive
>>> system call privilege level changes.
>>
>> But given that system calls have to do much more sanity checking on
>> their arguments, and there is the common prelude that you mentioned
>> (what is it for?), I don't see system calls ever becoming as fast as
>> function calls, even with fast system call and system return
>> instructions.
>
> _Some_ system calls don't need that checking code!
>
> I.e. using a very fast syscall(), you can return an OS timestamp within
> a few nanoseconds, totally obviating the need for application code to
> develop their own timers, based on RDTSC() (single-core/single-cpu
> systems only), ACPI timers or whatever else is available.
>
> Even if this is only possible for system calls that deliver a very simple
> result, and where the checking code is negligible, this is still an
> important subset.
>
> The best solution today is to take away all attempts at security and
> move all those calls into a user-level library, right?

What about the Linux vDSO / vsyscalls?

http://www.x86-64.org/pipermail/patches/2006-November/003498.html
http://juliusdavies.ca/posix_clocks/clock_realtime_linux_faq.html
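
For a rough feel of what that buys, here is a small C sketch that times
clock_gettime() from user space. Whether the call is actually served by the
vDSO without entering the kernel depends on the kernel version, the
architecture and the active clock source, so treat any number it reports as
an observation on one machine. The helper names now_ns and avg_call_cost_ns
are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is defined */
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average cost in nanoseconds of one clock_gettime() call, measured over
   `iters` back-to-back calls. */
double avg_call_cost_ns(unsigned iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iters; i++)
        (void)now_ns();
    uint64_t end = now_ns();
    return (double)(end - start) / iters;
}
```

On a Linux box where the vDSO path is in use, running such a loop under
strace typically shows no clock_gettime entries at all, since the calls
never reach the kernel.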

Regards.
From: Noob on
Andy Glew wrote:

> I wrote the following for my wiki,
> http://semipublic.comp-arch.net/wiki/SYSENTER/SYSEXIT_vs._SYSCALL/SYSRET
> and thought that Usenet comp.arch might be interested.

This old post by Linus Torvalds seems somewhat relevant.
http://lkml.org/lkml/2002/12/18/218
From: "Andy "Krazy" Glew" on
Anton Ertl wrote:
> "Andy \"Krazy\" Glew" <ag-news(a)patten-glew.net> writes:
>> I still think that both Intel and AMD missed a big opportunity, to make
>> system calls truly as fast as function calls. Chicken and egg.
>> Nobody wants to make the investment in hardware without a proven
>> software benefit, but existing software is optimized to avoid expensive
>> system call privilege level changes.
>
> But given that system calls have to do much more sanity checking on
> their arguments, and there is the common prelude that you mentioned
> (what is it for?), I don't see system calls ever becoming as fast as
> function calls, even with fast system call and system return
> instructions.
>
> - anton

There are ways to reduce the work involved in doing sanity checking on
arguments. E.g. a properly designed capability machine architecture can
do this. Or even just "perform access as if in user mode" in the VM.
Rather like Sun's load/store alternate address space, except not for I/O.

This may very well be one of the ways in which I fell short: I have an
agenda to make syscalls faster, of which fast system call instructions
such as SYSENTER/SYSEXIT and SYSCALL/SYSRET are just one step.

It's not clear how much value one step done in isolation has. And
all-or-nothing feature agendas tend not to happen unless there is big demand.

--

But as for the value: I think that it is there. Many of the security
holes in modern systems arise because syscalls and cross domain
transfers are too expensive: instead of putting things in different
security domains, processes or the like, we put them in the same
security domain. And then we get surprised when, e.g., a bug in a graphics
device driver allows an OS-level break-in.
From: "Andy "Krazy" Glew" on
mac wrote:
>> I have observed that this concern about interrupts that cannot be
>> blocked is a key source of complexity in system architecture. The RISC
>> approach may be to assume that all interrupts, even NMIs, can be
>> blocked, briefly, as the syscall code sets things up. But the advent of
>> things like virtual machines, SMIs, etc., means that you can't make this
>> assumption.
>
>
> Didn't Alpha PALcode have something like this? Special execution
> environment, no interrupts, privileged register access?
> I don't know much about it, but it looked like a clever hook for CISC
> operations.

Yes.

And, for that matter, microcode in machines like the x86 amounts to the
same thing.