From: Anthony Liguori on
On 03/18/2010 11:54 AM, Ingo Molnar wrote:
> I took a quick look at the qemu.git log and more than half of all recent
> contributions came from Linux distributors.
>

I don't know what you're looking at, but in the past month there have been
56 unique contributors, with 411 changesets. I count 16 people employed
by distributions, with 188 changesets.

> So without KVM Qemu would be a much, much smaller project. It would be similar
> to how it was 5 years ago.
>

I'm not saying that KVM isn't significant. I'm employed to work on QEMU
because of KVM.

I'm just saying that KVM users aren't 99% of the community and that we
can't neglect the rest of the community.

Regards,

Anthony Liguori
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Anthony Liguori on
On 03/18/2010 11:13 AM, Ingo Molnar wrote:
> Good that you mention it, i think it's an excellent example.
> The suckage of kernel async IO is for similar reasons: there's an ugly package
> separation problem between the kernel and between glibc - and between the apps
> that would make use of it.
>
> ( With the separated libaio it was made worse: there were 3 libraries to
> work with, and even fewer applications that could make use of it ... )
>
> So IMO klibc is an arguably good idea - eventually hpa will get around to
> posting it for upstream merging again. Then we could offer both new libraries much
> faster, and could offer things like comprehensive AIO used pervasively within
> existing APIs.
>

And why wouldn't the kernel developers produce posix-aio within klibc?

posix-aio is also a really terrible interface (although not as bad as
linux-aio).

The reason boils down to the fact that these interfaces are designed
without interacting with the consumers. Part of the reason for that is
the attitude of the community.

You approached this discussion with, "QEMU/KVM sucks, you should move
into the kernel because we're awesome and we'd fix everything in a heart
beat". That attitude does not result in any useful collaboration.

Had you started by trying to understand what problems we face and
whether there's anything that can be done in the kernel to improve
things, it would have been an entirely different discussion.

The sad thing is, QEMU is probably one of the most demanding free
software applications out there today with respect to performance. We
consume IO interfaces and things like large pages in a deeper
way than just about any application out there.

We've been trying for years to improve Linux interfaces,
but we've not found many people in the kernel community to be receptive.

We've failed to improve the userspace networking interfaces. Compare
Rusty's posting of vringfd to vhost-net. They are the same interface
except we tried to do something more generally useful with vringfd and
it was shot down because it was "yet another kernel/userspace data
transfer interface". Unfortunately, we're learning that if we claim
something is virtualization specific, we avoid a lot of the kernel
bureaucracy. My concern is that over time, we'll have more things like
vhost and that's bad for everyone.

Regards,

Anthony Liguori

--
From: Ingo Molnar on

* drepper(a)gmail.com <drepper(a)gmail.com> wrote:

> On Thu, Mar 18, 2010 at 09:13, Ingo Molnar <mingo(a)elte.hu> wrote:
>
> > The suckage of kernel async IO is for similar reasons: there's an ugly
> > package separation problem between the kernel and between glibc
>
> Bollocks. glibc would use (and is using) everything the kernel provides.

I didn't say it's glibc's fault - if anything, it's more the kernel's fault, as
most of the complexity is on that side. I said it's due to the fundamental
distance between the app that makes use of it, the library and the kernel, and
the resulting difficulties in getting a combined solution out.

None of the parties really feels it to be their own thing.

Ingo
--
From: Ingo Molnar on

* drepper(a)gmail.com <drepper(a)gmail.com> wrote:

> On Thu, Mar 18, 2010 at 12:15, Ingo Molnar <mingo(a)elte.hu> wrote:
>
> > I didnt say it's glibc's fault - if then it's more of the kernel's fault
> > as most of the complexity is on that side. I said it's due to the
> > fundamental distance between the app that makes use of it, the library and
> > the kernel, and the resulting difficulties in getting a combined solution
> > out.
>
> This is wrong, too. Once there is a kernel patch that has a reasonable
> syscall interface it's easy enough to hack up the glibc side. [...]

Where 'reasonable' is defined by you, right?

As i said, the KAIO situation is mostly the kernel's fault, but you are a
pretty passive and unhelpful entity in this matter too, aren't you?

For example, just to state the obvious: libaio was written 8 years ago, in
2002, and has been used in apps from early on. Why aren't those kernel APIs,
while not being a full/complete solution, supported by glibc, and wrapped with
pthreads-based emulation on kernels that don't support them?

I'm not talking about a 100% full POSIX AIO implementation (the kernel side is
not complete enough for that) - i'm just talking about the APIs that libaio
and the kernel support today.
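
As an illustration of the "native path plus thread-based emulation" wrapping
described above, here is a minimal sketch. It is not glibc's or libaio's code.
The names submit_pread() and native_submit_pread() are hypothetical, and the
"native" backend is a stand-in that always reports ENOSYS, the way a kernel
without the facility would. A worker thread doing a blocking pread plays the
role of the pthreads-based emulation.

```python
import errno
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical native backend. It stands in for a kernel AIO submission
# path and always fails with ENOSYS, as a kernel lacking support would.
def native_submit_pread(fd, length, offset):
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))

# Worker threads used only for the emulated path.
_pool = ThreadPoolExecutor(max_workers=4)

def submit_pread(fd, length, offset, native=native_submit_pread):
    """Start an asynchronous positional read and return a Future."""
    try:
        # Prefer the native path when the kernel offers it.
        return native(fd, length, offset)
    except OSError as exc:
        # Only "not implemented" triggers the fallback; real errors propagate.
        if exc.errno != errno.ENOSYS:
            raise
    # Emulation: run a blocking os.pread() on a worker thread.
    return _pool.submit(os.pread, fd, length, offset)
```

The caller sees the same Future-returning interface on either path, which is
the point of doing the wrapping in the library: applications can code against
one API while the native side is still incomplete.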

Why isn't glibc itself making use of those AIO capabilities internally? (even
if it's not possible to support full POSIX AIO)

I checked today's glibc repo, and there's no sign of any of that:

glibc> git grep io_submit
glibc> git grep aio_context_t
glibc>

Zero, nil, nada.

Getting _something_ into glibc would certainly help move the situation. Glibc
itself using existing KAIO bits internally would help too, and don't tell me
it's 100% unusable: it's certainly capable enough to run DB servers. glibc
using it would create further demand (and pressure, and incentives) for
improvements.

There were even glibc patches created by Ben LaHaise for some of these bits,
IIRC.

One can certainly make the argument that glibc not using _any_ of the current
KAIO capabilities harms its further development.

> [...] Don't try to artificially find an argument to support your thesis.

Charming argumentation style, i really missed it.

Thanks,

Ingo
--
From: Ingo Molnar on

* Frank Ch. Eigler <fche(a)redhat.com> wrote:

> Frederic Weisbecker <fweisbec(a)gmail.com> writes:
>
> > [...] It is actually because both kernel and user side are sync in this
> > scheme. [...]
>
> This argues that co-evolution of an interface is easiest on the developers
> if they own both sides of that interface. No quarrel.

Correct, that's a big advantage.

> This does not argue that the preservation of a stable ABI is best done
> this way. If anything, it makes it too easy to change both the provider and
> the preferred user of the interface without noticing unintentional breakage
> to forlorn out-of-your-tree clients.

Your concern is valid, and this issue has been raised in the past as one of
the main counter-arguments against tools/perf/. (there was a big flamewar
about it on lkml when it was introduced)

Our roughly 1 year of experience with perf is that, somewhat paradoxically, this
scheme not only works as well as classic ABI schemes but actually brings a
_better_ ABI than the classic "let the kernel define an ABI" single-sided
solution.

I know the difference first hand, i've written various syscall ABIs in the
10+ years before perf and know how they interact with their user space
counterparts.

Why did it work out better with tools/perf/? It turns out that there's an
immediate, direct, actionable test feedback effect on the ABI, and much closer
relation to the ABI. Typically the same developer implements the kernel bits
and the user-space bits (because it's so easy to do co-development), so the
ABI aspects are ingrained in the developer much more deeply. Once you see the
kind of havoc ABI breakage can cause during development you avoid it in the
future.

So developers find that a good, stable ABI helps development. It turns out
that developers don't actually _want_ to break the ABI and are careful about it
- and having the app next to the kernel ABI and co-developing it makes sure
there's never any true mismatch.

Also, we can do ABI improvements at a far higher rate than any other kernel
subsystem. I checked the git logs, we've done over three dozen ABI extensions
since the first version, and all were forwards _and_ backwards compatible.
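
One common way to keep such extensions compatible in both directions is a
size-versioned structure: the caller states how large its struct is, and the
receiver zero-fills fields it did not get and rejects unknown trailing fields
only when they are non-zero. perf_event_attr does carry a size field; the
field layout, names and error type below are hypothetical, and this is a
sketch of the general technique rather than the kernel's actual code.

```python
import struct

# Hypothetical layouts: v1 is (size, type, config); v2 appends (flags).
V1_FMT = "<IIQ"    # 16 bytes
V2_FMT = "<IIQQ"   # 24 bytes
KERNEL_FMT = V2_FMT                      # what this "kernel" understands
KERNEL_SIZE = struct.calcsize(KERNEL_FMT)
MIN_SIZE = struct.calcsize(V1_FMT)

class TooBig(Exception):
    """Caller set a field this receiver does not know about."""

def copy_attr(buf):
    """Receiver side: accept structs from older and newer callers."""
    if len(buf) < 4:
        raise ValueError("buffer too short for the size field")
    (size,) = struct.unpack_from("<I", buf)
    if size < MIN_SIZE or size > len(buf):
        raise ValueError("bad size field")
    body = buf[:size]
    if size > KERNEL_SIZE:
        # Newer caller: fine as long as every unknown byte is zero.
        if any(body[KERNEL_SIZE:]):
            raise TooBig("unknown non-zero fields")
        body = body[:KERNEL_SIZE]
    else:
        # Older caller: fields it does not know about default to zero.
        body = body.ljust(KERNEL_SIZE, b"\0")
    _, type_, config, flags = struct.unpack(KERNEL_FMT, body)
    return {"type": type_, "config": config, "flags": flags}
```

For example, a v1 caller passing struct.pack(V1_FMT, 16, 1, 42) gets flags
treated as 0, while a caller with a longer, newer struct is accepted as long
as it leaves the extra fields zeroed.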

A higher rate of change gives developers more experience and lets them do a
better ABI, and makes them more ABI-conscious. I think if all kernel ABIs had
such a healthy rate of change we'd fill in all the missing kernel features
very quickly.

With detached packages, ABI features are often done by a kernel developer (who
is familiar with the kernel subsystem in question) and a separate user-space
developer (who is familiar with the user-space project in question), and the
ABI consciousness is less strong.

So you are right that there's a danger of accidental ABI breakage, but it's
not an issue in practice. There are external apps making use of the ABI as
well, not just tools/perf/.

In a more abstract sense this is kind of a classic case of game theory: an
assume-trust strategy pays off in the long run.

Thanks,

Ingo
--