From: Eric Anholt on
On Fri, 5 Mar 2010 12:21:29 +0000, Alan Cox <alan(a)lxorguk.ukuu.org.uk> wrote:
> Serious discussion point perhaps should be: is the libdrm so close to the
> kernel it ought to be in the same git tree ? Alternatively does it need
> to be easier to have multiple Nouveau libdrms autoselected according to
> the kernel side versioning. ELF library versioning is not rocket science
> and both the old and new libraries exist and can be installed so all the
> bits are present except for the wrapper to load the right sublibrary yes ?

That *would* make versioning impossible.

To make the difficulty of improving ABI at the moment concrete, I just
got done merging the patches for execbuf2 in userland and enabling i915
texture tiling. This was a 3% performance win in one test I was looking
at, and 1% in another -- less than hoped, but important nonetheless
(there are other cases that should see 30% or so wins hopefully). The
patches got written back in July, and revved several times as they broke
various combinations of compatibility. At the point that I merged eb2
to the kernel for 2.6.33, it wasn't *really* tested -- the userland side
was broken all to hell it looked like, but at least it wasn't regressing
execbuf1 any more, right? I spent this week getting userland working,
including a new libdrm release (already obsolete because a bug in the
libdrm violated what the ABI between libdrm <-> Mesa was supposed to
be). So overall, I'd say that we spent about a month of developer time
at least between jbarnes, ickle, and myself, on extending the execbuf
interface to add a flag saying "dear kernel, please don't do this bit of
work on this buffer, because I don't need it and it makes things slow."

This is not that bad for Intel folks. We're paid to hack on it, and can
justify spending ridiculous amounts of time for small wins. I actually
enjoy this.

Right now all the userland -- whether it's Mesa, xf86-video-intel,
libva, cairo-drm, stand-alone DRM testcases, etc., gets to move to the
new libdrm API, declare its dependency in PKG_CHECK_MODULES, and hand
that new flag to libdrm as if the kernel supported the new interface.
Inside libdrm, it looks at the kernel version and uses the new interface
or old as appropriate. The ugly versioning stuff stays in one
easy-to-review 5kloc component, and the complicated 50kloc driver
components get to pretend they have a fancy new kernel.
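
As a rough sketch of that shim idea (not libdrm's actual code), the
dispatch could look like the following. The names Shim, submit and
FLAG_SKIP_WORK are hypothetical, and the 2.6.33 threshold is only taken
from the release mentioned above; a real implementation might probe for
the feature instead of comparing versions.

```python
from dataclasses import dataclass

# Hypothetical flag meaning "don't do this bit of work on this buffer".
FLAG_SKIP_WORK = 0x1

# Assumption for illustration: execbuf2 is available from 2.6.33 onward.
EXECBUF2_MIN_KERNEL = (2, 6, 33)


@dataclass
class Shim:
    """Stand-in for the small library that hides kernel versioning."""

    kernel_version: tuple

    def submit(self, buffers, flags=0):
        """Return (interface, buffers, flags) as they would be sent down."""
        if self.kernel_version >= EXECBUF2_MIN_KERNEL:
            # New kernel: pass the caller's flags straight through.
            return ("execbuf2", buffers, flags)
        # Old kernel: the flag is only an optimisation hint, so drop it
        # and use the old interface. Callers never see the difference.
        return ("execbuf1", buffers, 0)
```

The driver always calls submit() with the new flag; only the shim
knows which interface the running kernel actually has.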

But if libdrm's in the kernel, all those userland components no longer
get to rely on the version of libdrm, because distros will ship
whatever's with the kernel they're using and our userland does have to
work on (relatively recent) distros. Each of those userland components
would have to grow a compatibility layer to work with whatever kernel
libdrm is available, passing the flag in the new API or using the old
API. Userland would be even buggier for having to replicate all that logic
everywhere, and we probably wouldn't have execbuf2 landed yet.

Well, OK. What I'd really do instead is make the kernel libdrm be a
thin ioctl wrapper, and build a librealdrm that does what libdrm does
today. But I don't think that's what you were suggesting.
From: Corbin Simpson on
On Fri, Mar 5, 2010 at 8:46 AM, <tytso(a)mit.edu> wrote:
> On Fri, Mar 05, 2010 at 06:04:34PM +0200, Daniel Stone wrote:
>>
>> So you're saying that there's no way to develop any reasonable body of
>> code for the Linux kernel without committing to keeping your ABI
>> absolutely rock-solid stable for eternity, no exceptions, ever? Cool,
>> that worked really well for Xlib.
>
> No, that's not what people are saying.  What people are saying is,
> "avoid flag days".  Deprecate things over a 6-12 month time period.
> We have lots of really good interfaces for doing that.
>
> You say you don't want to do that?  Then keep it to yourself and
> don't get it dropped into popular distributions like Fedora or Ubuntu.
> You want a larger pool of testers?  Great!  The price you need to pay
> for that is to be able to do some kind of ABI versioning so that
> you don't have "drop dead flag days".
>
> If you don't want to be a good citizen, then be prepared to have people
> call you out for, well, not being a good OSS citizen.

I was trying my hardest to not say anything, but...

Nouveau isn't an official Xorg project. It hasn't been added to the
jhbuild list for auto-checkout, it doesn't get tinderbox time
(admittedly a function of being part of the jhbuild) and I don't think
it's on the katamari list, so it's never been shipped as part of an
Xorg release. It is only in mainline under the staging rules; drivers
come and go from staging under fairly lax rules.

Fedora ships this stuff because they're actively developing it and
enjoy deploying half-broken things to users in the vain hope that it
magically won't break. I can't count the number of kittens eaten by
Fedora systems I've used. (It is kind of sad that Fedora's still the
best distro about not deploying broken stuff but still remaining
up-to-date.) Tellingly, it doesn't look like this interface change has
been deployed to stable Fedora, just Rawhide.

The Ubuntu people don't talk to us as much as they should. Seeing how
badly they biffed Radeon and Intel KMS deployment, it's hard for me to
believe that deploying Nouveau went smoothly. I don't have much more
personal experience; my work computer has an HD 3450 in it now instead
of the old GeForce, and that's my only Ubuntu box.

If distros want to run weird experiments on their users, let them!
Sure, sometimes bad things happen, but sometimes good things happen
too. ConsoleKit, DeviceKit, HAL, NetworkManager, KMS, yaird, dracut,
Plymouth, the list goes on and on.

If the problem here is actually that a distro is deploying a staging
driver and picking up the pieces themselves, then just say it. This
whole thing with flag days, deprecation, interface changes, etc.
hinges on the idea that the code being deprecated was stable, usable,
and widely deployed, but it wasn't and isn't.

That said... Code probably is moving too fast inside nouveau. There is
a bit of a wall to go through to get new patches upstream, which one
would hope would inspire some developer restraint. intel and radeon
both still have most (if not all) of the legacy code needed by ancient
userspaces, and both DDX drivers are doing multiple-branch releases to
keep old userspace interfaces alive for people unable to update their
kernels. It might be useful for the nouveau guys to really seriously
consider code before it leaves their trees and enters mainline;
writing code that you won't commit to is quite lame for the obvious
reasons, but also for some unobvious reasons, e.g. it makes you look
like you don't actually know what you're doing and would rather just
keep reinventing wheels without justifying and testing your design
choices. (This is also why I was not exactly pleased with the
suggestion of retooling all of the r600 userspace over a change to the
CS system; we just spent the better part of a year moving everything
over to CS!)

~ C.

--
Only fools are easily impressed by what is only
barely beyond their reach. ~ Unknown

Corbin Simpson
<MostAwesomeDude(a)gmail.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Felipe Contreras on
On Fri, Mar 5, 2010 at 2:41 AM, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
> On Fri, 5 Mar 2010, Ben Skeggs wrote:
>> The F13 packages *will* work, so long as you're not bisecting back and
>> forth.
>
> How do I install just the F13 libdrm thing, without changing everything
> else? I'm willing to try. We can make it part of the 2.6.34 release notes.
>
> And if we end up having people bisecting back and forth, I will hate that
> f*cking nouveau driver even more.

I believe Dave has already explained this to you, but nobody has
mentioned it here.

What you are supposed to do is install the new nouveau driver, which
requires a new libdrm. So just build both libdrm and nouveau into a
sandbox prefix, say /opt/new-nouveau, and then add this to
/etc/X11/xorg.conf:

Section "Files"
    ModulePath "/opt/new-nouveau/lib/xorg/modules"
    ModulePath "/usr/lib/xorg/modules"
EndSection

That should do it. No frankensteinian F13 packaging stuff, and no mess
with the system's /usr/lib/.
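
The reason the order of the two ModulePath lines matters is that the
directories are searched in the order given, so the sandbox copy wins
when both exist. A minimal sketch of that first-match lookup follows;
resolve_module and the injectable exists check are hypothetical helpers
for illustration, not the X server's loader.

```python
import os
import posixpath


def resolve_module(name, module_paths, exists=os.path.isfile):
    """Return the first path under module_paths that holds name, else None.

    name         -- module file relative to a module directory
    module_paths -- directories in the order they appear in xorg.conf
    exists       -- file check, injectable so the logic can be tried dry
    """
    for directory in module_paths:
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            # Earlier entries shadow later ones.
            return candidate
    return None
```

With the sandbox listed first, a module present in both trees resolves
to the /opt/new-nouveau copy, and anything missing there falls through
to /usr/lib/xorg/modules.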

Cheers.

--
Felipe Contreras
From: Luca Barbieri on
> So overall, I'd say that we spent about a month of developer time
> at least between jbarnes, ickle, and myself, on extending the execbuf
> interface to add a flag saying "dear kernel, please don't do this bit of
> work on this buffer, because I don't need it and it makes things slow."

Perhaps then, we should break ABI compatibility _more_ often to speed
up development, but also have awesome mechanisms to make it painless
for the user.

Such as:
1. Automatic side by side userspace installation, as Linus proposed
2. Kernel "make install" refusing to proceed if it finds that
userspace is not updated, and giving instructions on how to update
userspace
3. Distributions packaging the new ABI X/Mesa drivers and libdrm even
for stable distributions
4. Kernel "make install" offering to automatically install said
distribution packages if it detects a supported distribution
5. Ability to drop new versions of drivers/gpu/drm in an older kernel
tree and have it compile (within reasonable limits)

In particular, for people with (slightly) old kernels, it should be
much easier to make updated DRM trees still work with older kernels,
than attempting to make updated userspace work with older kernel
modules.
This also actually gives them the benefits of the new code.

And for people with really old kernels, it's not different from any
other hardware device, which requires a kernel upgrade to have better
support.

Then, for instance, Linus would just have seen the following upon
running make install:
This kernel requires the Nouveau userspace version 0.0.16, which you
don't have installed.
Fedora 12 has been detected.
Invoke yum to install the <rpmnames> RPMs required for it? [y/n]
_or_
Ubuntu 9.10 has been detected
Invoke apt-get to install the <debnames> packages required for it? [y/n]

If the user says no, or the distribution is unknown, instructions on
how to download and compile the source would be presented.
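
A minimal sketch of the decision behind that prompt might look like
this. check_userspace and PACKAGE_TOOLS are hypothetical names, nothing
like this exists in the kernel build system today, and the exact-match
comparison is an assumption (a broken ABI means neither older nor newer
userspace will do).

```python
# Hypothetical mapping from detected distribution to its package tool.
PACKAGE_TOOLS = {"Fedora": "yum", "Ubuntu": "apt-get"}


def check_userspace(required, installed, distro=None):
    """Decide what "make install" should do about the userspace ABI.

    required  -- ABI version string the new kernel needs, e.g. "0.0.16"
    installed -- ABI version string found on the system, or None
    distro    -- detected distribution name, or None if unknown
    Returns a (proceed, message) tuple.
    """
    if installed == required:
        return True, ""
    message = (
        f"This kernel requires the Nouveau userspace version {required}, "
        "which you don't have installed.\n"
    )
    tool = PACKAGE_TOOLS.get(distro)
    if tool is None:
        # Unknown distribution: point at build-from-source instructions.
        return False, message + "See the instructions for building it from source."
    return False, message + (
        f"{distro} has been detected.\n"
        f"Invoke {tool} to install the required packages? [y/n]"
    )
```

The sketch stops at producing the prompt; handling a "no" answer would
reuse the same build-from-source branch.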

Once you set up this system, you can freely break the ABI with no
significant user discomfort by just raising the version number.
This also potentially applies to stuff other than DRM (e.g. perf, kvm,
iptables, udev, filesystem-specific tools/APIs, various device
configuration systems, and so on).

The really stable APIs/ABIs should not be the (undocumented) kernel
ones, but Xlib and OpenGL, which actually have formal specifications.
Perhaps eventually Gallium could join them as a stable API closer to
the hardware, but that's a long way off.
From: Felipe Contreras on
On Fri, Mar 5, 2010 at 6:19 PM, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
> The thing I objected to, in the VERY BEGINNING in this thread, is the fact
> that the thing was done in such a way that it's basically impossible to
> support the old/new ABI at all!

[...]

> The way this was done, it's apparently basically impossible for the Fedora
> people to push out packages that support both the old and the new kernel.

The reason the nouveau people wanted to leave the driver in staging is
that they wanted to keep open the option of reshuffling the API. The
Fedora guys integrated this stuff at their own risk, and so did Linux
(because of your pressure). At no point did the nouveau guys agree to
freeze the API.

Now they have done precisely what was expected; there's no surprise there.

The surprise seems to be that you thought (for some reason), that
reshuffling the API wouldn't break the old (or current in F12)
user-space code. Now, how exactly do you think that could have been
achieved? Even if you have both nouveau_drv-0.0.15.so, and
nouveau_drv-0.0.16.so... What piece of code would choose one rather
than the other? There has never been such a piece of code.

If there was no compatibility code for API re-shuffling, and API
re-shuffling was expected, the resulting breakage was doomed to
happen.

Finally, at least it's possible to compile the radeon driver without
support for DRM, so perhaps nouveau (and other drivers), should check
the kernel drm version at run-time instead, and fall-back to non-drm
mode when the version is not compatible. I think that's a sensible
approach, although that might require a considerable amount of code.
However, that's something to consider for the future, as your current
libdrm/nouveau is not prepared to handle the DRM API re-shuffle that
_must_ happen.
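
For the sake of discussion, the missing chooser plus the non-DRM
fallback could be sketched like this. select_driver and the INSTALLED
table are hypothetical, and how the kernel's ABI version would be
queried at run time is left out entirely.

```python
# Hypothetical table of side-by-side driver builds, keyed by ABI version.
INSTALLED = {
    "0.0.15": "nouveau_drv-0.0.15.so",
    "0.0.16": "nouveau_drv-0.0.16.so",
}


def select_driver(kernel_abi, available):
    """Pick the driver build matching the kernel's DRM ABI version.

    kernel_abi -- ABI version reported by the kernel, e.g. "0.0.16"
    available  -- dict mapping ABI version to an installed driver file
    Returns (mode, driver): ("drm", file name) on a match,
    otherwise ("non-drm", None) so the caller can fall back.
    """
    driver = available.get(kernel_abi)
    if driver is None:
        # No build speaks this kernel's ABI: run without DRM.
        return "non-drm", None
    return "drm", driver
```

Booting an old kernel would then pick the 0.0.15 build, a new one the
0.0.16 build, and anything else would degrade to non-DRM mode instead
of failing outright.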

Cheers.

--
Felipe Contreras