From: Charles Oliver Nutter on
On Mon, Jan 25, 2010 at 9:56 PM, Aaron Patterson
<aaron(a)tenderlovemaking.com> wrote:
> On Tue, Jan 26, 2010 at 04:52:13AM +0900, Chuck Remes wrote:
>> For one, Rubinius does not support the entire MRI C API nor will it ever. Extensions that directly access memory structures are not supported. FFI is a better long-term choice for Rubinius.
>
> It doesn't need to support the entire API.  It supports enough of the C
> API to get nokogiri running, and believe me, we use a *lot* of the C
> API.  Why pay the FFI speed penalty when you can write C code that works
> cross implementation?

I'd like to understand how much of a speed penalty we actually pay
using FFI. It's worth pointing out that Rubinius has had to implement
some pretty nasty (as in tricky, difficult, and potentially a lot
slower than MRI's "raw" memory access) logic in order to support their
current subset of the MRI C API. They've chosen to try to support APIs
I would never dream of supporting, like RARRAY and other direct pointer
access, and in many cases they have to do it by copying around a lot
more data than MRI does. And that's life, sucky though it is, if you
want to support enough of the C API to run real-world extensions right
now.
I'm sure Evan can describe how they handle those APIs better than I
can.

I do believe there's a subset of APIs that could be supported across
implementations without a major perf penalty if these points (and
probably others) were addressed:

* No direct access to object internals without explicitly copying in
and out yourself (i.e. you have to opt-in to the copying penalty)
* Additional APIs to make object access and manipulation easier (like
APIs for copying or doing bulk writes into array contents)
* Additional APIs for lifecycle management (hard and weak references
and functions for acquiring and releasing such references)
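
To make those three points concrete, here is a rough sketch of what
such an API could look like, written against a toy runtime. Every name
in it (toy_handle, toy_array_get_region, toy_gc_move_all and so on) is
hypothetical and exists in no Ruby implementation. It only illustrates
opaque handles, opt-in bulk copying, and explicit hard references; weak
references are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy runtime: these names exist in no real Ruby VM. */
#define TOY_MAX_OBJECTS 16

typedef int toy_handle;  /* opaque index, never a raw pointer */

typedef struct {
    long  *items;  /* storage owned by the runtime; it may be moved */
    size_t len;
    int    refs;   /* hard references held by native code */
} toy_array;

static toy_array toy_heap[TOY_MAX_OBJECTS];

/* Resolve a handle; aborts on a stale or invalid handle. */
static toy_array *toy_lookup(toy_handle h) {
    assert(h >= 0 && h < TOY_MAX_OBJECTS && toy_heap[h].refs > 0);
    return &toy_heap[h];
}

/* Allocate a zero-filled array. The caller owns one hard reference.
 * Returns -1 if the table is full or allocation fails. */
toy_handle toy_array_new(size_t len) {
    for (toy_handle h = 0; h < TOY_MAX_OBJECTS; h++) {
        if (toy_heap[h].refs == 0) {
            long *items = calloc(len ? len : 1, sizeof(long));
            if (items == NULL) return -1;
            toy_heap[h].items = items;
            toy_heap[h].len = len;
            toy_heap[h].refs = 1;
            return h;
        }
    }
    return -1;
}

/* Lifecycle management: acquire and release hard references. */
void toy_retain(toy_handle h) { toy_lookup(h)->refs++; }

void toy_release(toy_handle h) {
    toy_array *a = toy_lookup(h);
    if (--a->refs == 0) {
        free(a->items);
        a->items = NULL;
        a->len = 0;
    }
}

size_t toy_array_len(toy_handle h) { return toy_lookup(h)->len; }

/* Bulk copy out: the caller opts in to the copy and owns `out`.
 * Returns 0 on success, -1 if the range is out of bounds. */
int toy_array_get_region(toy_handle h, size_t start, size_t n, long *out) {
    toy_array *a = toy_lookup(h);
    if (start > a->len || n > a->len - start) return -1;
    memcpy(out, a->items + start, n * sizeof(long));
    return 0;
}

/* Bulk write in: copies n elements from `in` into the array. */
int toy_array_set_region(toy_handle h, size_t start, size_t n, const long *in) {
    toy_array *a = toy_lookup(h);
    if (start > a->len || n > a->len - start) return -1;
    memcpy(a->items + start, in, n * sizeof(long));
    return 0;
}

/* Stand-in for a moving collector: relocates every live array's
 * storage. Handles stay valid because callers never saw a pointer. */
int toy_gc_move_all(void) {
    for (toy_handle h = 0; h < TOY_MAX_OBJECTS; h++) {
        toy_array *a = &toy_heap[h];
        if (a->refs == 0) continue;
        size_t bytes = (a->len ? a->len : 1) * sizeof(long);
        long *moved = malloc(bytes);
        if (moved == NULL) return -1;
        memcpy(moved, a->items, bytes);
        free(a->items);
        a->items = moved;
    }
    return 0;
}
```

An extension written this way would call toy_array_get_region to pull
data into its own buffer, work on it, and write it back with
toy_array_set_region. Since it never holds a pointer into the runtime's
storage, the runtime is free to move or re-represent objects between
calls.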

I'd love to hear from the other implementers about what they think
they'd be able to support of the C API.

The example set by JNI might help us figure out the safe subset and
enhancements needed. JNI, for all its warts, does a very good job of
isolating native code from JVM internals. You can't get direct
pointers to anything, you need to manage reference lifecycles
appropriately, and you need to copy data in and out yourself if the
object accessor functions don't do what you need. It's not a pretty API,
granted, but in the 15 years the JVM has been mainstream that API has
changed very little.

> Even if FFI were the cross implementation messiah it's supposed to be,
> our FFI applications will *still* not work on GAE or Android.  Rubinius
> has already proved that you can implement a *subset* of the C API and
> get complex extensions to work.  Why can't we run with that?  I think it
> would be a better long term solution.  We would get the same "cross
> implementation" behavior as FFI, but not have to pay FFI's runtime
> conversion penalties.  We also get the ability to do compile time checks
> of C library functionality (i.e. check for #defines, function existence, etc).

I'll say it again: The Rubinius folks have done an admirable job of
implementing the large subset that they do. And given the target
audience for Rubinius, they may not have any other choice. But there are
some pretty large tradeoffs required to get that subset
working...tradeoffs that in some cases might make binding to the C API
a lot slower than using something like FFI. It has also required a
herculean effort to support that subset given the (good) design
choices Evan made (like having accurate GC that moves objects around
in memory). Expecting all implementations to put in that effort is
pretty close to absurdity; consider that JRuby only recently really
started to feel "compatible" enough that we don't spend every day, all
day fixing Ruby core class bugs.

JRuby has had a continuous stream of about 3.5 bug reports per day,
every day, for over three years...and out of the 4500-some filed bugs,
we manage to keep our unresolved count around 500. That has required
full-time effort from at least two of us (Tom Enebo and me) and
part-time help from dozens of contributors. The benefits of supporting
a C API subset just don't warrant the effort we would personally have
to put in and the sacrifices that would result. We need help. :(

- Charlie