From: Nakajima, Jun on
Ian Campbell wrote on Tue, 9 Feb 2010 at 06:02:04:

> On Tue, 2010-02-09 at 12:46 +0000, Sheng Yang wrote:
>> On Tuesday 09 February 2010 19:52:56 Ian Campbell wrote:
>>> On Mon, 2010-02-08 at 08:05 +0000, Sheng Yang wrote:
>>>> + if (xen_hvm_pv_evtchn_enabled()) {
>>>> + if (enable_hvm_pv(HVM_PV_EVTCHN))
>>>> + return -EINVAL;
>>>> +[...]
>>>> + callback_via = HVM_CALLBACK_VECTOR(X86_PLATFORM_IPI_VECTOR);
>>>> + set_callback_via(callback_via);
>>>> +
>>>> + x86_platform_ipi_callback = do_hvm_pv_evtchn_intr;
>>>
>>> Why this indirection via X86_PLATFORM_IPI_VECTOR?
>>>
>>> Apart from that why not use CALLBACKOP_register subop
>>> CALLBACKTYPE_event pointing to xen_hypervisor_callback the same as a
>>> full PV guest?
>>>
>>> This would remove all the evtchn related code from HVMOP_enable_pv
>>> which I think should be eventually unnecessary as an independent
>>> hypercall since all HVM guests should simply be PV capable by default
>>> -- the hypervisor only needs to track if the guest has made use of
>>> specific PV functionality, not the umbrella "is PV" state.
>> The reason is the bounce frame buffer implemented by PV guest to
>> inject an event is too complex here... Basically you need to set up a
>> stack like hardware would do, and return to a certain guest CS:IP to
>> handle this. And you need to take care of every case, e.g. guest in
>> ring0 or ring3, guest in interrupt context or not, and the
>> recursion of the handler, and so on.
>
> The code for all this already exists on both the hypervisor and guest
> side in order to support PV guests, would it not just be a case of
> wiring it up for this case as well?

The code is not so useful for HVM guests. The current PV code relies on the ring transition, which maintains the processor state on the stack, to switch between the hypervisor and the guest, but HVM VM entry/exit does not use the stack at all. To implement an asynchronous event, i.e. a callback handler for HVM, the simplest (and most reliable) way is to use the architectural event mechanism (i.e. IDT-based). Otherwise, we need to modify various VMCS/VMCB fields (e.g. selectors, segments, stacks, etc.) depending on where the last VM exit happened, using OS-specific knowledge.

Having said that, the interface and the implementation are separate matters. I think we can use the same or similar code to register the callback handler, hiding the HVM-specific code from the common code path.
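
To illustrate what hiding the HVM-specific part behind a common registration path could look like, here is a minimal user-space sketch in C. It is only a model under stated assumptions, not kernel or hypervisor code: every name in it (register_event_callback, deliver_event, model_idt, MODEL_CALLBACK_VECTOR and so on) is made up for this example, and the vector number is an arbitrary illustrative value. The PV branch stands in for the bounce-frame callback, the HVM branch for delivery through an IDT vector.

```c
#include <stddef.h>

/* Hypothetical model only: none of these names are real Xen or Linux APIs. */

typedef void (*event_handler_t)(void);

enum guest_kind { GUEST_PV, GUEST_HVM };

#define MODEL_NR_VECTORS      256
#define MODEL_CALLBACK_VECTOR 0xf7   /* illustrative value, an assumption */

/* PV: one callback entry point, reached via a bounce frame in reality. */
static event_handler_t pv_event_callback;

/* HVM: stand-in for the guest IDT; the handler hangs off one vector. */
static event_handler_t model_idt[MODEL_NR_VECTORS];
static int hvm_callback_vector = -1;

static int register_pv_callback(event_handler_t handler)
{
    pv_event_callback = handler;
    return 0;
}

static int register_hvm_callback(event_handler_t handler)
{
    /* HVM-specific detail kept out of the common path. */
    model_idt[MODEL_CALLBACK_VECTOR] = handler;
    hvm_callback_vector = MODEL_CALLBACK_VECTOR;
    return 0;
}

/* Common entry point: callers do not care how the event will be delivered. */
int register_event_callback(enum guest_kind kind, event_handler_t handler)
{
    if (handler == NULL)
        return -1;
    return kind == GUEST_HVM ? register_hvm_callback(handler)
                             : register_pv_callback(handler);
}

/* Models delivery: a vector lookup for HVM, a direct callback for PV. */
int deliver_event(enum guest_kind kind)
{
    event_handler_t handler = NULL;

    if (kind == GUEST_HVM) {
        if (hvm_callback_vector >= 0)
            handler = model_idt[hvm_callback_vector];
    } else {
        handler = pv_event_callback;
    }

    if (handler == NULL)
        return -1;   /* nothing registered for this guest kind */
    handler();
    return 0;
}

/* Demo handler so the model can be exercised. */
int events_handled;
void demo_handler(void) { events_handled++; }
```

Calling register_event_callback(GUEST_HVM, demo_handler) followed by deliver_event(GUEST_HVM) runs the handler through the vector table, while the caller-facing function is the same one a PV guest would use.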

>
>> Hardware can easily handle all these elegantly, you just need to inject
>> a vector through hardware provided method. That's much easier and more
>> elegant. Taking advantage of the hardware is still a part of our target.
>> :)
> I thought one of the points of this patchset was that there was
> overhead associated with the hardware event injection mechanisms which
> you wanted to avoid?

We need to execute a VM entry anyway to call back a handler in the guest kernel, so bypassing IDT vectoring does not help.

Jun
___
Intel Open Source Technology Center