From: Avi Kivity on
On 07/06/2010 01:44 PM, Xiao Guangrong wrote:
> In the speculative path, we should check guest pte's reserved bits just as
> the real processor does
>
> Reported-by: Marcelo Tosatti<mtosatti(a)redhat.com>
> Signed-off-by: Xiao Guangrong<xiaoguangrong(a)cn.fujitsu.com>
> ---
> arch/x86/kvm/mmu.c | 3 +++
> arch/x86/kvm/paging_tmpl.h | 3 ++-
> 2 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 104756b..3dcd55d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2781,6 +2781,9 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> break;
> }
>
> + if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
> + gentry = 0;
> +
>

That only works if the gpte is for the same mode as the current vcpu mmu
mode. In some cases it is too strict (a vcpu in pae mode writing a 32-bit
gpte), which is not too bad; in some cases it is too permissive (a vcpu in
nonpae mode writing a pae gpte).

(once upon a time mixed modes were rare, only on OS setup, but with
nested virt they happen all the time).
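The mismatch is easiest to see from the bit layouts. A toy model of the two
checks (not KVM's actual reset_rsvds_bits_mask() tables; helper names are made
up, and a MAXPHYADDR of 36 bits is assumed for the PAE case):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy reserved-bit masks for a 4KB-page gpte, assuming MAXPHYADDR = 36.
 * In a PAE/long-mode gpte, bit 63 is NX and bits 62:36 are reserved;
 * a 32-bit (nonpae) gpte simply has no bits above 31. */
#define PAE_NX_BIT    (1ULL << 63)
#define PAE_RSVD_HIGH (((1ULL << 63) - 1) & ~((1ULL << 36) - 1)) /* bits 62:36 */

static bool pae_rsvd_bits_set(uint64_t gpte, bool nx_enabled)
{
	uint64_t rsvd = PAE_RSVD_HIGH;

	if (!nx_enabled)
		rsvd |= PAE_NX_BIT;	/* NX is reserved when EFER.NXE = 0 */
	return (gpte & rsvd) != 0;
}

static bool nonpae_rsvd_bits_set(uint32_t gpte)
{
	(void)gpte;			/* a non-PSE 32-bit pte reserves nothing here */
	return false;
}
```

Running a 32-bit gpte through pae_rsvd_bits_set() is the "too strict" case, and
truncating a pae gpte down to nonpae_rsvd_bits_set() is the "too permissive"
case described above.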

> mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
> spin_lock(&vcpu->kvm->mmu_lock);
> if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index dfb2720..19f0077 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -628,7 +628,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
>
> if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa,&gpte,
> - sizeof(pt_element_t)))
> + sizeof(pt_element_t)) ||
> + is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
> return -EINVAL;
>

This is better done a few lines down where we check for
!is_present_gpte(), no?
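The placement matters because returning -EINVAL from sync_page() unlinks the
whole shadow page, while the not-present path only zaps the one offending spte.
A toy model of the suggested placement (the flag bits and helper names are
invented for illustration, not the real paging_tmpl.h code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPTES 4

/* Toy gpte encoding: bit 0 = present, bit 1 stands in for a reserved bit. */
static bool is_present_gpte(uint64_t gpte) { return gpte & 1; }
static bool is_rsvd_set(uint64_t gpte)     { return gpte & 2; }

/* Per the review suggestion: fold the reserved-bit test into the same
 * branch as the not-present test, so a bad gpte zaps only its own spte
 * instead of failing the whole shadow page.  Returns the zap count. */
static int sync_page_model(const uint64_t gptes[NPTES], bool zapped[NPTES])
{
	int nzapped = 0;

	for (int i = 0; i < NPTES; i++) {
		zapped[i] = !is_present_gpte(gptes[i]) || is_rsvd_set(gptes[i]);
		if (zapped[i])
			nzapped++;
	}
	return nzapped;
}
```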

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Xiao Guangrong on


Avi Kivity wrote:

>> + if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
>> + gentry = 0;
>> +
>>
>
> That only works if the gpte is for the same mode as the current vcpu mmu
> mode. In some cases it is too strict (a vcpu in pae mode writing a 32-bit
> gpte), which is not too bad; in some cases it is too permissive (a vcpu in
> nonpae mode writing a pae gpte).
>

Avi, thanks for your review.

Do you mean that one VM can have vcpus in different modes? For example, both
a nonpae vcpu and a pae vcpu running in the same VM? I forgot to consider this
case.

> (once upon a time mixed modes were rare, only on OS setup, but with
> nested virt they happen all the time).

I'm afraid it still has a problem; it can cause access corruption:
1: if a nonpae vcpu writes a pae gpte, the NX bit is missed
2: if a pae vcpu writes a nonpae gpte, an NX bit is added beyond the gpte's width

How about only updating shadow pages whose pae setting matches the writing
vcpu's? Like this:

@@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
while (npte--) {
entry = *spte;
mmu_pte_write_zap_pte(vcpu, sp, spte);
+
+ if (!!is_pae(vcpu) != sp->role.cr4_pae)
+ continue;
+
if (gentry)
mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);



>
>> mmu_guess_page_from_pte_write(vcpu, gpa, gentry);
>> spin_lock(&vcpu->kvm->mmu_lock);
>> if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index dfb2720..19f0077 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -628,7 +628,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu,
>> struct kvm_mmu_page *sp,
>> pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
>>
>> if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa,&gpte,
>> - sizeof(pt_element_t)))
>> + sizeof(pt_element_t)) ||
>> + is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL))
>> return -EINVAL;
>>
>
> This is better done a few lines down where we check for
> !is_present_gpte(), no?

Yeah, that's a better way; it avoids zapping the whole shadow page when
reserved bits are set. Will fix it.

From: Avi Kivity on
On 07/12/2010 05:37 AM, Xiao Guangrong wrote:
>
>>> + if (is_rsvd_bits_set(vcpu, gentry, PT_PAGE_TABLE_LEVEL))
>>> + gentry = 0;
>>> +
>>>
>>>
>> That only works if the gpte is for the same mode as the current vcpu mmu
>> mode. In some cases it is too strict (a vcpu in pae mode writing a 32-bit
>> gpte), which is not too bad; in some cases it is too permissive (a vcpu in
>> nonpae mode writing a pae gpte).
>>
>>
> Avi, thanks for your review.
>
> Do you mean that one VM can have vcpus in different modes? For example, both
> a nonpae vcpu and a pae vcpu running in the same VM? I forgot to consider this
> case.
>

Yes. This happens while the guest brings up other vcpus, and when using
nested virtualization.

>> (once upon a time mixed modes were rare, only on OS setup, but with
>> nested virt they happen all the time).
>>
> I'm afraid it still has a problem; it can cause access corruption:
> 1: if a nonpae vcpu writes a pae gpte, the NX bit is missed
> 2: if a pae vcpu writes a nonpae gpte, an NX bit is added beyond the gpte's width
>
> How about only updating shadow pages whose pae setting matches the writing
> vcpu's? Like this:
>
> @@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
> while (npte--) {
> entry = *spte;
> mmu_pte_write_zap_pte(vcpu, sp, spte);
> +
> + if (!!is_pae(vcpu) != sp->role.cr4_pae)
> + continue;
> +
>

Not enough: one vcpu can have nx set while the other has it reset, etc.

--
error compiling committee.c: too many arguments to function

From: Xiao Guangrong on


Avi Kivity wrote:

>>
>> How about only updating shadow pages whose pae setting matches the
>> writing vcpu's? Like this:
>>
>> @@ -3000,6 +3000,10 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu,
>> gpa_t gpa,
>> while (npte--) {
>> entry = *spte;
>> mmu_pte_write_zap_pte(vcpu, sp, spte);
>> +
>> + if (!!is_pae(vcpu) != sp->role.cr4_pae)
>> + continue;
>> +
>>
>
> Not enough, one vcpu can have nx set while the other has it reset, etc.
>

Yeah, so we also need to check sp->role.nxe here.
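A sketch of the combined filter for the kvm_mmu_pte_write() loop, modeled with
stand-in types (the real bits live in union kvm_mmu_page_role in
arch/x86/include/asm/kvm_host.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the two role bits being compared. */
struct sp_role {
	unsigned cr4_pae : 1;
	unsigned nxe     : 1;
};

/* Only prefetch the written gpte into shadow pages whose paging mode
 * matches the writing vcpu: both the PAE setting and the NX setting
 * must agree, otherwise the gpte would be reinterpreted in the wrong
 * format. */
static bool spte_update_matches(bool vcpu_pae, bool vcpu_nx, struct sp_role role)
{
	return vcpu_pae == (bool)role.cr4_pae && vcpu_nx == (bool)role.nxe;
}
```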




From: Xiao Guangrong on


Marcelo Tosatti wrote:

>> entry = *spte;
>> mmu_pte_write_zap_pte(vcpu, sp, spte);
>> +
>> + if (!!is_pae(vcpu) != sp->role.cr4_pae ||
>> + is_nx(vcpu) != sp->role.nxe)
>> + continue;
>> +
>
> This breaks remote_flush assignment below.

Ah, oops, will fix.