From: Avi Kivity on
On 06/25/2010 03:06 PM, Xiao Guangrong wrote:
> Consider using small page to fit guest's large page mapping:
>
> If the mapping is writable but the dirty flag is not set, we will find
> the read-only direct sp and setup the mapping, then if the write #PF
> occur, we will mark this mapping writable in the read-only direct sp,
> now, other real read-only mapping will happily write it without #PF.
>
> It may hurt guest's COW
>
> Fixed by re-install the mapping when write #PF occur.
>
> Signed-off-by: Xiao Guangrong<xiaoguangrong(a)cn.fujitsu.com>
> ---
> arch/x86/kvm/mmu.c | 3 ++-
> arch/x86/kvm/paging_tmpl.h | 18 ++++++++++++++++++
> 2 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 556a798..0412ba4 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -153,7 +153,8 @@ module_param(oos_shadow, bool, 0644);
> #define CREATE_TRACE_POINTS
> #include "mmutrace.h"
>
> -#define SPTE_HOST_WRITEABLE (1ULL<< PT_FIRST_AVAIL_BITS_SHIFT)
> +#define SPTE_HOST_WRITEABLE (1ULL<< PT_FIRST_AVAIL_BITS_SHIFT)
> +#define SPTE_NO_DIRTY (2ULL<< PT_FIRST_AVAIL_BITS_SHIFT)
>
> #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index e46eb8a..fdba751 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -325,6 +325,20 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
> break;
> }
>
> + if (*sptep& SPTE_NO_DIRTY) {
> + struct kvm_mmu_page *child;
> +
> + WARN_ON(level != gw->level);
> + WARN_ON(!is_shadow_present_pte(*sptep));
> + if (dirty) {
> + child = page_header(*sptep&
> + PT64_BASE_ADDR_MASK);
> + mmu_page_remove_parent_pte(child, sptep);
> + __set_spte(sptep, shadow_trap_nonpresent_pte);
> + kvm_flush_remote_tlbs(vcpu->kvm);
> + }
> + }
> +
>

Instead of adding a new bit, can you encode the protection in the direct
sp's access bits? So we'll have one sp for read-only or
writeable-but-not-dirty small pages, and another sp for
writeable-and-dirty small pages.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Xiao Guangrong on


Avi Kivity wrote:

>
> Instead of adding a new bit, can you encode the protection in the direct
> sp's access bits? So we'll have one sp for read-only or
> writeable-but-not-dirty small pages, and another sp for
> writeable-and-dirty small pages.
>

It looks like it can't solve all problems, it fix the access corrupted,
but will cause D bit losed:

mapping A and mapping B both are writable-and-dirty, when mapping A write
#PF occurs, the mapping is writable, then we can't set B's D bit anymore.

Anyway, i think we should re-intall the mapping when the state is changed. :-(
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 06/28/2010 01:02 PM, Xiao Guangrong wrote:
>
> Avi Kivity wrote:
>
>
>> Instead of adding a new bit, can you encode the protection in the direct
>> sp's access bits? So we'll have one sp for read-only or
>> writeable-but-not-dirty small pages, and another sp for
>> writeable-and-dirty small pages.
>>
>>
> It looks like it can't solve all problems, it fix the access corrupted,
> but will cause D bit losed:
>
> mapping A and mapping B both are writable-and-dirty, when mapping A write
> #PF occurs, the mapping is writable, then we can't set B's D bit anymore.
>

If B is writeable-and-dirty, then it's D bit is already set, and we
don't need to do anything.

If B is writeable-and-clean, then we'll have an spte pointing to a
read-only sp, so we'll get a write fault on access and an opportunity to
set the D bit.

> Anyway, i think we should re-intall the mapping when the state is changed. :-(
>

When the gpte is changed from read-only to writeable or from clean to
dirty, we need to update the spte, yes. But that's true for other sptes
as well, not just large gptes.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Xiao Guangrong on


Avi Kivity wrote:
> On 06/28/2010 01:02 PM, Xiao Guangrong wrote:
>>
>> Avi Kivity wrote:
>>
>>
>>> Instead of adding a new bit, can you encode the protection in the direct
>>> sp's access bits? So we'll have one sp for read-only or
>>> writeable-but-not-dirty small pages, and another sp for
>>> writeable-and-dirty small pages.
>>>
>>>
>> It looks like it can't solve all problems, it fix the access corrupted,
>> but will cause D bit losed:
>>
>> mapping A and mapping B both are writable-and-dirty, when mapping A write
>> #PF occurs, the mapping is writable, then we can't set B's D bit anymore.
>>
>
> If B is writeable-and-dirty, then it's D bit is already set, and we
> don't need to do anything.
>
> If B is writeable-and-clean, then we'll have an spte pointing to a
> read-only sp, so we'll get a write fault on access and an opportunity to
> set the D bit.
>

Sorry, a typo in my reply, i mean mapping A and B both are writable-and-clean,
while A occurs write-#PF, we should change A's spte map to writable sp, if we
only update the spte in writable-and-clean sp(form readonly to writable), the B's
D bit will miss set.

>> Anyway, i think we should re-intall the mapping when the state is
>> changed. :-(
>>
>
> When the gpte is changed from read-only to writeable or from clean to
> dirty, we need to update the spte, yes. But that's true for other sptes
> as well, not just large gptes.
>

I think the indirect sp is not hurt by this bug since for the indirect sp, the access
just form its upper-level, and the D bit is only in the last level, when we change the
pte's access, is not affect its sp's access.

But for direct sp, the sp's access is form all level. and different mapping that not share
the last mapping page will have the same last sp.





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 06/29/2010 04:17 AM, Xiao Guangrong wrote:
>
>> If B is writeable-and-dirty, then it's D bit is already set, and we
>> don't need to do anything.
>>
>> If B is writeable-and-clean, then we'll have an spte pointing to a
>> read-only sp, so we'll get a write fault on access and an opportunity to
>> set the D bit.
>>
>>
> Sorry, a typo in my reply, i mean mapping A and B both are writable-and-clean,
> while A occurs write-#PF, we should change A's spte map to writable sp, if we
> only update the spte in writable-and-clean sp(form readonly to writable), the B's
> D bit will miss set.
>

Right.

We need to update something to notice this:

- FNAME(fetch)() to replace the spte
- FNAME(walk_addr)() to invalidate the spte

I think FNAME(walk_addr) is the right place, we're updating the gpte, so
we should update the spte at the same time, just like a guest write.
But that will be expensive (there could be many sptes, so we have to
call kvm_mmu_pte_write()), so perhaps FNAME(fetch) is easier.

We have now

if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
continue;

So we need to add a check, if sp->role.access doesn't match pt_access &
pte_access, we need to get a new sp with the correct access (can only
change read->write).

>>> Anyway, i think we should re-intall the mapping when the state is
>>> changed. :-(
>>>
>>>
>> When the gpte is changed from read-only to writeable or from clean to
>> dirty, we need to update the spte, yes. But that's true for other sptes
>> as well, not just large gptes.
>>
>>
> I think the indirect sp is not hurt by this bug since for the indirect sp, the access
> just form its upper-level, and the D bit is only in the last level, when we change the
> pte's access, is not affect its sp's access.
>
> But for direct sp, the sp's access is form all level. and different mapping that not share
> the last mapping page will have the same last sp.
>

Right.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/