From: Xiao Guangrong on


Avi Kivity wrote:
> On 06/29/2010 10:35 AM, Xiao Guangrong wrote:
>>
>>> We have now
>>>
>>> if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>>>         continue;
>>>
>>> So we need to add a check, if sp->role.access doesn't match pt_access &
>>> pte_access, we need to get a new sp with the correct access (can only
>>> change read->write).
>>>
>>>
>> Umm, we should update the spte at gw->level, so we need to get the child
>> sp and compare its access at this point, like this:
>>
>> if (level == gw->level && is_shadow_present_pte(*sptep)) {
>>         child_sp = page_header(__pa(*sptep & PT64_BASE_ADDR_MASK));
>>
>>         if (child_sp->access != (pt_access & pte_access &
>>                                  (dirty ? ~0 : ~ACC_WRITE_MASK))) {
>>                 /* Zap sptep */
>>                 ......
>>         }
>>
>> }
>>
>> So why not use the new spte flag (SPTE_NO_DIRTY in my patch) to mark
>> this spte, so that we can see directly whether it needs to be updated?
>> I think that is simpler ;-)
>>
>
> It's new state, and new state means more maintenance of that state and
> the need to consider the state in all relevant code paths.
>
> In terms of maintainability, changing walk_addr() is best, since it
> maintains the tight invariant that PT_PAGE_DIRECTORY_LEVEL sptes are
> always consistent with their gptes. Updating fetch() to allow for a
> relaxed invariant (spte may be read-only while gpte is write-dirty) is
> more complicated, but performs better. This is also consistent with
> what we do with PT_PAGE_TABLE_LEVEL gptes/sptes and with unsync pages.
>

Maybe you are right. I just thought it would be quicker to use the SPTE_NO_DIRTY
flag to decide whether an update is needed. I'll modify this patch as you suggest.

> btw, how can the patch work?
>
>>
>> + if (level == gw->level && !dirty &&
>> +     access & gw->pte_access & ACC_WRITE_MASK)
>> +         spte |= SPTE_NO_DIRTY;
>> +
>> spte = __pa(sp->spt)
>>         | PT_PRESENT_MASK | PT_ACCESSED_MASK
>>         | PT_WRITABLE_MASK | PT_USER_MASK;
>>
>
> spte is immediately overwritten by the following assignment.
>

Ah, sorry, I missed that; spte |= SPTE_NO_DIRTY should come after the following assignment.
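
A minimal sketch of the ordering problem pointed out above, assuming made-up bit positions for the masks (the real value of SPTE_NO_DIRTY is not shown in this thread) and hypothetical helper names. It only illustrates why a flag OR-ed in before a plain assignment is lost, and is not the actual fetch() code:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions below are assumptions made for this sketch only. */
#define PT_PRESENT_MASK   (1ULL << 0)
#define PT_WRITABLE_MASK  (1ULL << 1)
#define PT_USER_MASK      (1ULL << 2)
#define PT_ACCESSED_MASK  (1ULL << 5)
#define SPTE_NO_DIRTY     (1ULL << 62)  /* hypothetical software bit */

/* Bits used when linking a non-leaf spte to a lower-level table. */
uint64_t link_bits(void)
{
    return PT_PRESENT_MASK | PT_ACCESSED_MASK |
           PT_WRITABLE_MASK | PT_USER_MASK;
}

/* Order as posted: the flag is OR-ed in first, then overwritten. */
uint64_t build_spte_as_posted(uint64_t table_pa, int mark_no_dirty)
{
    uint64_t spte = 0;

    if (mark_no_dirty)
        spte |= SPTE_NO_DIRTY;
    spte = table_pa | link_bits();  /* plain '=' discards the flag */
    return spte;
}

/* Corrected order: build the spte first, then OR in the flag. */
uint64_t build_spte_fixed(uint64_t table_pa, int mark_no_dirty)
{
    uint64_t spte = table_pa | link_bits();

    if (mark_no_dirty)
        spte |= SPTE_NO_DIRTY;
    return spte;
}

/* Usage check: only the corrected order keeps the flag. */
void check_assignment_order(void)
{
    assert((build_spte_as_posted(0x1000, 1) & SPTE_NO_DIRTY) == 0);
    assert((build_spte_fixed(0x1000, 1) & SPTE_NO_DIRTY) != 0);
    assert(build_spte_fixed(0x1000, 0) == build_spte_as_posted(0x1000, 0));
}
```
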

> However, the other half of the patch can be adapted:
>
>>
>> + if (*sptep & SPTE_NO_DIRTY) {
>> +         struct kvm_mmu_page *child;
>> +
>> +         WARN_ON(level != gw->level);
>> +         WARN_ON(!is_shadow_present_pte(*sptep));
>> +         if (dirty) {
>> +                 child = page_header(*sptep &
>> +                                     PT64_BASE_ADDR_MASK);
>> +                 mmu_page_remove_parent_pte(child, sptep);
>> +                 __set_spte(sptep, shadow_trap_nonpresent_pte);
>> +                 kvm_flush_remote_tlbs(vcpu->kvm);
>> +         }
>> + }
>> +
>> if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>>         continue;
>>
>
> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
> whether sp->access is consistent with gw->pt(e)_access.
>

If the guest mapping is writable but not dirty, we set the SPTE_NO_DIRTY flag in
the spte. When the next #PF occurs, we just need to check this flag and see whether
the gpte's D bit is set; if it is, we zap this spte and map to the correct sp.

> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
> the problem and the fix? It will help ensure we don't regress in this
> area.
>

OK, but allow me to do it later :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Avi Kivity on
On 06/29/2010 12:04 PM, Xiao Guangrong wrote:
>
>> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
>> whether sp->access is consistent with gw->pt(e)_access.
>>
>>
> If the guest mapping is writable but not dirty, we set the SPTE_NO_DIRTY flag in
> the spte. When the next #PF occurs, we just need to check this flag and see whether
> the gpte's D bit is set; if it is, we zap this spte and map to the correct sp.
>

My point is, SPTE_NO_DIRTY is equivalent to an sp->role.access check
(the access check is a bit slower, but that shouldn't matter).
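
A rough sketch of that equivalence, under stated assumptions: the ACC_* values and all function names here are illustrative, the guest's pt_access/pte_access are assumed unchanged between the two faults, and the D bit is assumed to go only from clean to dirty. It models the two staleness tests and is not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Access bits; the numeric values are assumptions for this sketch. */
#define ACC_EXEC_MASK   1u
#define ACC_WRITE_MASK  2u
#define ACC_USER_MASK   4u

/* Access a direct sp should carry for a large guest mapping. */
unsigned expected_direct_access(unsigned pt_access, unsigned pte_access,
                                bool gpte_dirty)
{
    unsigned access = pt_access & pte_access;

    if (!gpte_dirty)
        access &= ~ACC_WRITE_MASK;  /* clean gpte: map read-only */
    return access;
}

/* Flag-based test: flagged at map time and the gpte is dirty now. */
bool stale_by_flag(bool no_dirty_flag, bool gpte_dirty_now)
{
    return no_dirty_flag && gpte_dirty_now;
}

/* Access-based test: the sp's access no longer matches the gpte. */
bool stale_by_access(unsigned sp_access, unsigned pt_access,
                     unsigned pte_access, bool gpte_dirty_now)
{
    return sp_access !=
           expected_direct_access(pt_access, pte_access, gpte_dirty_now);
}

/* Usage check: for a mapping created while the gpte was clean,
 * both tests agree for every access combination. */
void check_flag_matches_access(void)
{
    for (unsigned pt = 0; pt < 8; pt++) {
        for (unsigned pte = 0; pte < 8; pte++) {
            /* State recorded when the mapping was created (gpte clean). */
            bool flag = (pt & pte & ACC_WRITE_MASK) != 0;
            unsigned sp_access = expected_direct_access(pt, pte, false);

            for (int dirty_now = 0; dirty_now <= 1; dirty_now++)
                assert(stale_by_flag(flag, dirty_now) ==
                       stale_by_access(sp_access, pt, pte, dirty_now));
        }
    }
}
```

The access comparison needs no extra bit in the spte, which is the maintenance argument made earlier in the thread.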


>> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
>> the problem and the fix? It will help ensure we don't regress in this
>> area.
>>
>>
> OK, but allow me to do it later :-)
>
>

Sure, but please do it soon.

--
error compiling committee.c: too many arguments to function

From: Xiao Guangrong on


Avi Kivity wrote:
> On 06/29/2010 12:04 PM, Xiao Guangrong wrote:
>>
>>> Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
>>> whether sp->access is consistent with gw->pt(e)_access.
>>>
>>>
>> If the guest mapping is writable but not dirty, we set the SPTE_NO_DIRTY
>> flag in the spte. When the next #PF occurs, we just need to check this
>> flag and see whether the gpte's D bit is set; if it is, we zap this spte
>> and map to the correct sp.
>>
>
> My point is, SPTE_NO_DIRTY is equivalent to an sp->role.access check
> (the access check is a bit slower, but that shouldn't matter).
>

I see.

>
>>> Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
>>> the problem and the fix? It will help ensure we don't regress in this
>>> area.
>>>
>>>
>> OK, but allow me to do it later :-)
>>
>>
>
> Sure, but please do it soon.

Sure, I will do it as soon as possible.
From: Xiao Guangrong on


Avi Kivity wrote:
> On 06/29/2010 10:45 AM, Xiao Guangrong wrote:
>>
>>> - there was once talk that instead of folding pt_access and pte_access
>>> together into the leaf sp->role.access, each sp level would have its own
>>> access permissions. In this case we don't even have to get a new direct
>>> sp, only change the PT_DIRECTORY_LEVEL spte to add write permissions
>>> (all direct sp's would be writeable and permissions would be controlled
>>> at their parent_pte level). Of course that's a much bigger change than
>>> this bug fix.
>>>
>>>
>> Yeah, I have considered this approach, but it would change how shadow
>> pages are mapped: it controls access at the upper level, whereas the
>> current code gives the upper levels ALL_ACCESS and controls the access
>> rights at the last level.
>> It would break many things, such as write protection...
>>
>
> spte's access bits have dual purpose, both to map guest protection and
> for host protection (like for shadowed pages, or ksm pages). So the
> last level sptes still need to consider host write protection.
>

Yeah, I see what you mean, thanks :-)
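
To make the dual purpose concrete, here is a small sketch. The function name, the ACC_WRITE_MASK value, and the single host_write_protected flag are assumptions for illustration; in practice the host-side reasons (shadowed guest page tables, KSM pages, and so on) are tracked in other ways:

```c
#include <assert.h>
#include <stdbool.h>

#define ACC_WRITE_MASK 2u  /* assumed value, for this sketch only */

/*
 * A last-level spte may be writable only if the guest's own
 * permissions allow writes AND the host is not write-protecting
 * the page for its own reasons.
 */
bool leaf_spte_writable(unsigned guest_access, bool host_write_protected)
{
    bool guest_allows_write = (guest_access & ACC_WRITE_MASK) != 0;

    return guest_allows_write && !host_write_protected;
}

/* Usage check: either side can veto write access. */
void check_leaf_writable(void)
{
    assert(leaf_spte_writable(ACC_WRITE_MASK, false));
    assert(!leaf_spte_writable(ACC_WRITE_MASK, true));  /* host veto */
    assert(!leaf_spte_writable(0u, false));             /* guest veto */
}
```

So even if guest permissions were enforced at the parent_pte level, the last level would still need its own writable bit for the host side.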
From: Marcelo Tosatti on
On Wed, Jun 30, 2010 at 04:03:28PM +0800, Xiao Guangrong wrote:
> If the mapping is writable but the dirty flag is not set, we will find
> the read-only direct sp and set up the mapping. Then, if a write #PF
> occurs, we will mark this mapping writable in the read-only direct sp;
> now other, really read-only mappings will happily write to it without a #PF.
>
> It may hurt the guest's COW.
>
> Fix this by re-installing the mapping when a write #PF occurs.

Applied 1, 2 and 4, thanks.

> Signed-off-by: Xiao Guangrong <xiaoguangrong(a)cn.fujitsu.com>
> ---
> arch/x86/kvm/paging_tmpl.h | 28 ++++++++++++++++++++++++++--
> 1 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 28c8493..f28f09d 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -325,8 +325,32 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
> break;
> }
>
> - if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> - continue;
> + if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
> + struct kvm_mmu_page *child;
> + unsigned direct_access;
> +
> + if (level != gw->level)
> + continue;

This will skip the check for the sp at level 1 when emulating 1GB pages
with 4k host pages (where there are direct sp's at level 2 and 1).
Should be > instead of !=.
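
A simplified model of the difference, with assumptions stated: the walk is reduced to a loop over iterator levels from the root down to just above the host mapping level, and every function name is hypothetical. It only shows which levels reach the access check under each condition:

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as posted: only gw->level reaches the access check. */
bool skips_check_posted(int level, int gw_level)
{
    return level != gw_level;
}

/* Suggested condition: every level at or below gw->level is checked. */
bool skips_check_suggested(int level, int gw_level)
{
    return level > gw_level;
}

/*
 * Walk from root_level down to just above host_level and return a
 * bitmask (bit n set = iterator level n) of the levels whose child
 * sp gets its access compared against the guest pte.
 */
unsigned checked_levels(int root_level, int host_level, int gw_level,
                        bool (*skips)(int level, int gw_level))
{
    unsigned mask = 0;

    for (int level = root_level; level > host_level; level--)
        if (!skips(level, gw_level))
            mask |= 1u << level;
    return mask;
}

/* Usage check: 1GB guest page (gw level 3) backed by 4k host pages. */
void check_levels(void)
{
    /* Posted: only level 3, so the direct sp at level 1 is missed. */
    assert(checked_levels(4, 1, 3, skips_check_posted) == (1u << 3));
    /* Suggested: levels 3 and 2, covering direct sps at 2 and 1. */
    assert(checked_levels(4, 1, 3, skips_check_suggested) ==
           ((1u << 3) | (1u << 2)));
}
```

When the guest and host levels differ by only one (for example a 2MB guest page on 4k host pages), both conditions check the same single level, which is why the difference only shows up in the 1GB case.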

> +
> + /*
> + * For the direct sp, if the guest pte's dirty bit
> + * changed from clean to dirty, it will corrupt the
> + * sp's access: allow writable in the read-only sp,
> + * so we should update the spte at this point to get
> + * a new sp with the correct access.
> + */
> + direct_access = gw->pt_access & gw->pte_access;
> + if (!is_dirty_gpte(gw->ptes[gw->level - 1]))
> + direct_access &= ~ACC_WRITE_MASK;
> +
> + child = page_header(*sptep & PT64_BASE_ADDR_MASK);
> + if (child->role.access == direct_access)
> + continue;
> +
> + mmu_page_remove_parent_pte(child, sptep);
> + __set_spte(sptep, shadow_trap_nonpresent_pte);
> + kvm_flush_remote_tlbs(vcpu->kvm);
> + }
>
> if (is_large_pte(*sptep)) {
> rmap_remove(vcpu->kvm, sptep);
> --
> 1.6.1.2
>