KVM: MMU: fix direct sp's access corruptted [Kernel]

Prev: KVM: MMU: fix forgot to flush all vcpu's tlb
Next: [GIT pull] core fixes for 2.6.35

From: Xiao Guangrong on 29 Jun 2010 03:40

Avi Kivity wrote:
> On 06/29/2010 04:17 AM, Xiao Guangrong wrote:
>>
>>> If B is writeable-and-dirty, then it's D bit is already set, and we
>>> don't need to do anything.
>>>
>>> If B is writeable-and-clean, then we'll have an spte pointing to a
>>> read-only sp, so we'll get a write fault on access and an opportunity to
>>> set the D bit.
>>>
>>>
>> Sorry, a typo in my reply, i mean mapping A and B both are
>> writable-and-clean,
>> while A occurs write-#PF, we should change A's spte map to writable
>> sp, if we
>> only update the spte in writable-and-clean sp(form readonly to
>> writable), the B's
>> D bit will miss set.
>>
>
> Right.
>
> We need to update something to notice this:
>
> - FNAME(fetch)() to replace the spte
> - FNAME(walk_addr)() to invalidate the spte
>
> I think FNAME(walk_addr) is the right place, we're updating the gpte, so
> we should update the spte at the same time, just like a guest write.
> But that will be expensive (there could be many sptes, so we have to
> call kvm_mmu_pte_write()), so perhaps FNAME(fetch) is easier.
>

I agree.

> We have now
>
> if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> continue;
>
> So we need to add a check, if sp->role.access doesn't match pt_access &
> pte_access, we need to get a new sp with the correct access (can only
> change read->write).
>

Umm, we should update the spte at the gw->level, so we need get the child
sp, and compare its access at this point, just like this:

if (level == gw->level && is_shadow_present_pte(*sptep)) {
child_sp = page_header(__pa(*sptep & PT64_BASE_ADDR_MASK));

if (child_sp->access != pt_access & pte_access & (diry ? 1 : ~ACC_WRITE_MASK )) {
/* Zap sptep */
......
}

}

So, why not use the new spte flag (SPTE_NO_DIRTY in my patch) to mark this spte then we can see
this spte whether need updated directly? i think it more simpler ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 29 Jun 2010 03:40

On 06/29/2010 10:06 AM, Avi Kivity wrote:
> On 06/29/2010 04:17 AM, Xiao Guangrong wrote:
>>
>>> If B is writeable-and-dirty, then it's D bit is already set, and we
>>> don't need to do anything.
>>>
>>> If B is writeable-and-clean, then we'll have an spte pointing to a
>>> read-only sp, so we'll get a write fault on access and an
>>> opportunity to
>>> set the D bit.
>>>
>> Sorry, a typo in my reply, i mean mapping A and B both are
>> writable-and-clean,
>> while A occurs write-#PF, we should change A's spte map to writable
>> sp, if we
>> only update the spte in writable-and-clean sp(form readonly to
>> writable), the B's
>> D bit will miss set.
>
> Right.
>
> We need to update something to notice this:
>
> - FNAME(fetch)() to replace the spte
> - FNAME(walk_addr)() to invalidate the spte
>
> I think FNAME(walk_addr) is the right place, we're updating the gpte,
> so we should update the spte at the same time, just like a guest
> write. But that will be expensive (there could be many sptes, so we
> have to call kvm_mmu_pte_write()), so perhaps FNAME(fetch) is easier.
>
> We have now
>
> if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
> continue;
>
> So we need to add a check, if sp->role.access doesn't match pt_access
> & pte_access, we need to get a new sp with the correct access (can
> only change read->write).

Note:

- modifying walk_addr() to call kvm_mmu_pte_write() is probably not so
bad. It's rare that a large pte walk sets the dirty bit, and it's
probably rare to share those large ptes. Still, I think the fetch()
change is better since it's more local.

- there was once talk that instead of folding pt_access and pte_access
together into the leaf sp->role.access, each sp level would have its own
access permissions. In this case we don't even have to get a new direct
sp, only change the PT_DIRECTORY_LEVEL spte to add write permissions
(all direct sp's would be writeable and permissions would be controlled
at their parent_pte level). Of course that's a much bigger change than
this bug fix.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Xiao Guangrong on 29 Jun 2010 03:50

Avi Kivity wrote:

>
> Note:
>
> - modifying walk_addr() to call kvm_mmu_pte_write() is probably not so
> bad. It's rare that a large pte walk sets the dirty bit, and it's
> probably rare to share those large ptes. Still, I think the fetch()
> change is better since it's more local.
>
> - there was once talk that instead of folding pt_access and pte_access
> together into the leaf sp->role.access, each sp level would have its own
> access permissions. In this case we don't even have to get a new direct
> sp, only change the PT_DIRECTORY_LEVEL spte to add write permissions
> (all direct sp's would be writeable and permissions would be controlled
> at their parent_pte level). Of course that's a much bigger change than
> this bug fix.
>

Yeah, i have considered this way, but it will change the shadow page's mapping
way: it control the access at the upper level, but in the current code, we allow
the upper level have the ALL_ACCESS and control the access right at the last level.
It will break many things, such as write-protected...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 29 Jun 2010 04:50

On 06/29/2010 10:35 AM, Xiao Guangrong wrote:
>
>> We have now
>>
>> if (is_shadow_present_pte(*sptep)&& !is_large_pte(*sptep))
>> continue;
>>
>> So we need to add a check, if sp->role.access doesn't match pt_access&
>> pte_access, we need to get a new sp with the correct access (can only
>> change read->write).
>>
>>
> Umm, we should update the spte at the gw->level, so we need get the child
> sp, and compare its access at this point, just like this:
>
> if (level == gw->level&& is_shadow_present_pte(*sptep)) {
> child_sp = page_header(__pa(*sptep& PT64_BASE_ADDR_MASK));
>
> if (child_sp->access != pt_access& pte_access& (diry ? 1 : ~ACC_WRITE_MASK )) {
> /* Zap sptep */
> ......
> }
>
> }
>
> So, why not use the new spte flag (SPTE_NO_DIRTY in my patch) to mark this spte then we can see
> this spte whether need updated directly? i think it more simpler ;-)
>

It's new state, and new state means more maintenance of that state and
the need to consider the state in all relevant code paths.

In terms of maintainability, changing walk_addr() is best, since it
maintains the tight invariant that PT_PAGE_DIRECTORY_LEVEL sptes are
always consistent with their sptes. Updating fetch() to allow for a
relaxed invariant (spte may be read-only while gpte is write-dirty) is
more complicated, but performs better. This is also consistent with
what we do with PT_PAGE_TABLE_LEVEL gptes/sptes and with unsync pages.

btw, how can the patch work?

>
> + if (level == gw->level&& !dirty&&
> + access& gw->pte_access& ACC_WRITE_MASK)
> + spte |= SPTE_NO_DIRTY;
> +
> spte = __pa(sp->spt)
> | PT_PRESENT_MASK | PT_ACCESSED_MASK
> | PT_WRITABLE_MASK | PT_USER_MASK;
>

spte is immediately overwritten by the following assignment.

However, the other half of the patch can be adapted:

>
> + if (*sptep& SPTE_NO_DIRTY) {
> + struct kvm_mmu_page *child;
> +
> + WARN_ON(level != gw->level);
> + WARN_ON(!is_shadow_present_pte(*sptep));
> + if (dirty) {
> + child = page_header(*sptep&
> + PT64_BASE_ADDR_MASK);
> + mmu_page_remove_parent_pte(child, sptep);
> + __set_spte(sptep, shadow_trap_nonpresent_pte);
> + kvm_flush_remote_tlbs(vcpu->kvm);
> + }
> + }
> +
> if (is_shadow_present_pte(*sptep)&& !is_large_pte(*sptep))
> continue;
>

Simply replace (*spte & SPTE_NO_DIRTY) with a condition that checks
whether sp->access is consistent with gw->pt(e)_access.

Can you write a test case for qemu-kvm.git/kvm/test that demonstrates
the problem and the fix? It will help ensure we don't regress in this area.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Avi Kivity on 29 Jun 2010 05:00

On 06/29/2010 10:45 AM, Xiao Guangrong wrote:
>
>> - there was once talk that instead of folding pt_access and pte_access
>> together into the leaf sp->role.access, each sp level would have its own
>> access permissions. In this case we don't even have to get a new direct
>> sp, only change the PT_DIRECTORY_LEVEL spte to add write permissions
>> (all direct sp's would be writeable and permissions would be controlled
>> at their parent_pte level). Of course that's a much bigger change than
>> this bug fix.
>>
>>
> Yeah, i have considered this way, but it will change the shadow page's mapping
> way: it control the access at the upper level, but in the current code, we allow
> the upper level have the ALL_ACCESS and control the access right at the last level.
> It will break many things, such as write-protected...
>

spte's access bits have dual purpose, both to map guest protection and
for host protection (like for shadowed pages, or ksm pages). So the
last level sptes still need to consider host write protection.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: KVM: MMU: fix forgot to flush all vcpu's tlb
Next: [GIT pull] core fixes for 2.6.35