From: Vladislav Bolkhovitin on
Boaz Harrosh, on 06/03/2010 08:09 PM wrote:
> [Topic]
> How to not let pages change while in IO
>
> [Abstract]
> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> people have headaches and sleepless nights because individual pages
> can change while in IO and/or DMA. Though each one has slightly different
> needs, the mechanics look to be the same.
>
> People that care:
> - Mirror and RAID people that need on-disk consistency.
> - Network storage that wants data checksum.
> - DIF/DIX people

- Load-balancing MPIO clusters, where out-of-order execution of
overlapping write requests for the changed pages can introduce data
corruption, which makes using Linux with load-balancing MPIO clusters
unsafe.

> - ...
>
> I for one know nothing of the subject, but I am a RAID person and would
> like a solution that does not force me to copy the complete data load.
>
> Please, let's get all the VM, VFS and driver people in one room and see
> if we can find a Linux solution to this problem.
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on
On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
> [Topic]
> How to not let pages change while in IO
>
> [Abstract]
> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> people have headaches and sleepless nights because individual pages
> can change while in IO and/or DMA. Though each one has slightly different
> needs, the mechanics look to be the same.
Hmm, I don't think it's really about "how to not let pages change" - that
is doable by using wait_on_page_writeback() in ->page_mkwrite and
->write_begin. I think the discussion is more about whether we should do it
or whether we should rechecksum and resubmit IO in case of checksum failure
as Nick proposed...
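To make the mechanism concrete, here is a minimal user-space model of what waiting on writeback in the write path buys, assuming a page is just a buffer guarded by a mutex plus an "under writeback" flag. The names `sim_page`, `sim_start_writeback()`, `sim_end_writeback()`, `sim_write()` and `sim_writer()` are hypothetical stand-ins for set_page_writeback(), end_page_writeback() and a ->write_begin / ->page_mkwrite path that calls wait_on_page_writeback(); it is a sketch of the idea, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 64

/* Hypothetical user-space stand-in for a page cache page. */
struct sim_page {
	pthread_mutex_t lock;
	pthread_cond_t writeback_done;
	bool under_writeback;
	char data[SIM_PAGE_SIZE];
};

void sim_page_init(struct sim_page *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->writeback_done, NULL);
	p->under_writeback = false;
	memset(p->data, 0, sizeof(p->data));
}

/* Models set_page_writeback(): from now on the IO owns the contents. */
void sim_start_writeback(struct sim_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->under_writeback = true;
	pthread_mutex_unlock(&p->lock);
}

/* Models end_page_writeback(): IO is done, wake any waiting writers. */
void sim_end_writeback(struct sim_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->under_writeback = false;
	pthread_cond_broadcast(&p->writeback_done);
	pthread_mutex_unlock(&p->lock);
}

/* Models a ->write_begin / ->page_mkwrite path that calls
 * wait_on_page_writeback() before letting the caller modify the page. */
void sim_write(struct sim_page *p, size_t off, const char *buf, size_t len)
{
	assert(off + len <= SIM_PAGE_SIZE);
	pthread_mutex_lock(&p->lock);
	while (p->under_writeback)
		pthread_cond_wait(&p->writeback_done, &p->lock);
	memcpy(p->data + off, buf, len);
	pthread_mutex_unlock(&p->lock);
}

/* Snapshot the page contents under the lock. */
void sim_read(struct sim_page *p, char *out, size_t len)
{
	assert(len <= SIM_PAGE_SIZE);
	pthread_mutex_lock(&p->lock);
	memcpy(out, p->data, len);
	pthread_mutex_unlock(&p->lock);
}

/* Thread body: a task that overwrites the first bytes with "new". */
void *sim_writer(void *arg)
{
	sim_write(arg, 0, "new", 3);
	return NULL;
}
```

A writer that arrives while IO is in flight simply sleeps until the IO completes, so the contents the IO sees cannot change; the price is that the writer's latency now includes the IO time.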

Honza
> People that care:
> - Mirror and RAID people that need on-disk consistency.
> - Network storage that wants data checksum.
> - DIF/DIX people
> - ...
>
> I for one know nothing of the subject, but I am a RAID person and would
> like a solution that does not force me to copy the complete data load.
>
> Please, let's get all the VM, VFS and driver people in one room and see
> if we can find a Linux solution to this problem.
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Boaz Harrosh on
On 06/04/2010 07:23 PM, Jan Kara wrote:
> On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
>> [Topic]
>> How to not let pages change while in IO
>>
>> [Abstract]
>> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
>> people have headaches and sleepless nights because individual pages
>> can change while in IO and/or DMA. Though each one has slightly different
>> needs, the mechanics look to be the same.

> Hmm, I don't think it's really about "how to not let pages change" - that
> is doable by using wait_on_page_writeback() in ->page_mkwrite and
> ->write_begin. I think the discussion is more about whether we should do it
> or whether we should rechecksum and resubmit IO in case of checksum failure
> as Nick proposed...
>
> Honza

I have hijacked the DIF threads, but no, my proposal is for a general toolset
that could be used for all the above, as well as DIF if needed.

Surely, even with DIF, keep-constant vs. retransmit is a matter of machine+link
speed multiplied by faulting workloads. So there might be situations where an admin
wants to choose.
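For comparison, the retransmit side of that trade-off (re-checksum and resubmit on a checksum failure, as Nick proposed) can be sketched as a retry loop. Assumptions: 32-bit FNV-1a stands in for the DIF guard-tag CRC, a memcpy() stands in for the DMA, and `submit_with_retry()`, `race_fn` and `dirty_once()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 64

/* Stand-in checksum (32-bit FNV-1a). Real DIF uses a CRC16 guard tag. */
static uint32_t csum(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical hook: lets the caller dirty the page after the checksum
 * was computed but before the "DMA" reads it (e.g. a store via mmap). */
typedef void (*race_fn)(unsigned char *page, int attempt);

/* Returns the attempt number that succeeded, or -1 if all attempts failed. */
int submit_with_retry(unsigned char *page, unsigned char *wire,
		      race_fn race, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		uint32_t tag = csum(page, BLK);	/* checksum at submit time */
		if (race)
			race(page, attempt);	/* page may change under IO */
		memcpy(wire, page, BLK);	/* the "DMA" reads the live page */
		if (csum(wire, BLK) == tag)	/* the target verifies the tag */
			return attempt;
		/* Mismatch: loop around to re-checksum and resubmit. */
	}
	return -1;
}

/* Example race: the page is dirtied during the first attempt only. */
void dirty_once(unsigned char *page, int attempt)
{
	if (attempt == 1)
		page[0] ^= 0xff;
}
```

Each retry costs another checksum plus another transfer, so a workload that keeps dirtying pages under IO, on a slow link, pays for it repeatedly; that is the machine+link speed vs. workload trade-off above.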

With other non-checksum features, like RAID5/mirror, this is not always an option
and it becomes keep-constant vs. copy (that is, a complete workload copy). So for
these setups the option is clear. No?
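One way to see why RAID5 cannot simply retransmit: nothing on the data path notices the change. Parity is computed from the page, and if the page changes before the data chunk is written, the stripe is silently inconsistent. A toy 2+1 stripe illustrates this, assuming XOR parity and a hypothetical `race_fn` hook that dirties the page between the parity computation and the data write; `raid5_write()` and `stripe_consistent()` are illustrative names, not md functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CHUNK 16

/* Toy 2+1 RAID5 stripe: two data chunks and their XOR parity. */
struct stripe {
	unsigned char d0[CHUNK], d1[CHUNK], parity[CHUNK];
};

static void xor_parity(const unsigned char *a, const unsigned char *b,
		       unsigned char *p)
{
	for (int i = 0; i < CHUNK; i++)
		p[i] = a[i] ^ b[i];
}

/* Hypothetical hook modelling an application dirtying the page after
 * parity was computed but before the data chunk reached the disk. */
typedef void (*race_fn)(unsigned char *page);

/* Write `page` into d0. With `bounce`, snapshot the page first (the
 * "complete workload copy"); otherwise write straight from the live page. */
void raid5_write(struct stripe *s, unsigned char *page, bool bounce,
		 race_fn race)
{
	unsigned char copy[CHUNK];
	const unsigned char *src = page;

	if (bounce) {
		memcpy(copy, page, CHUNK);
		src = copy;
	}
	xor_parity(src, s->d1, s->parity);	/* parity computed from src */
	if (race)
		race(page);			/* the live page changes */
	memcpy(s->d0, src, CHUNK);		/* data chunk written from src */
}

/* True if the on-disk parity matches the on-disk data. */
bool stripe_consistent(const struct stripe *s)
{
	unsigned char p[CHUNK];

	xor_parity(s->d0, s->d1, p);
	return memcmp(p, s->parity, CHUNK) == 0;
}

/* Example race: flip some bits in the page. */
void scribble(unsigned char *page)
{
	page[3] ^= 0x5a;
}
```

With `bounce` set, parity and data both come from the private copy, so the stripe stays consistent at the price of copying every page; keeping the page constant during IO would give the same consistency without the copy.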

I'm glad that you think it is easy/doable to implement, and I'll surely test your
recipe above. Do you think it would be acceptable as a generic per-sb tunable,
so that, for instance, ext3 over RAID5 could turn this on and eliminate the data copy?

Let's talk about this at LSF.
Boaz

>> People that care:
>> - Mirror and RAID people that need on-disk consistency.
>> - Network storage that wants data checksum.
>> - DIF/DIX people
>> - ...
>>
>> I for one know nothing of the subject, but I am a RAID person and would
>> like a solution that does not force me to copy the complete data load.
>>
>> Please, let's get all the VM, VFS and driver people in one room and see
>> if we can find a Linux solution to this problem.

From: Jan Kara on
On Sun 06-06-10 12:35:03, Boaz Harrosh wrote:
> On 06/04/2010 07:23 PM, Jan Kara wrote:
> > On Thu 03-06-10 19:09:52, Boaz Harrosh wrote:
> >> [Topic]
> >> How to not let pages change while in IO
> >>
> >> [Abstract]
> >> As seen in a long thread on the fsdevel and scsi mailing lists, lots of
> >> people have headaches and sleepless nights because individual pages
> >> can change while in IO and/or DMA. Though each one has slightly different
> >> needs, the mechanics look to be the same.
>
> > Hmm, I don't think it's really about "how to not let pages change" - that
> > is doable by using wait_on_page_writeback() in ->page_mkwrite and
> > ->write_begin. I think the discussion is more about whether we should do it
> > or whether we should rechecksum and resubmit IO in case of checksum failure
> > as Nick proposed...
> >
> > Honza
>
> I have hijacked the DIF threads, but no, my proposal is for a general
> toolset that could be used for all the above, as well as DIF if needed.
>
> Surely, even with DIF, keep-constant vs. retransmit is a matter of
> machine+link speed multiplied by faulting workloads. So there might be
> situations where an admin wants to choose.
>
> With other non-checksum features, like RAID5/mirror, this is not always
> an option and it becomes keep-constant vs. copy (that is, a complete
> workload copy). So for these setups the option is clear. No?
Is it? You can have enough CPU/memory bandwidth to do the copying and
still not be comfortable with a thread blocking until IO is finished
when it tries to do a rewrite...

> I'm glad that you think it is easy/doable to implement, and I'll surely
> test your recipe above. Do you think it would be acceptable as a generic
> per-sb tunable, so that, for instance, ext3 over RAID5 could turn this on
> and eliminate the data copy?
Yes, that would be useful. At least so that one can get real performance
numbers...

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
From: Boaz Harrosh on
On 06/07/2010 02:37 AM, Jan Kara wrote:
>> With other non-checksum features, like RAID5/mirror, this is not always
>> an option and it becomes keep-constant vs. copy (that is, a complete
>> workload copy). So for these setups the option is clear. No?
>
> Is it? You can have enough CPU/memory bandwidth to do the copying and
> still not be comfortable with a thread blocking until IO is finished
> when it tries to do a rewrite...
>
>> I'm glad that you think it is easy/doable to implement, and I'll surely
>> test your recipe above. Do you think it would be acceptable as a generic
>> per-sb tunable, so that, for instance, ext3 over RAID5 could turn this on
>> and eliminate the data copy?
>
> Yes, that would be useful. At least so that one can get real performance
> numbers...
>
> Honza

Thanks Jan.
You have helped me tremendously. I think I can now begin to understand what I
need to do.

With the workloads I care about (HPC), every cycle and every byte of memory
counts, and having the app wait for a rewrite is a good thing. That reminds me
that I would want to trace that case so applications can be fixed and tuned.

I do understand that for a desktop it might be just the opposite, so testing
is important. Perhaps I'll need help instrumenting all this.

Thanks
Boaz