From: Jens Axboe on
On 07/15/2010 10:03 AM, Martin K. Petersen wrote:
>>>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:
>
> Christof> While experimenting with the data integrity support in the
> Christof> Linux kernel, i found that the block layer integrity code can
> Christof> send integrity data segments for a request that do not adhere
> Christof> to the queue limits. The integrity data segment can be larger
> Christof> than queue_max_segment_size and the segment does not adhere to
> Christof> the queue_segment_boundary.
>
> Correct. That was a deliberate design decision.
>
> Modern HBAs allow essentially indefinite chaining and our block layer
> segmentation controls are to some extent legacy baggage. I did not want
> to put in a set of constraints on the DI scatterlist because I was
> afraid it would encourage vendors to actually them.

That sounds like a very batch design decision. Either the limits are
explicitly given and different, or if not we have to assume that they
are the same as the data limits at least.

So do they have limits or not? Essentially indefinite, what does that
mean? If they are limited, then we must have settings to cope and map
around those. If these limits are not the same as the regular data
limits, then those limits could be separate.

> Christof> It appears to me that the right way would be to apply the same
> Christof> restrictions that are in place for data segments also to
> Christof> integrity data segments. The patch works for my experiments
> Christof> and applies on top of the current Linux tree (2.6.35-rc5).
>
> Who says constraints on the integrity scatterlist are the same as on the
> data ditto? In my experience they are not. If you must do this, then
> the DI constraints should be separate from the data segmentation ones.
> But I'm interested in what motivated this change to begin with.

Yes me too, what bug triggered this patch?

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jens Axboe on
On 07/15/2010 10:35 AM, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <axboe(a)kernel.dk> writes:
>
> Jens> That sounds like a very batch design decision. Either the limits
> Jens> are explicitly given and different, or if not we have to assume
> Jens> that they are the same as the data limits at least.
>
> Imagine a controller that has a 4KB segment, 1 entry limit. If we
> capped the DI sgl the same way as the data we'd only be able to issue
> 512-byte requests unless the DI entries happened to be contiguous in
> memory.
>
> For several types of I/O the DI sgl is much longer than the data sgl.
> Especially if the submitter is using buffer_heads to map 512-byte
> blocks.
>
> And consequently we require vendors to be able to handle the
> pathological case in which any data scatterlist honoring the
> segmentation constraints given by the driver can be matched with an
> integrity scatterlist in which there is a separate entry for each
> logical block. No vendor has had any problems with this. Therefore
> there are no block layer data integrity queue limits.
>
> If a device appears that does in fact have constraints I have no
> problems intruducing a set of suitable knobs.

OK, lets wait and hear what problem that Christof ran into here.
Is it ensuring that a segment doesn't corss eg the 4GB boundary?

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christof Schmitt on
On Thu, Jul 15, 2010 at 12:03:24PM -0400, Martin K. Petersen wrote:
> >>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:
>
> Christof> While experimenting with the data integrity support in the
> Christof> Linux kernel, i found that the block layer integrity code can
> Christof> send integrity data segments for a request that do not adhere
> Christof> to the queue limits. The integrity data segment can be larger
> Christof> than queue_max_segment_size and the segment does not adhere to
> Christof> the queue_segment_boundary.
>
> Correct. That was a deliberate design decision.
>
> Modern HBAs allow essentially indefinite chaining and our block layer
> segmentation controls are to some extent legacy baggage. I did not want
> to put in a set of constraints on the DI scatterlist because I was
> afraid it would encourage vendors to actually them.
>
>
> Christof> It appears to me that the right way would be to apply the same
> Christof> restrictions that are in place for data segments also to
> Christof> integrity data segments. The patch works for my experiments
> Christof> and applies on top of the current Linux tree (2.6.35-rc5).
>
> Who says constraints on the integrity scatterlist are the same as on the
> data ditto? In my experience they are not. If you must do this, then
> the DI constraints should be separate from the data segmentation ones.
> But I'm interested in what motivated this change to begin with.

The motivation stems from research how the integrity data can be
mapped to the hardware interface used by the zfcp driver. When passing
data segments to the zfcp hardware controller, there is the constraint
that each data segment has a maximum size of 4k and a segment must not
cross a 4k boundary. Right now, this is done by reporting the
maximum segment size and segment boundary accordingly from zfcp. When
issuing a request, zfcp simply walks the sg list and passes the
segments to the hardware controller, no mapping or readjustment is
necessary in the driver.

Adding integrity data to this interface implies that the integrity
data segments are passed the same way and with the same restrictions.
integrity data segments. In fact, i am planning to post an
experimental patch for zfcp for upstream inclusion. While this is
still research, it does not affect non-integrity I/O and it will ease
future work on the integrity data mapping for zfcp.

Maybe my thinking is too much with the zfcp hardware interface where
it is obvious to have the same constraints for everything passed along
to the hardware. Another motivation is that i do not want to have the
need in the driver to re-map or copy data segments, when the block
layer already has a generic method of doing this.

What would be the right approach for hardware that has specific
constraints for integrity data? Add an interface for reporting
integrity data constraints independently of constraints for regular
data?

> Your change also has repercussions when merging requests and bios. We'd
> need to honor the DI segmentation constraints when merging. Otherwise
> we may end up going beyond the controller limits when mapping the sgl.

Meaning the integrity data sg list would have more entries than
max_segments? I have not seen this during my experiments, but then i
likely have not hit every case of a possible request layout.

Christof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christof Schmitt on
On Wed, Jul 21, 2010 at 12:20:01AM -0400, Martin K. Petersen wrote:
> >>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:
>
> Christof> To have a simple approach that covers the case with one
> Christof> integrity data segment per user data segment, we only report
> Christof> half the size for the scatterlist length when running
> Christof> DIX. This guarantees that the other half can be used for
> Christof> integrity data.
>
> Yup, a few of our partners did something similar.
>
> My concern is the scenario where we submit lots of 512-byte writes that
> get merged into (in your case) 4 KB segments. Each of those 512-byte
> writes could come with an 8-byte integrity metadata tuple. And so you'd
> need 8 DI scatterlist elements per data element.
>
>
> Christof> Meaning the integrity data sg list would have more entries
> Christof> than max_segments? I have not seen this during my experiments,
> Christof> but then i likely have not hit every case of a possible
> Christof> request layout.
>
> dd to the block device is usually a good way to issue long scatterlists.
>
>
> Christof> Ok, i have to look into that as well. It would be an issue
> Christof> with the approach we are looking at now: If there are
> Christof> max_segments data segments, and more than max_segments
> Christof> integrity data segments, we will overrun the hardware
> Christof> constraint.
>
> Ok.

After looking at the given facts, this seems to be the missing part:
The zfcp hardware interface has an maximum number of data segments
that can be part of one request. In the experimental zfcp DIF/DIX
patch (now in scsi-misc), zfcp reserves half of the data segments
for integrity data. But if small bios have been merged until hitting
queue_max_segments, there may be more integrity data segments.

To summarize the limits i see in the zfcp hardware:
- Maximum size of 4k per segment
- The segments must not cross page boundaries
- The number of segments per request is limited

My preferred approach would be to set the limits on the queue, so that
the request adheres to the hardware limitations and can be passed on
directly to the hardware. I would like to avoid splitting large
segments in the driver code, and i especially want to avoid having to
copy the integrity data to new buffers to adhere to the hardware
constraints.

Looking at the block layer, the number of integrity data segments
could be limited with an additional check in ll_new_hw_segment.

What would be the preferred approach for handling the integrity data
limits in the block layer? Introduce new queue limits for integrity
data, or assume that the limits for integrity data are the same as for
user data? I can update my patch accordingly and include a check for
the maximum number of segments.

Christof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Christof Schmitt on
On Tue, Aug 03, 2010 at 12:44:50AM -0400, Martin K. Petersen wrote:
> >>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:
>
> Christof> To summarize the limits i see in the zfcp hardware:
> Christof> - Maximum size of 4k per segment
> Christof> - The segments must not cross page boundaries
> Christof> - The number of segments per request is limited
>
> The interesting thing here is that your hw has a limit for the total
> number of segments whereas other DIX HBAs have separate limits for data
> and integrity scatterlists.

Yes, the hw interface only has a limit for the total number. The best
solution would be an interface that allows reporting this limit to the
block layer. If this is not possible, or deemed too exotic, reporting
seperate limits for integrity segments and data segments would also be
fine with me.

> Christof> What would be the preferred approach for handling the
> Christof> integrity data limits in the block layer? Introduce new queue
> Christof> limits for integrity data, or assume that the limits for
> Christof> integrity data are the same as for user data? I can update my
> Christof> patch accordingly and include a check for the maximum number
> Christof> of segments.
>
> I've been messing with this tonight. It's not entirely trivial because
> of the housekeeping involved, having to accomodate different types of
> hardware, having to avoid breaking existing setups, and having to work
> with integrity compiled and without.
>
> My first attempt at this got quite messy. I think I have found a way
> but it's bedtime here. Give me a day or two to get back to you with
> something that'll hopefully work for everyone.

Ok, when you have something, i can have a look at it and see if it
matches the requirements here.

Thanks,

Christof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/