From: Martin K. Petersen on
>>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:

Christof> While experimenting with the data integrity support in the
Christof> Linux kernel, i found that the block layer integrity code can
Christof> send integrity data segments for a request that do not adhere
Christof> to the queue limits. The integrity data segment can be larger
Christof> than queue_max_segment_size and the segment does not adhere to
Christof> the queue_segment_boundary.

Correct. That was a deliberate design decision.

Modern HBAs allow essentially indefinite chaining and our block layer
segmentation controls are to some extent legacy baggage. I did not want
to put in a set of constraints on the DI scatterlist because I was
afraid it would encourage vendors to actually them.


Christof> It appears to me that the right way would be to apply the same
Christof> restrictions that are in place for data segments also to
Christof> integrity data segments. The patch works for my experiments
Christof> and applies on top of the current Linux tree (2.6.35-rc5).

Who says constraints on the integrity scatterlist are the same as on the
data ditto? In my experience they are not. If you must do this, then
the DI constraints should be separate from the data segmentation ones.
But I'm interested in what motivated this change to begin with.

Your change also has repercussions when merging requests and bios. We'd
need to honor the DI segmentation constraints when merging. Otherwise
we may end up going beyond the controller limits when mapping the sgl.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martin K. Petersen on
>>>>> "Jens" == Jens Axboe <axboe(a)kernel.dk> writes:

Jens> That sounds like a very batch design decision. Either the limits
Jens> are explicitly given and different, or if not we have to assume
Jens> that they are the same as the data limits at least.

Imagine a controller that has a 4KB segment, 1 entry limit. If we
capped the DI sgl the same way as the data we'd only be able to issue
512-byte requests unless the DI entries happened to be contiguous in
memory.

For several types of I/O the DI sgl is much longer than the data sgl.
Especially if the submitter is using buffer_heads to map 512-byte
blocks.

And consequently we require vendors to be able to handle the
pathological case in which any data scatterlist honoring the
segmentation constraints given by the driver can be matched with an
integrity scatterlist in which there is a separate entry for each
logical block. No vendor has had any problems with this. Therefore
there are no block layer data integrity queue limits.

If a device appears that does in fact have constraints I have no
problems intruducing a set of suitable knobs.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martin K. Petersen on
>>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:

Christof,

Christof> The motivation stems from research how the integrity data can
Christof> be mapped to the hardware interface used by the zfcp
Christof> driver. When passing data segments to the zfcp hardware
Christof> controller, there is the constraint that each data segment has
Christof> a maximum size of 4k and a segment must not cross a 4k
Christof> boundary.

Ok.


Christof> Right now, this is done by reporting the maximum segment size
Christof> and segment boundary accordingly from zfcp. When issuing a
Christof> request, zfcp simply walks the sg list and passes the segments
Christof> to the hardware controller, no mapping or readjustment is
Christof> necessary in the driver.

In that case I don't really have a problem with adhering to the queue
segment size and segment boundary for the integrity metadata. As long
as we avoid using the max number of data segments to cap the integrity
scatterlist because that'll definitely break a lot of stuff.

Does the zfcp hardware have scatterlist length constraints?


>> Your change also has repercussions when merging requests and bios.
>> We'd need to honor the DI segmentation constraints when merging.
>> Otherwise we may end up going beyond the controller limits when
>> mapping the sgl.

Christof> Meaning the integrity data sg list would have more entries
Christof> than max_segments? I have not seen this during my experiments,
Christof> but then i likely have not hit every case of a possible
Christof> request layout.

It happens all the time.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martin K. Petersen on
>>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:

Christof> To have a simple approach that covers the case with one
Christof> integrity data segment per user data segment, we only report
Christof> half the size for the scatterlist length when running
Christof> DIX. This guarantees that the other half can be used for
Christof> integrity data.

Yup, a few of our partners did something similar.

My concern is the scenario where we submit lots of 512-byte writes that
get merged into (in your case) 4 KB segments. Each of those 512-byte
writes could come with an 8-byte integrity metadata tuple. And so you'd
need 8 DI scatterlist elements per data element.


Christof> Meaning the integrity data sg list would have more entries
Christof> than max_segments? I have not seen this during my experiments,
Christof> but then i likely have not hit every case of a possible
Christof> request layout.

dd to the block device is usually a good way to issue long scatterlists.


Christof> Ok, i have to look into that as well. It would be an issue
Christof> with the approach we are looking at now: If there are
Christof> max_segments data segments, and more than max_segments
Christof> integrity data segments, we will overrun the hardware
Christof> constraint.

Ok.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Martin K. Petersen on
>>>>> "Christof" == Christof Schmitt <christof.schmitt(a)de.ibm.com> writes:

Christof> To summarize the limits i see in the zfcp hardware:
Christof> - Maximum size of 4k per segment
Christof> - The segments must not cross page boundaries
Christof> - The number of segments per request is limited

The interesting thing here is that your hw has a limit for the total
number of segments whereas other DIX HBAs have separate limits for data
and integrity scatterlists.


Christof> What would be the preferred approach for handling the
Christof> integrity data limits in the block layer? Introduce new queue
Christof> limits for integrity data, or assume that the limits for
Christof> integrity data are the same as for user data? I can update my
Christof> patch accordingly and include a check for the maximum number
Christof> of segments.

I've been messing with this tonight. It's not entirely trivial because
of the housekeeping involved, having to accomodate different types of
hardware, having to avoid breaking existing setups, and having to work
with integrity compiled and without.

My first attempt at this got quite messy. I think I have found a way
but it's bedtime here. Give me a day or two to get back to you with
something that'll hopefully work for everyone.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/