From: Eric Sandeen on
Tao Ma wrote:
> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
> to return fi_extent_max extents, but actually it won't work for
> a sparse file. The reason is that in xfs_getbmap we will
> calculate holes and set it in 'out', while out is malloced by
> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
> worst case, if 'out' vector looks like
> [hole, extent, hole, extent, hole, ... hole, extent, hole],
> we will only return half of fi_extent_max extents.
>
> So in xfs_vn_fiemap, we should consider this worst case. If the
> user wants fi_extent_max extents, we need a 'out' with size of
> 2 *fi_extent_max + 1.

This all seems right to me, though your commit message above (+1)
doesn't match the comment and code in the patch (+2)

-Eric

> Cc: Alex Elder <aelder(a)sgi.com>
> Cc: Christoph Hellwig <hch(a)lst.de>
> Cc: Dave Chinner <david(a)fromorbit.com>
> Signed-off-by: Tao Ma <tao.ma(a)oracle.com>
> ---
> fs/xfs/linux-2.6/xfs_iops.c | 16 ++++++++++++++--
> 1 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
> index 9c8019c..1db92e3 100644
> --- a/fs/xfs/linux-2.6/xfs_iops.c
> +++ b/fs/xfs/linux-2.6/xfs_iops.c
> @@ -672,9 +672,21 @@ xfs_vn_fiemap(
> else
> bm.bmv_length = BTOBB(length);
>
> - /* We add one because in getbmap world count includes the header */
> + /*
> + * It is a bit tricky for us to calculate the bmv_count from
> + * fi_extent_max.
> + * If we support to return fi_extent_max extents to the user,
> + * we need at most 2 * fi_extent_max + 1 for bmv_count since
> + * in xfs_getbmap we will calculate holes while fi_extent_max
> + * don't have them. So in the worst case, bmv can looks like
> + * [hole, extent, hole, extent, hole, ... hole, extent, hole].
> + * So there will be 2 *fi_extent_max + 1.
> + * What's more, in getbmap world count have to include the
> + * header, so we need another bmv. So the total number will
> + * be 2 * fieinfo->fi_extents_max + 2.
> + */
> bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :
> - fieinfo->fi_extents_max + 1;
> + 2 * fieinfo->fi_extents_max + 2;
> bm.bmv_count = min_t(__s32, bm.bmv_count,
> (PAGE_SIZE * 16 / sizeof(struct getbmapx)));
> bm.bmv_iflags = BMV_IF_PREALLOC;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tao Ma on
Eric Sandeen wrote:
> Tao Ma wrote:
>
>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>> to return fi_extent_max extents, but actually it won't work for
>> a sparse file. The reason is that in xfs_getbmap we will
>> calculate holes and set it in 'out', while out is malloced by
>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>> worst case, if 'out' vector looks like
>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>> we will only return half of fi_extent_max extents.
>>
>> So in xfs_vn_fiemap, we should consider this worst case. If the
>> user wants fi_extent_max extents, we need a 'out' with size of
>> 2 *fi_extent_max + 1.
>>
>
> This all seems right to me, though your commit message above (+1)
> doesn't match the comment and code in the patch (+2)
>
oh, yes, I will change the commit log and send a v2.
and can I add your ack for this patch?

Regards,
Tao
> -Eric
>
>
>> Cc: Alex Elder <aelder(a)sgi.com>
>> Cc: Christoph Hellwig <hch(a)lst.de>
>> Cc: Dave Chinner <david(a)fromorbit.com>
>> Signed-off-by: Tao Ma <tao.ma(a)oracle.com>
>> ---
>> fs/xfs/linux-2.6/xfs_iops.c | 16 ++++++++++++++--
>> 1 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
>> index 9c8019c..1db92e3 100644
>> --- a/fs/xfs/linux-2.6/xfs_iops.c
>> +++ b/fs/xfs/linux-2.6/xfs_iops.c
>> @@ -672,9 +672,21 @@ xfs_vn_fiemap(
>> else
>> bm.bmv_length = BTOBB(length);
>>
>> - /* We add one because in getbmap world count includes the header */
>> + /*
>> + * It is a bit tricky for us to calculate the bmv_count from
>> + * fi_extent_max.
>> + * If we support to return fi_extent_max extents to the user,
>> + * we need at most 2 * fi_extent_max + 1 for bmv_count since
>> + * in xfs_getbmap we will calculate holes while fi_extent_max
>> + * don't have them. So in the worst case, bmv can looks like
>> + * [hole, extent, hole, extent, hole, ... hole, extent, hole].
>> + * So there will be 2 *fi_extent_max + 1.
>> + * What's more, in getbmap world count have to include the
>> + * header, so we need another bmv. So the total number will
>> + * be 2 * fieinfo->fi_extents_max + 2.
>> + */
>> bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :
>> - fieinfo->fi_extents_max + 1;
>> + 2 * fieinfo->fi_extents_max + 2;
>> bm.bmv_count = min_t(__s32, bm.bmv_count,
>> (PAGE_SIZE * 16 / sizeof(struct getbmapx)));
>> bm.bmv_iflags = BMV_IF_PREALLOC;
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo(a)vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Sandeen on
Tao Ma wrote:
> Eric Sandeen wrote:
>> Tao Ma wrote:
>>
>>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>>> to return fi_extent_max extents, but actually it won't work for
>>> a sparse file. The reason is that in xfs_getbmap we will
>>> calculate holes and set it in 'out', while out is malloced by
>>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>>> worst case, if 'out' vector looks like
>>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>>> we will only return half of fi_extent_max extents.
>>>
>>> So in xfs_vn_fiemap, we should consider this worst case. If the
>>> user wants fi_extent_max extents, we need a 'out' with size of
>>> 2 *fi_extent_max + 1.
>>>
>>
>> This all seems right to me, though your commit message above (+1)
>> doesn't match the comment and code in the patch (+2)
>>
> oh, yes, I will change the commit log and send a v2.
> and can I add your ack for this patch?

Sure, it seems right to me.

Thanks,
-Eric

> Regards,
> Tao
>> -Eric
>>
>>
>>> Cc: Alex Elder <aelder(a)sgi.com>
>>> Cc: Christoph Hellwig <hch(a)lst.de>
>>> Cc: Dave Chinner <david(a)fromorbit.com>
>>> Signed-off-by: Tao Ma <tao.ma(a)oracle.com>
>>> ---
>>> fs/xfs/linux-2.6/xfs_iops.c | 16 ++++++++++++++--
>>> 1 files changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
>>> index 9c8019c..1db92e3 100644
>>> --- a/fs/xfs/linux-2.6/xfs_iops.c
>>> +++ b/fs/xfs/linux-2.6/xfs_iops.c
>>> @@ -672,9 +672,21 @@ xfs_vn_fiemap(
>>> else
>>> bm.bmv_length = BTOBB(length);
>>>
>>> - /* We add one because in getbmap world count includes the header */
>>> + /*
>>> + * It is a bit tricky for us to calculate the bmv_count from
>>> + * fi_extent_max.
>>> + * If we support to return fi_extent_max extents to the user,
>>> + * we need at most 2 * fi_extent_max + 1 for bmv_count since
>>> + * in xfs_getbmap we will calculate holes while fi_extent_max
>>> + * don't have them. So in the worst case, bmv can looks like
>>> + * [hole, extent, hole, extent, hole, ... hole, extent, hole].
>>> + * So there will be 2 *fi_extent_max + 1.
>>> + * What's more, in getbmap world count have to include the
>>> + * header, so we need another bmv. So the total number will
>>> + * be 2 * fieinfo->fi_extents_max + 2.
>>> + */
>>> bm.bmv_count = !fieinfo->fi_extents_max ? MAXEXTNUM :
>>> - fieinfo->fi_extents_max + 1;
>>> + 2 * fieinfo->fi_extents_max + 2;
>>> bm.bmv_count = min_t(__s32, bm.bmv_count,
>>> (PAGE_SIZE * 16 / sizeof(struct getbmapx)));
>>> bm.bmv_iflags = BMV_IF_PREALLOC;
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-kernel" in
>> the body of a message to majordomo(a)vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Chinner on
On Sat, Jun 12, 2010 at 10:08:15AM +0800, Tao Ma wrote:
> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
> to return fi_extent_max extents, but actually it won't work for
> a sparse file.

Define "won't work". i.e. what's the test case? I just created a
sparse file and checked it, and it reported all the extents in it:

# xfs_bmap -vp testfile
testfile:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..7]: hole 8
1: [8..15]: 96..103 0 (96..103) 8 00000
2: [16..23]: hole 8
3: [24..31]: 112..119 0 (112..119) 8 00000
4: [32..39]: hole 8
5: [40..47]: 128..135 0 (128..135) 8 00000
6: [48..55]: hole 8
7: [56..63]: 144..151 0 (144..151) 8 00000
8: [64..71]: hole 8
9: [72..79]: 160..167 0 (160..167) 8 00000
10: [80..87]: hole 8
11: [88..95]: 176..183 0 (176..183) 8 00000
12: [96..103]: hole 8
13: [104..111]: 192..199 0 (192..199) 8 00000
14: [112..119]: hole 8
15: [120..127]: 208..215 0 (208..215) 8 00000
# filefrag -v testfile
Filesystem type is: 58465342
File size of testfile is 65536 (16 blocks, blocksize 4096)
ext logical physical expected length flags
0 1 12 1
1 3 14 12 1
2 5 16 14 1
3 7 18 16 1
4 9 20 18 1
5 11 22 20 1
6 13 24 22 1
7 15 26 24 1 eof
testfile: 9 extents found
#

FWIW, filefrag seems busted - the file has 8 extents, not 9.

For a more fragmented sparse file (25,000 extents):

# for i in `seq 1 2 50000`; do dd if=/dev/zero of=testfile bs=4k count=1 seek=$i; done
.....
# xfs_bmap -vp testfile | grep -v hole | wc -l
25002
# filefrag -v testfile |tail -1
testfile: 25001 extents found

So taking away the 2 header lines from xfs_bmap output we have 25000
extents, and filefrag has over-counted by one again. However, we are
we are definitely finding all the extents through fiemap...

> The reason is that in xfs_getbmap we will
> calculate holes and set it in 'out', while out is malloced by
> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
> worst case, if 'out' vector looks like
> [hole, extent, hole, extent, hole, ... hole, extent, hole],
> we will only return half of fi_extent_max extents.

Right, it's not broken, we simply return less than fi_extent_mex
extents when there are holes. I don't see that as a problem as
applications have to handle that case anyway, and....

> So in xfs_vn_fiemap, we should consider this worst case. If the
> user wants fi_extent_max extents, we need a 'out' with size of
> 2 *fi_extent_max + 2(one more the header).

That's rather dangerous, I think. It relies on other code to catch
the buffer overrun that this sets up for fragmented, non-sparse
files. Personally I'd much prefer to return fewer extents for sparse
files than to add a landmine like this into the kernel code....

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tao Ma on


On 06/14/2010 08:27 AM, Dave Chinner wrote:
> On Sat, Jun 12, 2010 at 10:08:15AM +0800, Tao Ma wrote:
>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>> to return fi_extent_max extents, but actually it won't work for
>> a sparse file.
>
> Define "won't work". i.e. what's the test case? I just created a
> sparse file and checked it, and it reported all the extents in it:
>
> # xfs_bmap -vp testfile
> testfile:
> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> 0: [0..7]: hole 8
> 1: [8..15]: 96..103 0 (96..103) 8 00000
> 2: [16..23]: hole 8
> 3: [24..31]: 112..119 0 (112..119) 8 00000
> 4: [32..39]: hole 8
> 5: [40..47]: 128..135 0 (128..135) 8 00000
> 6: [48..55]: hole 8
> 7: [56..63]: 144..151 0 (144..151) 8 00000
> 8: [64..71]: hole 8
> 9: [72..79]: 160..167 0 (160..167) 8 00000
> 10: [80..87]: hole 8
> 11: [88..95]: 176..183 0 (176..183) 8 00000
> 12: [96..103]: hole 8
> 13: [104..111]: 192..199 0 (192..199) 8 00000
> 14: [112..119]: hole 8
> 15: [120..127]: 208..215 0 (208..215) 8 00000
ok, so let me explain it. In commit
2d1ff3c75a4642062d314634290be6d8da4ffb03, I add the mode for extent
query of fiemap for xfs. So with your test file, it will return that we
have 8 extents(because in xfs_fiemap_format we don't return holes). So
normally and naturally, a user begin to iterate all the extents by doing

fiemap = malloc(sizeof(fiemap) + 8 * sizeof(struct fiemap_extent));
fiemap->fm_extent_count = 8

But what will happen? He will only get 4 extent. So do you think it is
acceptable for a user? We told him that we have 8 extents, he has
allocated enough space, but he can't get what he wanted. And he need to
fiemap = malloc(sizeof(fiemap) + 16 * sizeof(struct fiemap_extent));
fiemap->fm_extent_count = 16
to get 8 extent for your test file.

> # filefrag -v testfile
> Filesystem type is: 58465342
> File size of testfile is 65536 (16 blocks, blocksize 4096)
> ext logical physical expected length flags
> 0 1 12 1
> 1 3 14 12 1
> 2 5 16 14 1
> 3 7 18 16 1
> 4 9 20 18 1
> 5 11 22 20 1
> 6 13 24 22 1
> 7 15 26 24 1 eof
> testfile: 9 extents found
> #
>
> FWIW, filefrag seems busted - the file has 8 extents, not 9.
yeah, filefrag is really broken.
>
>> The reason is that in xfs_getbmap we will
>> calculate holes and set it in 'out', while out is malloced by
>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>> worst case, if 'out' vector looks like
>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>> we will only return half of fi_extent_max extents.
>
> Right, it's not broken, we simply return less than fi_extent_mex
> extents when there are holes. I don't see that as a problem as
> applications have to handle that case anyway, and....
see my above test case. I guess we really don't want a userspace user to
allocate num_extents * 2 + 1 fiemap_extent to get them.
>
>> So in xfs_vn_fiemap, we should consider this worst case. If the
>> user wants fi_extent_max extents, we need a 'out' with size of
>> 2 *fi_extent_max + 2(one more the header).
>
> That's rather dangerous, I think. It relies on other code to catch
> the buffer overrun that this sets up for fragmented, non-sparse
> files. Personally I'd much prefer to return fewer extents for sparse
> files than to add a landmine like this into the kernel code....
We just change the size of our 'out', we don't change fi_extent_max or
anything related to the fiemap. So I think what we care is how to keep
our 'out' in good shape and fiemap should handle and check their
fi_extent_max if we pass it more extents.

btw, maybe there is a better solution for the problem I described above.
If there is a good one, I am happy to accept it.

Regards,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/