From: Dave Chinner on
On Mon, Jun 14, 2010 at 10:27:06AM +1000, Dave Chinner wrote:
> On Sat, Jun 12, 2010 at 10:08:15AM +0800, Tao Ma wrote:
> > The reason is that in xfs_getbmap we will
> > calculate holes and set it in 'out', while out is malloced by
> > bmv_count(fi_extent_max+1) which didn't consider holes. So in the
> > worst case, if 'out' vector looks like
> > [hole, extent, hole, extent, hole, ... hole, extent, hole],
> > we will only return half of fi_extent_max extents.
>
> Right, it's not broken, we simply return less than fi_extent_mex
> extents when there are holes. I don't see that as a problem as
> applications have to handle that case anyway, and....
>
> > So in xfs_vn_fiemap, we should consider this worst case. If the
> > user wants fi_extent_max extents, we need a 'out' with size of
> > 2 *fi_extent_max + 2(one more the header).
>
> That's rather dangerous, I think. It relies on other code to catch
> the buffer overrun that this sets up for fragmented, non-sparse
> files. Personally I'd much prefer to return fewer extents for sparse
> files than to add a landmine like this into the kernel code....

I just had a thought - if you want to avoid holes being reported to
fiemap, then add a BMV_IF_NO_HOLES flag to xfs_getbmap() and skip
holes in the mappin gloop when this flag is set. That will make
fiemap fill in the full number of extents without hacking the
extent count...

Cheers,

Dave.
--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tao Ma on
Hi Dave,

Dave Chinner wrote:
> On Mon, Jun 14, 2010 at 10:27:06AM +1000, Dave Chinner wrote:
>
>> On Sat, Jun 12, 2010 at 10:08:15AM +0800, Tao Ma wrote:
>>
>>> The reason is that in xfs_getbmap we will
>>> calculate holes and set it in 'out', while out is malloced by
>>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>>> worst case, if 'out' vector looks like
>>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>>> we will only return half of fi_extent_max extents.
>>>
>> Right, it's not broken, we simply return less than fi_extent_mex
>> extents when there are holes. I don't see that as a problem as
>> applications have to handle that case anyway, and....
>>
>>
>>> So in xfs_vn_fiemap, we should consider this worst case. If the
>>> user wants fi_extent_max extents, we need a 'out' with size of
>>> 2 *fi_extent_max + 2(one more the header).
>>>
>> That's rather dangerous, I think. It relies on other code to catch
>> the buffer overrun that this sets up for fragmented, non-sparse
>> files. Personally I'd much prefer to return fewer extents for sparse
>> files than to add a landmine like this into the kernel code....
>>
>
> I just had a thought - if you want to avoid holes being reported to
> fiemap, then add a BMV_IF_NO_HOLES flag to xfs_getbmap() and skip
> holes in the mappin gloop when this flag is set. That will make
> fiemap fill in the full number of extents without hacking the
> extent count...
>
yeah, that should work and I will try to generate a patch for it.
I am not quite familiar with xfs, so please be kind to me if I make some
stupid mistake in the patch. ;)

Regards,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tao Ma on
Hi Dave,
On 06/14/2010 08:29 PM, Dave Chinner wrote:
> I just had a thought - if you want to avoid holes being reported to
> fiemap, then add a BMV_IF_NO_HOLES flag to xfs_getbmap() and skip
> holes in the mappin gloop when this flag is set. That will make
> fiemap fill in the full number of extents without hacking the
> extent count...
Here is the updated one. I have used BVM_IF_NO_HOLES in xfs_getbmap
to skip increasing index 'cur_ext'. It is a bit ugly, see my commit
log. I guess maybe we can add another flag in xfs_bmapi so that it
don't even give us the holes?

Regards,
Tao

From cee1765ffd3e2b003b837666b4620b5107ed9ddd Mon Sep 17 00:00:00 2001
From: Tao Ma <tao.ma(a)oracle.com>
Date: Thu, 17 Jun 2010 16:14:22 +0800
Subject: [PATCH v3] xfs: Make fiemap works with sparse file.

In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
to return fi_extent_max extents, but actually it won't work for
a sparse file. The reason is that in xfs_getbmap we will
calculate holes and set it in 'out', while out is malloced by
bmv_count(fi_extent_max+1) which didn't consider holes. So in the
worst case, if 'out' vector looks like
[hole, extent, hole, extent, hole, ... hole, extent, hole],
we will only return half of fi_extent_max extents.

This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags.
So with this flags, we don't use our 'out' in xfs_getbmap for
a hole. The solution is a bit ugly by just don't increasing
index of 'out' vector. I felt that it is not easy to skip it
at the very beginning since we have the complicated check and
some function like xfs_getbmapx_fix_eof_hole to adjust 'out'.

Cc: Dave Chinner <david(a)fromorbit.com>
Cc: Alex Elder <aelder(a)sgi.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Eric Sandeen <sandeen(a)redhat.com>
Signed-off-by: Tao Ma <tao.ma(a)oracle.com>
---
fs/xfs/linux-2.6/xfs_iops.c | 2 +-
fs/xfs/xfs_bmap.c | 14 +++++++++++++-
fs/xfs/xfs_fs.h | 4 +++-
3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 9c8019c..505d5c0 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -677,7 +677,7 @@ xfs_vn_fiemap(
fieinfo->fi_extents_max + 1;
bm.bmv_count = min_t(__s32, bm.bmv_count,
(PAGE_SIZE * 16 / sizeof(struct getbmapx)));
- bm.bmv_iflags = BMV_IF_PREALLOC;
+ bm.bmv_iflags = BMV_IF_PREALLOC | BMV_IF_NO_HOLES;
if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR)
bm.bmv_iflags |= BMV_IF_ATTRFORK;
if (!(fieinfo->fi_flags & FIEMAP_FLAG_SYNC))
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
index 99587de..d49107d 100644
--- a/fs/xfs/xfs_bmap.c
+++ b/fs/xfs/xfs_bmap.c
@@ -5744,12 +5744,24 @@ xfs_getbmap(
map[i].br_startblock))
goto out_free_map;

- nexleft--;
bmv->bmv_offset =
out[cur_ext].bmv_offset +
out[cur_ext].bmv_length;
bmv->bmv_length =
max_t(__int64_t, 0, bmvend - bmv->bmv_offset);
+
+ /*
+ * In case we don't want to return the hole,
+ * don't increase cur_ext so that we can reuse
+ * it in the next loop.
+ */
+ if ((iflags & BMV_IF_NO_HOLES) &&
+ out[cur_ext].bmv_block == -1LL) {
+ memset(&out[cur_ext], 0, sizeof(out[cur_ext]));
+ continue;
+ }
+
+ nexleft--;
bmv->bmv_entries++;
cur_ext++;
}
diff --git a/fs/xfs/xfs_fs.h b/fs/xfs/xfs_fs.h
index 7cf7220..87c2e9d 100644
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -114,8 +114,10 @@ struct getbmapx {
#define BMV_IF_NO_DMAPI_READ 0x2 /* Do not generate DMAPI read event */
#define BMV_IF_PREALLOC 0x4 /* rtn status BMV_OF_PREALLOC if req */
#define BMV_IF_DELALLOC 0x8 /* rtn status BMV_OF_DELALLOC if req */
+#define BMV_IF_NO_HOLES 0x10 /* Do not return holes */
#define BMV_IF_VALID \
- (BMV_IF_ATTRFORK|BMV_IF_NO_DMAPI_READ|BMV_IF_PREALLOC|BMV_IF_DELALLOC)
+ (BMV_IF_ATTRFORK|BMV_IF_NO_DMAPI_READ|BMV_IF_PREALLOC| \
+ BMV_IF_DELALLOC|BMV_IF_NO_HOLES)

/* bmv_oflags values - returned for each non-header segment */
#define BMV_OF_PREALLOC 0x1 /* segment = unwritten pre-allocation */
--
1.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Dave Chinner on
On Thu, Jun 17, 2010 at 04:53:19PM +0800, Tao Ma wrote:
> Hi Dave,
> On 06/14/2010 08:29 PM, Dave Chinner wrote:
> > I just had a thought - if you want to avoid holes being reported to
> > fiemap, then add a BMV_IF_NO_HOLES flag to xfs_getbmap() and skip
> > holes in the mappin gloop when this flag is set. That will make
> > fiemap fill in the full number of extents without hacking the
> > extent count...
> Here is the updated one. I have used BVM_IF_NO_HOLES in xfs_getbmap
> to skip increasing index 'cur_ext'. It is a bit ugly, see my commit
> log. I guess maybe we can add another flag in xfs_bmapi so that it
> don't even give us the holes?

No need...

>
> Regards,
> Tao
>
> From cee1765ffd3e2b003b837666b4620b5107ed9ddd Mon Sep 17 00:00:00 2001
> From: Tao Ma <tao.ma(a)oracle.com>
> Date: Thu, 17 Jun 2010 16:14:22 +0800
> Subject: [PATCH v3] xfs: Make fiemap works with sparse file.
>
> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
> to return fi_extent_max extents, but actually it won't work for
> a sparse file. The reason is that in xfs_getbmap we will
> calculate holes and set it in 'out', while out is malloced by
> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
> worst case, if 'out' vector looks like
> [hole, extent, hole, extent, hole, ... hole, extent, hole],
> we will only return half of fi_extent_max extents.
>
> This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags.
> So with this flags, we don't use our 'out' in xfs_getbmap for
> a hole. The solution is a bit ugly by just don't increasing
> index of 'out' vector. I felt that it is not easy to skip it
> at the very beginning since we have the complicated check and
> some function like xfs_getbmapx_fix_eof_hole to adjust 'out'.

.... because I think we can safely skip xfs_getbmapx_fix_eof_hole()
it only modifies the hole. Hence just adding a check after the
attribute fork end check (which needs to detect a hole to terminate)
should be fine: e.g something like:

if (map[i].br_startblock == HOLESTARTBLOCK &&
whichfork == XFS_ATTR_FORK) {
/* came to the end of attribute fork */
out[cur_ext].bmv_oflags |= BMV_OF_LAST;
goto out_free_map;
}
+ if (map[i].br_startblock == HOLESTARTBLOCK &&
+ (iflags & BMV_IF_NO_HOLES)) {
+ memset(&out[cur_ext], 0, sizeof(out[cur_ext]));
+ continue;
+ }

Should work and avoid the worst of the ugliness.

The rest of the patch looks fine.

Cheers,

Dave.

--
Dave Chinner
david(a)fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tao Ma on
Hi Dave,

On 06/18/2010 08:47 AM, Dave Chinner wrote:
> On Thu, Jun 17, 2010 at 04:53:19PM +0800, Tao Ma wrote:
>> Hi Dave,
>> On 06/14/2010 08:29 PM, Dave Chinner wrote:
>>> I just had a thought - if you want to avoid holes being reported to
>>> fiemap, then add a BMV_IF_NO_HOLES flag to xfs_getbmap() and skip
>>> holes in the mappin gloop when this flag is set. That will make
>>> fiemap fill in the full number of extents without hacking the
>>> extent count...
>> Here is the updated one. I have used BVM_IF_NO_HOLES in xfs_getbmap
>> to skip increasing index 'cur_ext'. It is a bit ugly, see my commit
>> log. I guess maybe we can add another flag in xfs_bmapi so that it
>> don't even give us the holes?
>
> No need...
I am fine with it.
>
>>
>> Regards,
>> Tao
>>
>> From cee1765ffd3e2b003b837666b4620b5107ed9ddd Mon Sep 17 00:00:00 2001
>> From: Tao Ma<tao.ma(a)oracle.com>
>> Date: Thu, 17 Jun 2010 16:14:22 +0800
>> Subject: [PATCH v3] xfs: Make fiemap works with sparse file.
>>
>> In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want
>> to return fi_extent_max extents, but actually it won't work for
>> a sparse file. The reason is that in xfs_getbmap we will
>> calculate holes and set it in 'out', while out is malloced by
>> bmv_count(fi_extent_max+1) which didn't consider holes. So in the
>> worst case, if 'out' vector looks like
>> [hole, extent, hole, extent, hole, ... hole, extent, hole],
>> we will only return half of fi_extent_max extents.
>>
>> This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags.
>> So with this flags, we don't use our 'out' in xfs_getbmap for
>> a hole. The solution is a bit ugly by just don't increasing
>> index of 'out' vector. I felt that it is not easy to skip it
>> at the very beginning since we have the complicated check and
>> some function like xfs_getbmapx_fix_eof_hole to adjust 'out'.
>
> ... because I think we can safely skip xfs_getbmapx_fix_eof_hole()
> it only modifies the hole. Hence just adding a check after the
> attribute fork end check (which needs to detect a hole to terminate)
> should be fine: e.g something like:
>
> if (map[i].br_startblock == HOLESTARTBLOCK&&
> whichfork == XFS_ATTR_FORK) {
> /* came to the end of attribute fork */
> out[cur_ext].bmv_oflags |= BMV_OF_LAST;
> goto out_free_map;
> }
> + if (map[i].br_startblock == HOLESTARTBLOCK&&
> + (iflags& BMV_IF_NO_HOLES)) {
> + memset(&out[cur_ext], 0, sizeof(out[cur_ext]));
> + continue;
> + }
>
> Should work and avoid the worst of the ugliness.
I am afraid it doesn't work, at least from my test. It enters a dead loop.
I think the root cause is that your change doesn't update bmv_offset and
bmv_length for a hole. So in the large loop,
do {
nmap = (nexleft > subnex) ? subnex : nexleft;
error = xfs_bmapi(NULL, ip, XFS_BB_TO_FSBT(mp, bmv->bmv_offset),
XFS_BB_TO_FSB(mp, bmv->bmv_length),
bmapi_flags, NULL, 0, map, &nmap,
NULL, NULL);
if (error)
goto out_free_map;
....
} while (nmap && nexleft && bmv->bmv_length);

We will dead loop there and we need xfs_getbmapx_fix_eof_hole to go out
directly.

Regards,
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/