From: Boaz Harrosh on
On 03/24/2010 07:15 PM, Boaz Harrosh wrote:
> On 03/24/2010 06:39 PM, Al Viro wrote:
>> On Wed, Mar 24, 2010 at 06:10:52PM +0200, Boaz Harrosh wrote:
>>> On 03/24/2010 06:07 PM, Al Viro wrote:
>>>> On Wed, Mar 24, 2010 at 06:04:56PM +0200, Boaz Harrosh wrote:
>>>>>> Bloody impressive... Does that happen to underlying fs or to what you
>>>>>> are seeing via NFS?
>>>>>
>>>>> Only via NFS. All local access is fine.
>>>>>
>>>>> After the corruption above I can cd to the local mount cp a fresh copy
>>>>> of .git/index file and play around just fine.
>>>>> Once I return to the NFS mounted directory, a git status will do it.
>>>>> It does not matter if caches are cold (Takes a long time) or hot it happens
>>>>> every time.
>>>>>
>>>>> Weird I know, I'm playing some more with it as we speak
>>>>
>>>> What happens if you export to box running older kernel *or* from box
>>>> running older kernel? IOW, is that nfsd or nfs client getting unhappy?
>>>> I'd suspect the latter, but...
>>>
>>>
>>> Good question, I'm just getting to that because currently it's all
>>> over localhost (same kernel, BTW inside a UML)
>>>
>>> I will try what you said. Please throw any other tests at me, if needed.
>>
>
> As you suspected, old-server+new-client fails. Anything+old-client is
> fine. (Two separate machines this time.)
>
>> Very interesting... Just to see which path we are hitting: add
>> 	if (IS_ERR(nd->intent.open.file))
>> 		printk("foo: %s", pathname);
>> right after
>> 	error = do_lookup(nd, &nd->last, path);
>> 	if (error)
>> 		goto exit;
>> in fs/namei.c:do_last() and see whether we are hitting it or not on objects
>> that get corrupted.
>
> Sorry, I was busy shifting setups and didn't see your mail. Will do that next ...
>
> Thanks
> Boaz


Below is what I changed. (I hope it's what you meant.)
It does not get hit: I get the same git corruption as before, but I don't see the prints.
I'll try running with NFS debug prints on, to see what it does around the time git complains.

Boaz

---
diff --git a/fs/namei.c b/fs/namei.c
index 1c0fca6..d1c96f0 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1650,6 +1650,12 @@ static struct file *do_last(struct nameidata *nd, struct path *path,
 	error = do_lookup(nd, &nd->last, path);
 	if (error)
 		goto exit;
+
+	if (IS_ERR(nd->intent.open.file)) {
+		printk(KERN_ERR "foo: %s", pathname);
+		WARN_ON(1);
+	}
+
 	error = -ENOENT;
 	if (!path->dentry->d_inode)
 		goto exit_dput;
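
For context on the check in the hunk above: the kernel can return a small negative errno encoded as a pointer value at the very top of the address range, and IS_ERR() tests whether a pointer falls in that range. The following is a minimal userspace sketch of that idea, not the kernel's own code. All `sketch_`/`SKETCH_` names are hypothetical stand-ins, and the 4095 limit is assumed to mirror the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MAX_ERRNO: largest errno encodable in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno (for example -ENOENT) as a pointer value. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr lies within the last SKETCH_MAX_ERRNO values of the address space. */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}
```

Under this scheme a NULL pointer and an ordinary valid pointer are both "not an error", so the added printk fires only when the open intent carries an encoded errno.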


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Boaz Harrosh on
On 03/24/2010 07:32 PM, Boaz Harrosh wrote:
> Below is what I changed. (I hope it's what you meant.)
> It does not get hit: I get the same git corruption as before, but I don't see the prints.
> I'll try running with NFS debug prints on, to see what it does around the time git complains.
>
> Boaz
>

Attached is the output from when I do:
$ echo $((0x7fff)) > /proc/sys/sunrpc/nfs_debug
and then run git status (on a new client).

We can see the complaints after things got broken, but what broke them
is hard for me to see.

(If the file is too big I'll put it on the web somewhere; let's see if it arrives.)

Boaz
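
As a side note on the command above: the shell arithmetic `$((0x7fff))` expands to decimal 32767, which is a mask with the fifteen lowest bits set. The sketch below only illustrates that arithmetic. The helper names are hypothetical, and which debug facility each bit enables is not stated in this thread and is not assumed here.

```c
#include <stdio.h>

/* Build a mask with the lowest nbits bits set (nbits must be below 32). */
static unsigned int low_bits_mask(unsigned int nbits)
{
	return (1u << nbits) - 1u;
}

/* Render a mask in decimal, the form the shell expansion writes to the file. */
static int format_mask_decimal(unsigned int mask, char *buf, size_t len)
{
	return snprintf(buf, len, "%u", mask);
}
```

Calling `low_bits_mask(15)` gives the same value as the hexadecimal literal used in the command.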

> _______________________________________________
> pNFS mailing list
> pNFS(a)linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs

From: Boaz Harrosh on
On 03/24/2010 07:47 PM, Boaz Harrosh wrote:
>>> On 03/24/2010 06:39 PM, Al Viro wrote:
>>>> On Wed, Mar 24, 2010 at 06:10:52PM +0200, Boaz Harrosh wrote:
>>>>> On 03/24/2010 06:07 PM, Al Viro wrote:
>>>>>>>> Bloody impressive... Does that happen to underlying fs or to what you
>>>>>>>> are seeing via NFS?
>>>>>>>
>>>>>>> Only via NFS. All local access is fine.
>>>>>>>
<snip>

Al, hi

Would you like to attempt a revert of this patch (or group of patches),
just to rule out the possibility that git bisect picked the
wrong guy? Maybe it's something else. Can you understand the
relevance of all this?

(I'll try other setups as well but tomorrow, it's getting late out here)
Thanks for your help
Boaz
From: Al Viro on
On Wed, Mar 24, 2010 at 07:58:00PM +0200, Boaz Harrosh wrote:
> Al, hi
>
> Would you like to attempt a revert of this patch (or group of patches),
> just to rule out the possibility that git bisect picked the
> wrong guy? Maybe it's something else. Can you understand the
> relevance of all this?

If you see breakage at that commit and do not see it on its parent, we
do have the right guy...

As for reverting, try reverting 781b16775ba0bb55fac0e1757bf0bd87c8879632
first, then this commit.

How consistent are the effects you are seeing from test to test on the same
kernel? This one was very interesting, since it seemed to fail with
-EISDIR while opening .git/objects/pack. Which is a directory and which
should fail with -EISDIR if and only if we pass O_CREAT to open(). And
passing O_CREAT on that one is probably not an intended behaviour of git...

Does anybody else see NFS breakage starting at that commit, BTW? Other
testcases would be useful...
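
The -EISDIR rule described above can be checked from userspace with a small probe. This is a sketch under the assumption of a Linux system and a local filesystem; `try_open` is a hypothetical helper, not something git or the kernel provides.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open path with the given flags.
 * Returns 0 on success, or the errno value reported by open() on failure.
 */
static int try_open(const char *path, int flags)
{
	int fd = open(path, flags, 0644);

	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}
```

On Linux, `try_open("/", O_RDONLY)` is expected to succeed, while `try_open("/", O_RDONLY | O_CREAT)` is expected to report EISDIR. Running the same pair of calls against .git/objects/pack on the NFS mount would show whether a plain read-only open of that directory fails there.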
From: Trond Myklebust on
On Wed, 2010-03-24 at 19:47 +0200, Boaz Harrosh wrote:
> Attached is the output from when I do:
> $ echo $((0x7fff)) > /proc/sys/sunrpc/nfs_debug
> and then run git status (on a new client).
>
> We can see the complaints after things got broken, but what broke them
> is hard for me to see.
>
> (If the file is too big I'll put it on the web somewhere; let's see if it arrives.)
>
> Boaz

Something weird is going on in your trace:

NFS: open file(5b/46ff70a61cf4e159a0339df0e02113bf35f805)
NFS: permission(0:12/323044), mask=0x24, res=0
NFS: revalidating (0:12/323044)
--> nfs4_setup_sequence clp 00000000791f3000 session (null) sr_slotid 128
<-- nfs4_setup_sequence status=0
encode_compound: tag=
decode_attr_type: type=00
decode_attr_change: change attribute=10077553255782547456
decode_attr_size: file size=921
decode_attr_fsid: fsid=(0x0/0x0)
decode_attr_fileid: fileid=0
decode_attr_fs_locations: fs_locations done, error = 0
decode_attr_mode: file mode=00
decode_attr_nlink: nlink=1
decode_attr_owner: uid=-2
decode_attr_group: gid=-2
decode_attr_rdev: rdev=(0x0:0x0)
decode_attr_space_used: space used=0
decode_attr_time_access: atime=0
decode_attr_time_metadata: ctime=1269422731
decode_attr_time_modify: mtime=1269422731
decode_attr_mounted_on_fileid: fileid=0
decode_getfattr: xdr returned 0

A file type of '0' in the above trace is just wrong, and probably
indicates that the server didn't even return that attribute.

I'd say you have a corruption issue either on the server side or on your
client.

Trond
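
To make the point about the file type concrete: the NFSv4 protocol numbers its object types from 1 (regular file) to 9 (named attribute), so 0 is not a value a server can legitimately return. The sketch below encodes that range check. The `sketch_`/`SKETCH_` names are hypothetical and this is not the client's actual decode path; the numeric values are the nfs_ftype4 values from the NFSv4 specification.

```c
/* Object types defined for the NFSv4 "type" attribute (nfs_ftype4). */
enum sketch_nfs_ftype4 {
	SKETCH_NF4REG       = 1, /* regular file */
	SKETCH_NF4DIR       = 2, /* directory */
	SKETCH_NF4BLK       = 3, /* block device */
	SKETCH_NF4CHR       = 4, /* character device */
	SKETCH_NF4LNK       = 5, /* symbolic link */
	SKETCH_NF4SOCK      = 6, /* socket */
	SKETCH_NF4FIFO      = 7, /* named pipe */
	SKETCH_NF4ATTRDIR   = 8, /* attribute directory */
	SKETCH_NF4NAMEDATTR = 9  /* named attribute */
};

/* Return nonzero if type is one of the values the protocol defines. */
static int sketch_ftype4_is_valid(unsigned int type)
{
	return type >= SKETCH_NF4REG && type <= SKETCH_NF4NAMEDATTR;
}
```

With this check, the `type=00` seen in the trace is rejected, which is consistent with the attribute never having been filled in from the server's reply.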
