From: Jan Kara on
On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote:
> From: Dmitry Monakhov <dmonakhov(a)openvz.org>
>
> commit d21cd8f163ac44b15c465aab7306db931c606908 upstream.
Ted has found a serious bug in this patch over Christmas and fixed it.
Ted, do you prefer to just discard this patch or will you add your fixes?

Honza

> We have to delay vfs_dq_claim_space() until allocation context destruction.
> Currently we have following call-trace:
> ext4_mb_new_blocks()
> /* task is already holding ac->alloc_semp */
> ->ext4_mb_mark_diskspace_used
> ->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */
> ->ext4_mb_release_context() /* drop ac->alloc_semp here */
>
> Let's move quota claiming to ext4_da_update_reserve_space()
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.32-rc7 #18
> -------------------------------------------------------
> write-truncate-/3465 is trying to acquire lock:
> (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
>
> but task is already holding lock:
> (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&meta_group_info[i]->alloc_sem){++++..}:
> [<c017d04b>] __lock_acquire+0xd7b/0x1260
> [<c017d5ea>] lock_acquire+0xba/0xd0
> [<c0527191>] down_read+0x51/0x90
> [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
> [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870
> [<c029c9d3>] ext4_free_blocks+0x73/0x130
> [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0
> [<c02a8087>] ext4_truncate+0x187/0x5e0
> [<c01e0f7b>] vmtruncate+0x6b/0x70
> [<c022ec02>] inode_setattr+0x62/0x190
> [<c02a2d7a>] ext4_setattr+0x25a/0x370
> [<c022ee81>] notify_change+0x151/0x340
> [<c021349d>] do_truncate+0x6d/0xa0
> [<c0221034>] may_open+0x1d4/0x200
> [<c022412b>] do_filp_open+0x1eb/0x910
> [<c021244d>] do_sys_open+0x6d/0x140
> [<c021258e>] sys_open+0x2e/0x40
> [<c0103100>] sysenter_do_call+0x12/0x32
>
> -> #2 (&ei->i_data_sem){++++..}:
> [<c017d04b>] __lock_acquire+0xd7b/0x1260
> [<c017d5ea>] lock_acquire+0xba/0xd0
> [<c0527191>] down_read+0x51/0x90
> [<c02a5787>] ext4_get_blocks+0x47/0x450
> [<c02a74c1>] ext4_getblk+0x61/0x1d0
> [<c02a7a7f>] ext4_bread+0x1f/0xa0
> [<c02bcddc>] ext4_quota_write+0x12c/0x310
> [<c0262d23>] qtree_write_dquot+0x93/0x120
> [<c0261708>] v2_write_dquot+0x28/0x30
> [<c025d3fb>] dquot_commit+0xab/0xf0
> [<c02be977>] ext4_write_dquot+0x77/0x90
> [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50
> [<c025e321>] dquot_alloc_inode+0x101/0x180
> [<c029fec2>] ext4_new_inode+0x602/0xf00
> [<c02ad789>] ext4_create+0x89/0x150
> [<c0221ff2>] vfs_create+0xa2/0xc0
> [<c02246e7>] do_filp_open+0x7a7/0x910
> [<c021244d>] do_sys_open+0x6d/0x140
> [<c021258e>] sys_open+0x2e/0x40
> [<c0103100>] sysenter_do_call+0x12/0x32
>
> -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}:
> [<c017d04b>] __lock_acquire+0xd7b/0x1260
> [<c017d5ea>] lock_acquire+0xba/0xd0
> [<c0526505>] mutex_lock_nested+0x65/0x2d0
> [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0
> [<c02610af>] vfs_quota_on_path+0x5f/0x70
> [<c02bc812>] ext4_quota_on+0x112/0x190
> [<c026345a>] sys_quotactl+0x44a/0x8a0
> [<c0103100>] sysenter_do_call+0x12/0x32
>
> -> #0 (&s->s_dquot.dqptr_sem){++++..}:
> [<c017d361>] __lock_acquire+0x1091/0x1260
> [<c017d5ea>] lock_acquire+0xba/0xd0
> [<c0527191>] down_read+0x51/0x90
> [<c025e73b>] dquot_claim_space+0x3b/0x1b0
> [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
> [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
> [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
> [<c02a5966>] ext4_get_blocks+0x226/0x450
> [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
> [<c02a6ed6>] ext4_da_writepages+0x506/0x790
> [<c01de272>] do_writepages+0x22/0x50
> [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
> [<c01d7b9b>] filemap_flush+0x2b/0x30
> [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
> [<c029e595>] ext4_release_file+0x75/0xb0
> [<c0216b59>] __fput+0xf9/0x210
> [<c0216c97>] fput+0x27/0x30
> [<c02122dc>] filp_close+0x4c/0x80
> [<c014510e>] put_files_struct+0x6e/0xd0
> [<c01451b7>] exit_files+0x47/0x60
> [<c0146a24>] do_exit+0x144/0x710
> [<c0147028>] do_group_exit+0x38/0xa0
> [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
> [<c0102849>] do_notify_resume+0xb9/0x890
> [<c01032d2>] work_notifysig+0x13/0x21
>
> other info that might help us debug this:
>
> 3 locks held by write-truncate-/3465:
> #0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0
> #1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450
> #2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
>
> stack backtrace:
> Pid: 3465, comm: write-truncate- Not tainted 2.6.32-rc7 #18
> Call Trace:
> [<c0524cb3>] ? printk+0x1d/0x22
> [<c017ac9a>] print_circular_bug+0xca/0xd0
> [<c017d361>] __lock_acquire+0x1091/0x1260
> [<c016bca2>] ? sched_clock_local+0xd2/0x170
> [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
> [<c017d5ea>] lock_acquire+0xba/0xd0
> [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
> [<c0527191>] down_read+0x51/0x90
> [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0
> [<c025e73b>] dquot_claim_space+0x3b/0x1b0
> [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380
> [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530
> [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280
> [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0
> [<c016bca2>] ? sched_clock_local+0xd2/0x170
> [<c016be60>] ? sched_clock_cpu+0x120/0x160
> [<c016beef>] ? cpu_clock+0x4f/0x60
> [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
> [<c052712c>] ? down_write+0x8c/0xa0
> [<c02a5966>] ext4_get_blocks+0x226/0x450
> [<c016be60>] ? sched_clock_cpu+0x120/0x160
> [<c016beef>] ? cpu_clock+0x4f/0x60
> [<c017908b>] ? trace_hardirqs_off+0xb/0x10
> [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0
> [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180
> [<c01d6860>] ? find_get_pages_tag+0x0/0x180
> [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0
> [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40
> [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0
> [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0
> [<c02a6ed6>] ext4_da_writepages+0x506/0x790
> [<c016beef>] ? cpu_clock+0x4f/0x60
> [<c016bca2>] ? sched_clock_local+0xd2/0x170
> [<c016be60>] ? sched_clock_cpu+0x120/0x160
> [<c016be60>] ? sched_clock_cpu+0x120/0x160
> [<c02a69d0>] ? ext4_da_writepages+0x0/0x790
> [<c01de272>] do_writepages+0x22/0x50
> [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80
> [<c01d7b9b>] filemap_flush+0x2b/0x30
> [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60
> [<c029e595>] ext4_release_file+0x75/0xb0
> [<c0216b59>] __fput+0xf9/0x210
> [<c0216c97>] fput+0x27/0x30
> [<c02122dc>] filp_close+0x4c/0x80
> [<c014510e>] put_files_struct+0x6e/0xd0
> [<c01451b7>] exit_files+0x47/0x60
> [<c0146a24>] do_exit+0x144/0x710
> [<c017b163>] ? lock_release_holdtime+0x33/0x210
> [<c0528137>] ? _spin_unlock_irq+0x27/0x30
> [<c0147028>] do_group_exit+0x38/0xa0
> [<c017babb>] ? trace_hardirqs_on+0xb/0x10
> [<c0159abc>] get_signal_to_deliver+0x2ac/0x410
> [<c0102849>] do_notify_resume+0xb9/0x890
> [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0
> [<c017b163>] ? lock_release_holdtime+0x33/0x210
> [<c0165b50>] ? autoremove_wake_function+0x0/0x50
> [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190
> [<c017babb>] ? trace_hardirqs_on+0xb/0x10
> [<c0300ba4>] ? security_file_permission+0x14/0x20
> [<c0215761>] ? vfs_write+0x131/0x190
> [<c0214f50>] ? do_sync_write+0x0/0x120
> [<c0103115>] ? sysenter_do_call+0x27/0x32
> [<c01032d2>] work_notifysig+0x13/0x21
>
> CC: Theodore Ts'o <tytso(a)mit.edu>
> Signed-off-by: Dmitry Monakhov <dmonakhov(a)openvz.org>
> Signed-off-by: Jan Kara <jack(a)suse.cz>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)suse.de>
> ---
> fs/ext4/inode.c | 9 +++++++--
> fs/ext4/mballoc.c | 6 ------
> 2 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 374d38c..9fc624d 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1088,7 +1088,7 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
> static void ext4_da_update_reserve_space(struct inode *inode, int used)
> {
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> - int total, mdb, mdb_free;
> + int total, mdb, mdb_free, mdb_claim = 0;
>
> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
> /* recalculate the number of metablocks still need to be reserved */
> @@ -1101,7 +1101,9 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
>
> if (mdb_free) {
> /* Account for allocated meta_blocks */
> - mdb_free -= EXT4_I(inode)->i_allocated_meta_blocks;
> + mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
> + BUG_ON(mdb_free < mdb_claim);
> + mdb_free -= mdb_claim;
>
> /* update fs dirty blocks counter */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
> @@ -1112,8 +1114,11 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
> /* update per-inode reservations */
> BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks);
> EXT4_I(inode)->i_reserved_data_blocks -= used;
> + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim);
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>
> + vfs_dq_claim_block(inode, used + mdb_claim);
> +
> /*
> * free those over-booking quota for metadata blocks
> */
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 7d71148..82b9778 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2755,12 +2755,6 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
> if (!(ac->ac_flags & EXT4_MB_DELALLOC_RESERVED))
> /* release all the reserved blocks if non delalloc */
> percpu_counter_sub(&sbi->s_dirtyblocks_counter, reserv_blks);
> - else {
> - percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> - ac->ac_b_ex.fe_len);
> - /* convert reserved quota blocks to real quota blocks */
> - vfs_dq_claim_block(ac->ac_inode, ac->ac_b_ex.fe_len);
> - }
>
> if (sbi->s_log_groups_per_flex) {
> ext4_group_t flex_group = ext4_flex_group(sbi,
> --
> 1.6.6
>
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Greg KH on
On Tue, Jan 05, 2010 at 01:47:42AM +0100, Jan Kara wrote:
> On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote:
> > From: Dmitry Monakhov <dmonakhov(a)openvz.org>
> >
> > commit d21cd8f163ac44b15c465aab7306db931c606908 upstream.
> Ted has found a serious bug in this patch over Christmas and fixed it.
> Ted, do you prefer to just discard this patch or will you add your fixes?

Do you have the git commit id of the patch that fixes it?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Jan Kara on
On Tue 05-01-10 10:56:55, Greg KH wrote:
> On Tue, Jan 05, 2010 at 01:47:42AM +0100, Jan Kara wrote:
> > On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote:
> > > From: Dmitry Monakhov <dmonakhov(a)openvz.org>
> > >
> > > commit d21cd8f163ac44b15c465aab7306db931c606908 upstream.
> > Ted has found a serious bug in this patch over Christmas and fixed it.
> > Ted, do you prefer to just discard this patch or will you add your fixes?
> Do you have the git commit id of the patch that fixes it?
Commits fixing the behavior are:
0637c6f4135f592f094207c7c21e7c0fc5557834
ee5f4d9cdf32fd99172d11665c592a288c2b1ff4
9d0be50230b333005635967f7ecd4897dbfd181b

Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Greg KH on
On Tue, Jan 05, 2010 at 10:39:27PM +0100, Jan Kara wrote:
> On Tue 05-01-10 10:56:55, Greg KH wrote:
> > On Tue, Jan 05, 2010 at 01:47:42AM +0100, Jan Kara wrote:
> > > On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote:
> > > > From: Dmitry Monakhov <dmonakhov(a)openvz.org>
> > > >
> > > > commit d21cd8f163ac44b15c465aab7306db931c606908 upstream.
> > > Ted has found a serious bug in this patch over Christmas and fixed it.
> > > Ted, do you prefer to just discard this patch or will you add your fixes?
> > Do you have the git commit id of the patch that fixes it?
> Commits fixing the behavior are:
> 0637c6f4135f592f094207c7c21e7c0fc5557834
> ee5f4d9cdf32fd99172d11665c592a288c2b1ff4
> 9d0be50230b333005635967f7ecd4897dbfd181b

Hm, should I add these as well, or back out all of the ext4 quota
patches from .32 and .31?

Ted?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: tytso on
On Tue, Jan 05, 2010 at 03:16:27PM -0800, Greg KH wrote:
> On Tue, Jan 05, 2010 at 10:39:27PM +0100, Jan Kara wrote:
> > On Tue 05-01-10 10:56:55, Greg KH wrote:
> > > On Tue, Jan 05, 2010 at 01:47:42AM +0100, Jan Kara wrote:
> > > > On Mon 04-01-10 16:33:36, Greg Kroah-Hartman wrote:
> > > > > From: Dmitry Monakhov <dmonakhov(a)openvz.org>
> > > > >
> > > > > commit d21cd8f163ac44b15c465aab7306db931c606908 upstream.
> > > > Ted has found a serious bug in this patch over Christmas and fixed it.
> > > > Ted, do you prefer to just discard this patch or will you add your fixes?
> > > Do you have the git commit id of the patch that fixes it?
> > Commits fixing the behavior are:
> > 0637c6f4135f592f094207c7c21e7c0fc5557834
> > ee5f4d9cdf32fd99172d11665c592a288c2b1ff4
> > 9d0be50230b333005635967f7ecd4897dbfd181b
>
> Hm, should I add these as well, or back out all of the ext4 quota
> patches from .32 and .31?

I don't think you need to back out all of the ext4 quota patches; Jan
has looked at the more carefully than I (since they went in via the
quota tree instead of the ext4 tree) but as far as I know the d21cd8f1
is a standalone fix and the other patches shouldn't break if you just
drop patch #83. Adding the above mentioned patches is definitely
necessary if you include Patch #83, though, since otherwise people
mounting uncoverted ext3 filesystems using ext4 will get the BUG in
fs/ext4/inode.c:1063 regression.

As far as whether it would be better to drop patch #83 or to include
these three commits, it's a bit of a toss up. If you want to play it
safe, I'd say let's drop it for now and wait for week or so, so we can
make sure the people who are testing -rc3 are happy, and for the next
stable release we can pick it up plus the additional commits listed
above.

Regards,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/