From: Jan Kara on
Hi,

> In running iozone for writes to small files, we noticed a pretty big
> discrepancy between the performance of the deadline and cfq I/O
> schedulers. Investigation showed that I/O was being issued from two
> different contexts: the iozone process itself, and the jbd2/sdh-8 thread
> (as expected). Because of the way cfq performs slice idling, the delays
> introduced between the metadata and data I/Os were significant. For
> example, cfq would see about 7MB/s versus deadline's 35MB/s for the same
> workload. I also tested fs_mark with writing and fsyncing 1000 64k
> files, and a similar 5x performance difference was observed. Eric
> Sandeen suggested that I flag the journal writes as metadata, and once I
> did that, the performance difference went away completely (cfq has
> special logic to prioritize metadata I/O).
>
> So, I'm submitting this patch for comments and testing. I have a
> similar patch for jbd that I will submit if folks agree that this is a
> good idea.
This looks like a good idea to me. I'd just be careful about data=journal
mode, where even the data is written via the journal and thus you'd
incorrectly prioritize all the IO. I suppose that could have a negative
impact on the performance of other filesystems on the same disk. So for
data=journal mode, I'd leave write_op as just WRITE / WRITE_SYNC_PLUG.
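
For the other journalling modes, I mean something like this in
jbd2_journal_commit_transaction() (just a sketch, untested; I'm
assuming the BIO_RW_META bit is what cfq's metadata logic keys off):

	int write_op = WRITE | (1 << BIO_RW_META);
	...
	if (commit_transaction->t_synchronous_commit)
		write_op = WRITE_SYNC_PLUG | (1 << BIO_RW_META);

i.e. tag the journal IO with the metadata hint everywhere except when
the journal carries data blocks as well.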

Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
From: Jens Axboe on
On Thu, Apr 01 2010, Jeff Moyer wrote:
> Hi,
>
> In running iozone for writes to small files, we noticed a pretty big
> discrepancy between the performance of the deadline and cfq I/O
> schedulers. Investigation showed that I/O was being issued from two
> different contexts: the iozone process itself, and the jbd2/sdh-8 thread
> (as expected). Because of the way cfq performs slice idling, the delays
> introduced between the metadata and data I/Os were significant. For
> example, cfq would see about 7MB/s versus deadline's 35MB/s for the same
> workload. I also tested fs_mark with writing and fsyncing 1000 64k
> files, and a similar 5x performance difference was observed. Eric
> Sandeen suggested that I flag the journal writes as metadata, and once I
> did that, the performance difference went away completely (cfq has
> special logic to prioritize metadata I/O).
>
> So, I'm submitting this patch for comments and testing. I have a
> similar patch for jbd that I will submit if folks agree that this is a
> good idea.

Looks good to me.

--
Jens Axboe

From: Jeff Moyer on
Jan Kara <jack@suse.cz> writes:

> Hi,
>
>> In running iozone for writes to small files, we noticed a pretty big
>> discrepancy between the performance of the deadline and cfq I/O
>> schedulers. Investigation showed that I/O was being issued from two
>> different contexts: the iozone process itself, and the jbd2/sdh-8 thread
>> (as expected). Because of the way cfq performs slice idling, the delays
>> introduced between the metadata and data I/Os were significant. For
>> example, cfq would see about 7MB/s versus deadline's 35MB/s for the same
>> workload. I also tested fs_mark with writing and fsyncing 1000 64k
>> files, and a similar 5x performance difference was observed. Eric
>> Sandeen suggested that I flag the journal writes as metadata, and once I
>> did that, the performance difference went away completely (cfq has
>> special logic to prioritize metadata I/O).
>>
>> So, I'm submitting this patch for comments and testing. I have a
>> similar patch for jbd that I will submit if folks agree that this is a
>> good idea.
> This looks like a good idea to me. I'd just be careful about data=journal
> mode, where even the data is written via the journal and thus you'd
> incorrectly prioritize all the IO. I suppose that could have a negative
> impact on the performance of other filesystems on the same disk. So for
> data=journal mode, I'd leave write_op as just WRITE / WRITE_SYNC_PLUG.

Hi, Jan, thanks for the review! I'm trying to figure out the best way
to relay the journal mode from ext3 or ext4 to jbd or jbd2. Would a new
journal flag, set in journal_init_inode, be appropriate? This wouldn't
cover the case of data journalling set per inode, though. It also puts
some ext3-specific code into the purportedly fs-agnostic jbd code
(specifically, testing the superblock for the data journal mount flag).
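
To make that concrete, I was picturing something like the following
(JFS_DATA_JOURNALLED is a made-up flag name, and the test_opt() check
is exactly the ext3-specific code I'm worried about leaking into jbd):

	/* in jbd's journal_init_inode(), peeking at the fs superblock: */
	if (test_opt(inode->i_sb, DATA_FLAGS) == EXT3_MOUNT_JOURNAL_DATA)
		journal->j_flags |= JFS_DATA_JOURNALLED;

	/* in the commit code, add the metadata hint only when we're not
	 * journalling data: */
	if (!(journal->j_flags & JFS_DATA_JOURNALLED))
		write_op |= (1 << BIO_RW_META);
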
Do you have any suggestions?

Thanks!
Jeff
From: tytso on
On Mon, Apr 05, 2010 at 11:24:13AM -0400, Jeff Moyer wrote:
> Jan Kara <jack@suse.cz> writes:
>
> > Hi,
> >
> >> In running iozone for writes to small files, we noticed a pretty big
> >> discrepancy between the performance of the deadline and cfq I/O
> >> schedulers. Investigation showed that I/O was being issued from two
> >> different contexts: the iozone process itself, and the jbd2/sdh-8 thread
> >> (as expected). Because of the way cfq performs slice idling, the delays
> >> introduced between the metadata and data I/Os were significant. For
> >> example, cfq would see about 7MB/s versus deadline's 35MB/s for the same
> >> workload. I also tested fs_mark with writing and fsyncing 1000 64k
> >> files, and a similar 5x performance difference was observed. Eric
> >> Sandeen suggested that I flag the journal writes as metadata, and once I
> >> did that, the performance difference went away completely (cfq has
> >> special logic to prioritize metadata I/O).
> >>
> >> So, I'm submitting this patch for comments and testing. I have a
> >> similar patch for jbd that I will submit if folks agree that this is a
> >> good idea.
> > This looks like a good idea to me. I'd just be careful about data=journal
> > mode, where even the data is written via the journal and thus you'd
> > incorrectly prioritize all the IO. I suppose that could have a negative
> > impact on the performance of other filesystems on the same disk. So for
> > data=journal mode, I'd leave write_op as just WRITE / WRITE_SYNC_PLUG.
>
> Hi, Jan, thanks for the review! I'm trying to figure out the best way
> to relay the journal mode from ext3 or ext4 to jbd or jbd2. Would a new
> journal flag, set in journal_init_inode, be appropriate? This wouldn't
> cover the case of data journalling set per inode, though. It also puts
> some ext3-specific code into the purportedly fs-agnostic jbd code
> (specifically, testing the superblock for the data journal mount flag).
> Do you have any suggestions?

I don't think it's necessary to worry about data=journal mode. First
of all, it's not true that all of the I/O would be prioritized as
metadata. In data=journal mode, data blocks are written twice: once
to the journal, and once to the final location on disk. And the
journal writes do need to be prioritized, because the commit can't go
out until all of the preceding journal blocks have been written. So
treating all of the journal writes as metadata for the purposes of
cfq's prioritization makes sense to me....

- Ted
From: tytso on
On Thu, Apr 01, 2010 at 03:04:54PM -0400, Jeff Moyer wrote:
>
> So, I'm submitting this patch for comments and testing. I have a
> similar patch for jbd that I will submit if folks agree that this is a
> good idea.

Added to the ext4 patch queue.

What benchmark were you using to test small file writes? This looks
good to me as well, but we might want to do some extra benchmarking
just to be sure we're not accidentally introducing a performance
regression.

- Ted