From: James Cloos
I forgot to say:

# CONFIG_SMP is not set
CONFIG_TREE_PREEMPT_RCU=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
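
(Those lines can be pulled straight off a running kernel with something
like the following, assuming CONFIG_IKCONFIG_PROC is enabled; otherwise
grep the build tree's .config:)

  zgrep -E 'SMP|PREEMPT' /proc/config.gz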

-JimC
--
James Cloos <cloos(a)jhcloos.com> OpenPGP: 1024D/ED7DAEA6
From: Rafael J. Wysocki
On Sunday 14 February 2010, James Cloos wrote:
> Sometime between rc6 and rc7 all filesystem I/O started using 100% CPU,
> usually on the order of 60% sys, 40% user.
>
> I've tried this with each of ext4, jfs and btrfs filesystems. All show
> the same issue.
>
> Using dd(1) to read from the block specials directly works as well and
> as fast as it always has; only reading or writing to mounted filesystems
> is affected.
>
> Box is 32-bit x86, PentiumIII-M; drives are ide using libata.
>
> If the btrfs fs is mounted, the slowdown is enough to trigger the
> hung_task call trace (120s) on the btrfs-transac process.
>
> But the regression is just as apparent when only jfs and ext4 are mounted.
>
> The only filesystems I've found which avoid the regression are tmpfs and
> devtmpfs.
>
> I didn't have time to write up a report when I noticed this in rc7 but
> had to boot back into rc6 for work.
>
> Some of the commits since rc7 looked like they might have addressed this
> regression, but it persists in rc8.

I have created the bug entry at http://bugzilla.kernel.org/show_bug.cgi?id=15306
for your report.

Thanks,
Rafael
From: Chris Mason
On Sat, Feb 13, 2010 at 10:11:07PM -0500, James Cloos wrote:
> Sometime between rc6 and rc7 all filesystem I/O started using 100% CPU,
> usually on the order of 60% sys, 40% user.
>
> I've tried this with each of ext4, jfs and btrfs filesystems. All show
> the same issue.
>
> Using dd(1) to read from the block specials directly works as well and
> as fast as it always has; only reading or writing to mounted filesystems
> is affected.
>
> Box is 32-bit x86, PentiumIII-M; drives are ide using libata.
>
> If the btrfs fs is mounted, the slowdown is enough to trigger the
> hung_task call trace (120s) on the btrfs-transac process.
>
> But the regression is just as apparent when only jfs and ext4 are mounted.
>
> The only filesystems I've found which avoid the regression are tmpfs and
> devtmpfs.

Could you please hit sysrq-l a few times during the high CPU usage and
send us the results? Also, please send along the full dd command line
you are using and how much RAM your system has.

(Even better would be running perf record <your dd command line>, if
you're willing to set up perf)
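
For example, something along these lines, with the sysrq trigger enabled
(the dd invocation below is only a placeholder):

  echo 1 > /proc/sys/kernel/sysrq     # make sure sysrq is enabled
  echo l > /proc/sysrq-trigger        # same as sysrq-l; backtraces land in dmesg
  perf record dd if=/dev/sdXN of=/dev/null bs=4M count=16
  perf report                         # shows which functions took the CPU time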

Thanks!
From: James Cloos
>>>>> "Chris" == Chris Mason <chris.mason(a)oracle.com> writes:

>> Sometime between rc6 and rc7 all filesystem I/O started using 100% CPU,
>> usually on the order of 60% sys, 40% user.

Chris> (Even better would be running perf record <your dd command line>, if
Chris> you're willing to set up perf)

The box has 512MB (the chipset's max). (I did mention p3m, therefore OLD! :)

:; mount
/dev/root on / type jfs (rw,noatime)
/dev/sda3 on /var type ext4 (rw,noatime,barrier=1,data=ordered)
/dev/sdb2 on /media/bay type btrfs (rw,noatime,noacl)

: I only vaguely remember installing perf....
:; perf version
perf version 0.0.1.PERF

:; perf record dd if=/dev/sda4 of=/dev/null bs=4M count=16
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 14.7279 s, 4.6 MB/s
[ perf record: Captured and wrote 0.803 MB perf1.data (~35093 samples) ]

:; perf record dd if=/dev/sdb2 of=/dev/null bs=4M count=16
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 7.31642 s, 9.2 MB/s
[ perf record: Captured and wrote 0.938 MB perf2.data (~40980 samples) ]

:; perf record dd if=/dev/sdc2 of=/dev/null bs=4M count=16
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 60.9204 s, 1.1 MB/s
[ perf record: Captured and wrote 3.309 MB perf3.data (~144590 samples) ]

(I then mounted:
/dev/sdc2 on /media/usb type ext4 (rw,relatime,barrier=1,data=ordered)
)

:; perf record dd if=/dev/zero of=ZERO bs=4M count=16 # sda4
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 10.0027 s, 6.7 MB/s
[ perf record: Captured and wrote 1.634 MB perf4.data (~71386 samples) ]

:; perf record dd if=/dev/zero of=/var/tmp/ZERO bs=4M count=16 # sda3
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 1.712 s, 39.2 MB/s
[ perf record: Captured and wrote 0.364 MB perf5.data (~15915 samples) ]

:; perf record dd if=/dev/zero of=/media/bay/ZERO bs=4M count=16
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 4.51859 s, 14.9 MB/s
[ perf record: Captured and wrote 0.611 MB perf6.data (~26675 samples) ]

:; perf record dd if=/dev/zero of=/media/usb/ZERO bs=4M count=16
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 8.49166 s, 7.9 MB/s
[ perf record: Captured and wrote 1.523 MB perf.data (~66556 samples) ]

:; for ij in /media/usb/ZERO /media/bay/ZERO /var/tmp/ZERO /root/ZERO;do perf record dd of=/dev/null if=${ij} bs=4M count=16; mv -v perf.data perf$[kl++].data; done
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 49.1844 s, 1.4 MB/s
[ perf record: Captured and wrote 0.569 MB perf.data (~24858 samples) ]
`perf.data' -> `perf8.data'
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 10.0016 s, 6.7 MB/s
[ perf record: Captured and wrote 0.948 MB perf.data (~41438 samples) ]
`perf.data' -> `perf9.data'
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 5.72855 s, 11.7 MB/s
[ perf record: Captured and wrote 0.580 MB perf.data (~25355 samples) ]
`perf.data' -> `perf10.data'
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 7.32789 s, 9.2 MB/s
[ perf record: Captured and wrote 0.899 MB perf.data (~39278 samples) ]
`perf.data' -> `perf11.data'

From: James Cloos
While getting re-acquainted with perf, and after upgrading to the current
version, I see that do_page_fault, unmap_vmas, get_page_from_freelist,
handle_mm_fault, __do_fault and T.1339 tend to get the most time when the
kernel is spinning.
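
(That is from scanning the recordings with something along the lines of

  perf report -i perf3.data --sort symbol

and reading off the top entries; the exact flags here are only illustrative.)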

T.1339 is in this part of kallsyms:

,----< grep -E -C9 ^c1027750 /proc/kallsyms >
| c10266a0 T sched_setscheduler_nocheck
| c10266c0 T sched_setscheduler
| c10266e0 t do_sched_setscheduler
| c1026770 T sys_sched_setparam
| c1026790 T sys_sched_setscheduler
| c10267c0 T sched_fork
| c1026900 T task_oncpu_function_call
| c1026960 T print_cfs_rq
| c1027640 T scheduler_tick
| c1027750 t T.1339
| c10277f0 T schedule_tail
| c1027830 t ftrace_raw_output_sched_kthread_stop
| c1027930 t ftrace_raw_output_sched_kthread_stop_ret
| c1027a30 t ftrace_raw_output_sched_wait_task
| c1027b40 t ftrace_raw_output_sched_switch
| c1027c90 t ftrace_raw_output_sched_migrate_task
| c1027da0 t ftrace_raw_output_sched_process_wait
| c1027eb0 t ftrace_raw_output_sched_process_fork
| c1027fc0 t ftrace_raw_output_sched_stat_runtime
`----

suggesting that T.1339 is part of sched.

This leaves the impression that the issue is related to either
scheduling or paging.

I don't see any commits in the rc6..rc7 timeframe which stand out
as likely suspects.
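
(The sort of scan I mean is roughly

  git log --oneline v2.6.33-rc6..v2.6.33-rc7 -- kernel/ mm/

where the v2.6.33 tag names assume this is the 2.6.33 cycle.)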

-JimC
--
James Cloos <cloos(a)jhcloos.com> OpenPGP: 1024D/ED7DAEA6