[CPUFREQ] fix race condition in store_scaling_governor


From: Américo Wang on 12 May 2010 04:10

On Tue, May 11, 2010 at 04:20:41PM +0200, Andrej Gelenberg wrote:
>Wrap store_scaling_governor with the cpufreq_governor_mutex lock.
>Fix a kernel panic when the scaling governor is switched very quickly.
>Bug in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=15948
>
>Signed-off-by: Andrej Gelenberg <andrej.gelenberg(a)udo.edu>
>---
> drivers/cpufreq/cpufreq.c | 16 +++++++++++-----
> 1 files changed, 11 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>index 75d293e..6ba42f9 100644
>--- a/drivers/cpufreq/cpufreq.c
>+++ b/drivers/cpufreq/cpufreq.c
>@@ -403,8 +403,6 @@ static int cpufreq_parse_governor(char
>*str_governor, unsigned int *policy,
> } else if (cpufreq_driver->target) {
> struct cpufreq_governor *t;
>
>- mutex_lock(&cpufreq_governor_mutex);
>-
> t = __find_governor(str_governor);
>
> if (t == NULL) {
>@@ -429,8 +427,6 @@ static int cpufreq_parse_governor(char
>*str_governor, unsigned int *policy,
> *governor = t;
> err = 0;
> }
>-
>- mutex_unlock(&cpufreq_governor_mutex);
> }
> out:
> return err;
>@@ -521,7 +517,7 @@ static ssize_t show_scaling_governor(struct
>cpufreq_policy *policy, char *buf)
> /**
> * store_scaling_governor - store policy for the specified CPU
> */
>-static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
>+static ssize_t _store_scaling_governor(struct cpufreq_policy *policy,
> const char *buf, size_t count)
> {
> unsigned int ret = -EINVAL;
>@@ -553,6 +549,16 @@ static ssize_t store_scaling_governor(struct
>cpufreq_policy *policy,
> return count;
> }
>
>+static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
>+ const char *buf, size_t count)
>+{
>+ ssize_t ret;
>+ mutex_lock(&cpufreq_governor_mutex);
>+ ret = _store_scaling_governor(policy, buf, count);
>+ mutex_unlock(&cpufreq_governor_mutex);
>+ return ret;
>+}
>+

Sorry, I don't get it, cpufreq_governor_mutex is used to protect
cpufreq_governor_list. What is the point of moving it up?
Can you explain what the race condition is?

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Andrej Gelenberg on 12 May 2010 05:10

Hi,

I have reported a bug (https://bugzilla.kernel.org/show_bug.cgi?id=15948).
I get a kernel panic with my tool, which switches the scaling governor
to conservative (the compiled-in default is ondemand) when no AC power is
online (I have attached the code to the bug report). I have also attached
to the bug report the dmesg output from just before the kernel panic
(obtained from a kernel crash dump). It looks something like this:

....
<4>------------[ cut here ]------------
<4>WARNING: at /home/andrej/kernel/linux/fs/sysfs/dir.c:451
sysfs_add_one+0xab/0xc0()
<4>Hardware name: 287655G
<4>sysfs: cannot create duplicate filename
'/devices/system/cpu/cpu0/cpufreq/ondemand'
<4>Modules linked in:
<4>Pid: 1878, comm: achook Tainted: G W 2.6.34-rc7 #20
<4>Call Trace:
<4> [<ffffffff81054736>] warn_slowpath_common+0x76/0xb0
<4> [<ffffffff810547cc>] warn_slowpath_fmt+0x3c/0x40
<4> [<ffffffff8111242b>] sysfs_add_one+0xab/0xc0
<4> [<ffffffff8111249e>] create_dir+0x5e/0xb0
<4> [<ffffffff81112506>] sysfs_create_subdir+0x16/0x20
<4> [<ffffffff8111387a>] internal_create_group+0x5a/0x190
<4> [<ffffffff811139de>] sysfs_create_group+0xe/0x10
<4> [<ffffffff813c1c95>] cpufreq_governor_dbs+0x75/0x330
<4> [<ffffffff813bf92e>] __cpufreq_governor+0x4e/0xe0
<4> [<ffffffff813c05c0>] ? lock_policy_rwsem_write+0x20/0x40
<4> [<ffffffff813c088c>] __cpufreq_set_policy+0x13c/0x180
<4> [<ffffffff813c0b6a>] store_scaling_governor+0xca/0x200
<4> [<ffffffff813c10b0>] ? handle_update+0x0/0x10
<4> [<ffffffff81526400>] ? do_nanosleep+0x90/0xc0
<4> [<ffffffff813c0722>] store+0x62/0x90
<4> [<ffffffff81110f4d>] sysfs_write_file+0xed/0x170
<4> [<ffffffff810bfbdd>] vfs_write+0xad/0x170
<4> [<ffffffff810bfecc>] sys_write+0x4c/0x80
<4> [<ffffffff81029c49>] ? do_device_not_available+0x9/0x10
<4> [<ffffffff81027c68>] system_call_fastpath+0x16/0x1b
<4>---[ end trace 2ed7331f299577b7 ]---
<4>------------[ cut here ]------------
<4>WARNING: at /home/andrej/kernel/linux/fs/sysfs/dir.c:451
sysfs_add_one+0xab/0xc0()
<4>Hardware name: 287655G
<4>sysfs: cannot create duplicate filename
'/devices/system/cpu/cpu0/cpufreq/conservative'
<4>Modules linked in:
<4>Pid: 1878, comm: achook Tainted: G W 2.6.34-rc7 #20
<4>Call Trace:
<4> [<ffffffff81054736>] warn_slowpath_common+0x76/0xb0
<4> [<ffffffff810547cc>] warn_slowpath_fmt+0x3c/0x40
<4> [<ffffffff8111242b>] sysfs_add_one+0xab/0xc0
<4> [<ffffffff8111249e>] create_dir+0x5e/0xb0
<4> [<ffffffff81112506>] sysfs_create_subdir+0x16/0x20
<4> [<ffffffff8111387a>] internal_create_group+0x5a/0x190
<4> [<ffffffff8104fa74>] ? __cond_resched+0x24/0x40
<4> [<ffffffff811139de>] sysfs_create_group+0xe/0x10
<4> [<ffffffff813c2bf5>] cpufreq_governor_dbs+0x75/0x380
<4> [<ffffffff813bf92e>] __cpufreq_governor+0x4e/0xe0
<4> [<ffffffff813c08c3>] __cpufreq_set_policy+0x173/0x180
<4> [<ffffffff813c0b6a>] store_scaling_governor+0xca/0x200
<4> [<ffffffff813c10b0>] ? handle_update+0x0/0x10
<4> [<ffffffff81526400>] ? do_nanosleep+0x90/0xc0
<4> [<ffffffff813c0722>] store+0x62/0x90
<4> [<ffffffff81110f4d>] sysfs_write_file+0xed/0x170
<4> [<ffffffff810bfbdd>] vfs_write+0xad/0x170
....

So a lock is needed to avoid this race condition (the old stuff is not
yet removed when the new stuff is added). I think it is not a bad idea to
protect the policy object in store_scaling_governor (it is a shared
object). What if you remove the new policy after the cpufreq_parse_governor
call? Then you would try to set a policy that is not available any
more, so I think cpufreq_governor_mutex is the proper mutex here.

Regards,
Andrej Gelenberg

On 05/12/2010 10:08 AM, Américo Wang wrote:
>
> Sorry, I don't get it, cpufreq_governor_mutex is used to protect
> cpufreq_governor_list. What is the point of moving it up?
> Can you explain what the race condition is?
>
> Thanks!
>


From: Andrew Morton on 12 May 2010 18:10

On Tue, 11 May 2010 16:20:41 +0200
Andrej Gelenberg <andrej.gelenberg(a)udo.edu> wrote:

> Wrap store_scaling_governor with the cpufreq_governor_mutex lock.
> Fix a kernel panic when the scaling governor is switched very quickly.
> Bug in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=15948
>

Looks sane, I guess.

I am afraid of moving all those functions inside
cpufreq_governor_mutex. Not for any specific reason, apart from a long
history of nasty deadlocks with cpufreq global locks :(

Has this change been well-tested with lockdep enabled?
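
For readers wondering what kind of problem lockdep would flag here, the following is a rough user-space sketch of a lock-order check. It is only a toy: the real lockdep is far more elaborate, and every name below (toy_lockdep, toy_acquire, the two lock ids) is hypothetical. The two acquisition orders in toy_count_inversions() are made up for illustration and are not a claim about which paths in cpufreq actually take the locks in which order.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_LOCKS 8

/* Hypothetical lock ids, named after the locks discussed in this thread. */
enum { TOY_GOVERNOR_MUTEX, TOY_POLICY_RWSEM };

struct toy_lockdep {
	/* seen_before[a][b] != 0: lock a was held while lock b was taken. */
	int seen_before[TOY_MAX_LOCKS][TOY_MAX_LOCKS];
	int held[TOY_MAX_LOCKS];	/* stack of currently held lock ids */
	int depth;			/* number of entries in held[] */
	int inversions;			/* count of order inversions observed */
};

/* Record an acquisition and flag it if the opposite order was seen before. */
void toy_acquire(struct toy_lockdep *ld, int lock)
{
	assert(lock >= 0 && lock < TOY_MAX_LOCKS && ld->depth < TOY_MAX_LOCKS);

	for (int i = 0; i < ld->depth; i++) {
		int outer = ld->held[i];

		if (ld->seen_before[lock][outer])
			ld->inversions++;	/* AB-BA pattern: possible deadlock */
		ld->seen_before[outer][lock] = 1;
	}
	ld->held[ld->depth++] = lock;
}

/* Release the most recently acquired lock. */
void toy_release(struct toy_lockdep *ld)
{
	if (ld->depth > 0)
		ld->depth--;
}

/* Two hypothetical code paths that take the same two locks in opposite order. */
int toy_count_inversions(void)
{
	struct toy_lockdep ld;

	memset(&ld, 0, sizeof(ld));

	/* Path 1: governor mutex outside, policy lock inside. */
	toy_acquire(&ld, TOY_GOVERNOR_MUTEX);
	toy_acquire(&ld, TOY_POLICY_RWSEM);
	toy_release(&ld);
	toy_release(&ld);

	/* Path 2: policy lock outside, governor mutex inside. */
	toy_acquire(&ld, TOY_POLICY_RWSEM);
	toy_acquire(&ld, TOY_GOVERNOR_MUTEX);
	toy_release(&ld);
	toy_release(&ld);

	return ld.inversions;
}
```

In this sketch toy_count_inversions() reports one inversion even though neither path ever blocks, which is the point of such a check: it can flag a possible deadlock from the recorded ordering alone, without the deadlock having to occur during the test run.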

>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 75d293e..6ba42f9 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -403,8 +403,6 @@ static int cpufreq_parse_governor(char
> *str_governor, unsigned int *policy,
> } else if (cpufreq_driver->target) {
> struct cpufreq_governor *t;
>
> - mutex_lock(&cpufreq_governor_mutex);
> -
> t = __find_governor(str_governor);
>
> if (t == NULL) {
> @@ -429,8 +427,6 @@ static int cpufreq_parse_governor(char
> *str_governor, unsigned int *policy,
> *governor = t;
> err = 0;
> }
> -
> - mutex_unlock(&cpufreq_governor_mutex);
> }
> out:
> return err;
> @@ -521,7 +517,7 @@ static ssize_t show_scaling_governor(struct
> cpufreq_policy *policy, char *buf)
> /**
> * store_scaling_governor - store policy for the specified CPU
> */
> -static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
> +static ssize_t _store_scaling_governor(struct cpufreq_policy *policy,
> const char *buf, size_t count)
> {
> unsigned int ret = -EINVAL;
> @@ -553,6 +549,16 @@ static ssize_t store_scaling_governor(struct
> cpufreq_policy *policy,
> return count;
> }
>
> +static ssize_t store_scaling_governor(struct cpufreq_policy *policy,
> + const char *buf, size_t count)
> +{
> + ssize_t ret;
> + mutex_lock(&cpufreq_governor_mutex);
> + ret = _store_scaling_governor(policy, buf, count);
> + mutex_unlock(&cpufreq_governor_mutex);
> + return ret;
> +}
> +
> /**
> * show_scaling_driver - show the cpufreq driver currently loaded
> */

Your email client replaces tabs with spaces and is wordwrapping the
text. I fixed that up in my copy of the patch.


From: Andrej Gelenberg on 12 May 2010 20:00

Hi,

On 05/13/2010 12:00 AM, Andrew Morton wrote:
>
> Looks sane, I guess.
>
> I am afraid of moving all those functions inside
> cpufreq_governor_mutex. Not for any specific reason, apart from a long
> history of nasty deadlocks with cpufreq global locks :(
>
> Has this change been well-tested with lockdep enabled?

It at least prevents the kernel panic and the sysfs warnings,
but it causes a deadlock. I can confirm the bug in 2.6.33-ARCH (the latest
stable kernel in Arch Linux):

------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x150()
Hardware name: 287655G
sysfs: cannot create duplicate filename
'/devices/system/cpu/cpu0/cpufreq/ondemand'
Modules linked in: cpufreq_conservative cpufreq_ondemand powernow_k8
freq_table joydev radeon ttm drm_kms_helper snd_seq_dummy uvcvideo drm
videodev rfkill i2c_algo_bit snd_seq_oss v4l1_compat usb_storage
v4l2_compat_ioctl32 snd_seq_midi_event led_class snd_seq snd_seq_device
nvram snd_hda_codec_conexant snd_hda_intel video snd_pcm_oss
snd_mixer_oss output snd_hda_codec snd_hwdep snd_pcm snd_timer snd
ohci_hcd soundcore shpchp ehci_hcd ac wmi battery sg thermal processor
button snd_page_alloc psmouse i2c_piix4 edac_core pci_hotplug r8169
usbcore mii edac_mce_amd serio_raw i2c_core k8temp evdev pcspkr rtc_cmos
rtc_core rtc_lib ext4 mbcache jbd2 crc16 cryptd aes_x86_64 aes_generic
xts gf128mul dm_crypt dm_mod sd_mod ahci libata scsi_mod
Pid: 3136, comm: test_cpu.sh Tainted: G W 2.6.33-ARCH #1
Call Trace:
[<ffffffff810529f6>] warn_slowpath_common+0x76/0xb0
[<ffffffff81052a8c>] warn_slowpath_fmt+0x3c/0x40
[<ffffffff81187f45>] sysfs_add_one+0xc5/0x150
[<ffffffff81188033>] create_dir+0x63/0xc0
[<ffffffff811880a6>] sysfs_create_subdir+0x16/0x20
[<ffffffff8118950a>] internal_create_group+0x5a/0x190
[<ffffffff8118966e>] sysfs_create_group+0xe/0x10
[<ffffffffa056fcfc>] cpufreq_governor_dbs+0xac/0x3e0 [cpufreq_ondemand]
[<ffffffff810788bd>] ? notifier_call_chain+0x4d/0x70
[<ffffffff81293f25>] __cpufreq_governor+0xf5/0x1e0
[<ffffffff812954ec>] __cpufreq_set_policy+0x13c/0x180
[<ffffffff812958f8>] store_scaling_governor+0xe8/0x220
[<ffffffff81296240>] ? handle_update+0x0/0x10
[<ffffffff811cb7ba>] ? kobject_get+0x1a/0x30
[<ffffffff81295382>] store+0x62/0x90
[<ffffffff81186820>] sysfs_write_file+0xe0/0x160
[<ffffffff81121576>] vfs_write+0xb6/0x190
[<ffffffff8103175d>] ? do_page_fault+0x15d/0x320
[<ffffffff811218ac>] sys_write+0x4c/0x80
[<ffffffff81009f02>] system_call_fastpath+0x16/0x1b
---[ end trace 939cd7811bc2accf ]---

Here is my test script:

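#!/bin/bash
# {1..4} is a bash brace expansion, so this needs bash rather than a plain POSIX sh.
for k in {1..4}
do
	for i in conservative ondemand performance
	do
		# Each write runs in the background, so the writes race with each other.
		echo $i > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor &
	done
done

for k in {1..4}
do
	for i in conservative ondemand performance
	do
		echo $i > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor &
	done
done
# Final foreground write, issued while the background writes may still be running.
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor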

From: Américo Wang on 13 May 2010 05:10

On Thu, May 13, 2010 at 01:58:14AM +0200, Andrej Gelenberg wrote:
>Hi,
>
>On 05/13/2010 12:00 AM, Andrew Morton wrote:
>>
>>Looks sane, I guess.
>>
>>I am afraid of moving all those functions inside
>>cpufreq_governor_mutex. Not for any specific reason, apart from a long
>>history of nasty deadlocks with cpufreq global locks :(
>>
>>Has this change been well-tested with lockdep enabled?
>
>It at least prevents the kernel panic and the sysfs warnings,
>but it causes a deadlock. I can confirm the bug in 2.6.33-ARCH (the
>latest stable kernel in Arch Linux):
>

Well, this is not a panic, it is just a WARNING.

>------------[ cut here ]------------
>WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xc5/0x150()
>Hardware name: 287655G
>sysfs: cannot create duplicate filename
>'/devices/system/cpu/cpu0/cpufreq/ondemand'
>Modules linked in: cpufreq_conservative cpufreq_ondemand powernow_k8
>freq_table joydev radeon ttm drm_kms_helper snd_seq_dummy uvcvideo
>drm videodev rfkill i2c_algo_bit snd_seq_oss v4l1_compat usb_storage
>v4l2_compat_ioctl32 snd_seq_midi_event led_class snd_seq
>snd_seq_device nvram snd_hda_codec_conexant snd_hda_intel video
>snd_pcm_oss snd_mixer_oss output snd_hda_codec snd_hwdep snd_pcm
>snd_timer snd ohci_hcd soundcore shpchp ehci_hcd ac wmi battery sg
>thermal processor button snd_page_alloc psmouse i2c_piix4 edac_core
>pci_hotplug r8169 usbcore mii edac_mce_amd serio_raw i2c_core k8temp
>evdev pcspkr rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 cryptd
>aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod sd_mod ahci
>libata scsi_mod
>Pid: 3136, comm: test_cpu.sh Tainted: G W 2.6.33-ARCH #1
>Call Trace:
> [<ffffffff810529f6>] warn_slowpath_common+0x76/0xb0
> [<ffffffff81052a8c>] warn_slowpath_fmt+0x3c/0x40
> [<ffffffff81187f45>] sysfs_add_one+0xc5/0x150
> [<ffffffff81188033>] create_dir+0x63/0xc0
> [<ffffffff811880a6>] sysfs_create_subdir+0x16/0x20
> [<ffffffff8118950a>] internal_create_group+0x5a/0x190
> [<ffffffff8118966e>] sysfs_create_group+0xe/0x10
> [<ffffffffa056fcfc>] cpufreq_governor_dbs+0xac/0x3e0 [cpufreq_ondemand]
> [<ffffffff810788bd>] ? notifier_call_chain+0x4d/0x70
> [<ffffffff81293f25>] __cpufreq_governor+0xf5/0x1e0
> [<ffffffff812954ec>] __cpufreq_set_policy+0x13c/0x180
> [<ffffffff812958f8>] store_scaling_governor+0xe8/0x220
> [<ffffffff81296240>] ? handle_update+0x0/0x10
> [<ffffffff811cb7ba>] ? kobject_get+0x1a/0x30
> [<ffffffff81295382>] store+0x62/0x90
> [<ffffffff81186820>] sysfs_write_file+0xe0/0x160
> [<ffffffff81121576>] vfs_write+0xb6/0x190
> [<ffffffff8103175d>] ? do_page_fault+0x15d/0x320
> [<ffffffff811218ac>] sys_write+0x4c/0x80
> [<ffffffff81009f02>] system_call_fastpath+0x16/0x1b
>---[ end trace 939cd7811bc2accf ]---
>

Hmm, so two processes enter store_scaling_governor() at
the same time. One takes mutex_lock(&dbs_mutex) while the
other blocks; when the first one reaches
mutex_unlock(&dbs_mutex), the other one enters.

Yeah, makes sense, but I am still not sure whether we can
reuse cpufreq_governor_mutex for this...
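
To make the interleaving concrete, here is a minimal single-threaded model of it. This is only a sketch under strong assumptions, not kernel code: toy_policy, toy_stop and toy_start are hypothetical names, a governor switch is reduced to a "stop old" step followed by a "start new" step, and the sysfs attribute group is reduced to a single flag. The steps of two writers are simply replayed in a fixed order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one policy; all names are hypothetical. */
struct toy_policy {
	const char *governor;	/* governor the policy currently points at */
	int group_registered;	/* 1 while the governor's attribute group exists */
	int duplicate_warnings;	/* counts "duplicate filename" style events */
};

/* Step 1 of a switch: stop the old governor and drop its attribute group. */
static void toy_stop(struct toy_policy *p)
{
	assert(p != NULL);
	p->group_registered = 0;
}

/* Step 2 of a switch: start the new governor and create its attribute group. */
static void toy_start(struct toy_policy *p, const char *name)
{
	assert(p != NULL);
	if (p->group_registered)
		p->duplicate_warnings++;	/* the group already exists */
	p->group_registered = 1;
	p->governor = name;
}

/* Writers A and B each switch to "ondemand"; their steps interleave. */
int toy_interleaved(void)
{
	struct toy_policy p = { "conservative", 1, 0 };

	toy_stop(&p);			/* A: stop old governor */
	toy_stop(&p);			/* B: stop old governor (already stopped) */
	toy_start(&p, "ondemand");	/* A: creates the group */
	toy_start(&p, "ondemand");	/* B: group exists, counted as a duplicate */
	return p.duplicate_warnings;
}

/* Same two writers, but each stop/start pair runs as one unit. */
int toy_serialized(void)
{
	struct toy_policy p = { "conservative", 1, 0 };

	toy_stop(&p);			/* A: stop old governor */
	toy_start(&p, "ondemand");	/* A: creates the group */
	toy_stop(&p);			/* B: removes A's group */
	toy_start(&p, "ondemand");	/* B: creates the group again */
	return p.duplicate_warnings;
}
```

In this model toy_interleaved() counts one duplicate and toy_serialized() counts none. The model says nothing about which lock should provide the serialization; it only shows why holding a lock around each individual step is not enough when the stop/start pair as a whole is not atomic.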

Thanks.


[CPUFREQ] fix race condition in store_scaling_governor