From: Tejun Heo on 29 Jun 2010 14:20

Hello, Arjan.

On 06/29/2010 08:07 PM, Arjan van de Ven wrote:
> we might be talking past each other. ;-)
>
> let me define an example that is simple so that we can get on the same page
>
> assume a system with "enough" cpus, say 32.
> lets say we have 2 async tasks, that each do an mdelay(1000); (yes I
> know stupid, but exaggerating things makes things easier to talk about)

That's the main point to discuss tho.  If you exaggerate the use case
out of proportion, you'll end up with something which in the end is
useful only in the imagination, and we'll be doing things just because
we can.  Creating the full number of unbound threads might look like a
good idea to extract maximum cpu parallelism if you exaggerate the use
case like the above, but with the current actual use cases it's not
gonna buy us anything and might even cost us more via unnecessary
thread creations.

So, let's talk about whether it's _actually_ useful for the current
use cases.  If so, sure, let's do it that way.  If not, there is no
reason to go there, right?

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arjan van de Ven on 29 Jun 2010 14:30

On 6/29/2010 11:15 AM, Tejun Heo wrote:
> Hello, Arjan.
>
> On 06/29/2010 08:07 PM, Arjan van de Ven wrote:
>
>> we might be talking past each other. ;-)
>>
>> let me define an example that is simple so that we can get on the same page
>>
>> assume a system with "enough" cpus, say 32.
>> lets say we have 2 async tasks, that each do an mdelay(1000); (yes I
>> know stupid, but exaggerating things makes things easier to talk about)
>>
> That's the main point to discuss tho.  If you exaggerate the use case
> out of proportion, you'll end up with something which in the end is
> useful only in the imagination and we'll be doing things just because
> we can.  Creating full number of unbound threads might look like a
> good idea to extract maximum cpu parallelism if you exaggerate the use
> case like the above but with the current actual use case, it's not
> gonna buy us anything and might even cost us more via unnecessary
> thread creations.
>

I'm not trying to suggest "unbound".  I'm trying to suggest "don't
start bounding until you hit # threads >= # cpus".  You have some
clever tricks to deal with bounding things; but let's make sure that
the simple case of having less work to run in parallel than the number
of cpus gets dealt with simply and unbound.

You also consolidate the thread pools so that you have one global
pool, so unlike the current situation where you get O(Nr pools * Nr
cpus) threads, you only get O(Nr cpus)... that's not too burdensome
imo.  If you want to go below that then I think you're going too far
in reducing the number of threads in your pool.  Really.

so... back to my question: will those two tasks run in parallel or
sequentially?
From: Tejun Heo on 29 Jun 2010 14:40

Hello,

On 06/29/2010 08:22 PM, Arjan van de Ven wrote:
> I'm not trying to suggest "unbound".  I'm trying to suggest "don't
> start bounding until you hit # threads >= # cpus".  You have some
> clever tricks to deal with bounding things; but let's make sure that
> the simple case of having less work to run in parallel than the
> number of cpus gets dealt with simply and unbound.

Well, the thing is, for most cases, binding to cpus is simply better.
That's the reason why our default workqueue was per-cpu to begin with.
There just are a lot more opportunities for optimization of both
memory access and synchronization overheads.

> You also consolidate the thread pools so that you have one global
> pool, so unlike the current situation where you get O(Nr pools * Nr
> cpus) threads, you only get O(Nr cpus)... that's not too burdensome
> imo.  If you want to go below that then I think you're going too far
> in reducing the number of threads in your pool.  Really.

I lost you in the above paragraph, but I think it would be better to
keep kthread pools separate.  It behaves much better regarding memory
access locality (work issuer and worker are on the same cpu, and the
stack and other memory used by the worker are likely to be already
hot).  Also, we don't do it yet, but when creating kthreads we can
allocate the stack considering NUMA too.

> so... back to my question: will those two tasks run in parallel or
> sequentially?

If they are scheduled on the same cpu, they won't.  If that's
something actually necessary, let's implement it.  I have no problem
with that.  cmwq already can serve as a simple execution context
provider without concurrency control, and pumping contexts to async
isn't hard at all.  I just wanna know whether it's something which is
actually useful.  So, where would that be useful?

Thanks.

--
tejun
From: Arjan van de Ven on 29 Jun 2010 14:50

On 6/29/2010 11:34 AM, Tejun Heo wrote:
> Hello,
>
> On 06/29/2010 08:22 PM, Arjan van de Ven wrote:
>
>> I'm not trying to suggest "unbound".  I'm trying to suggest "don't
>> start bounding until you hit # threads >= # cpus".  You have some
>> clever tricks to deal with bounding things; but let's make sure that
>> the simple case of having less work to run in parallel than the
>> number of cpus gets dealt with simply and unbound.
>>
> Well, the thing is, for most cases, binding to cpus is simply better.
>

depends on the user.

For "throw over the wall" work, this is unclear.  Especially in the
light of hyperthreading (sharing L1 cache) or even modern cpus (where
many cores share a fast L3 cache).

I'm fine with a solution that has the caller say 'run anywhere' vs
'try to run local'.  I suspect there will be many many cases of 'run
anywhere'.

> pumping contexts to async isn't hard at all.  I just wanna know
> whether it's something which is actually useful.  So, where would
> that be useful?

I think it's useful for all users of your worker pool, not (just)
async.

it's a severe limitation of the current linux infrastructure, and your
infrastructure has the chance to fix this...
From: Tejun Heo on 29 Jun 2010 15:10

Hello,

On 06/29/2010 08:41 PM, Arjan van de Ven wrote:
>> Well, the thing is, for most cases, binding to cpus is simply better.
>
> depends on the user.

Heh, yeah, sure, can't disagree with that. :-)

> For "throw over the wall" work, this is unclear.  Especially in the
> light of hyperthreading (sharing L1 cache) or even modern cpus
> (where many cores share a fast L3 cache).

There will be many variants of memory configurations, and the only way
the generic kernel can optimize memory access is if it localizes stuff
per cpu, which is visible to the operating system.  That's the lowest
common denominator.  From there, we sure can add considerations for
specific shared configurations, but I don't think that will be too
common outside of the scheduler and maybe the memory allocator.  It
just becomes too specific to apply to generic kernel core code.

> I'm fine with a solution that has the caller say 'run anywhere' vs
> 'try to run local'.  I suspect there will be many many cases of 'run
> anywhere'.

Yeah, sure.  I can almost view the code in my head right now.  If I'm
not completely mistaken, it should be really easy.  When a cpu goes
down, all the works left are already executed unbound, so all the
necessary components are already there.

The thing is that once it's not bound to a cpu, where, how and when a
worker runs is best regulated by the scheduler.  That's why I kept
talking about wq being a simple context provider.  If something is not
CPU intensive, CPU parallelism doesn't buy much, so works which would
benefit from parallel execution are likely to be CPU intensive ones.
For CPU intensive tasks, fairness, priority and all that stuff are
pretty important, and that's the scheduler's job.  cmwq can provide
contexts and put some safety limitations, but most are best left to
the scheduler.

>> actually useful.  So, where would that be useful?
>
> I think it's useful for all users of your worker pool, not (just)
> async.
>
> it's a severe limitation of the current linux infrastructure, and your
> infrastructure has the chance to fix this...

Yeah, there could be situations where having a generic context
provider can be useful.  I'm just not sure async falls in that
category.  For the current users, I think we would be (marginally)
better off with bound workers.  So, that's the reluctance I have about
updating the async conversion.

Thanks.

--
tejun