From: Andrea Bastoni on
On 08/03/2010 05:46 AM, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Would making the thing homogeneous by assuming the worst for all cpus
> make the analysis easier? That is, in the above example, only allow the
> G-EDF scheduler to run for 100 out of 1000 ms on both cpus.

Both [1,2] (and also [4]) assume a more general model than one based on the worst case for all
CPUs, so an analysis based on these papers would likely be more pessimistic, but not
necessarily easier.

- Andrea

[4] E. Bini, M. Bertogna, and S. Baruah, "Virtual Multiprocessor Platforms: Specification and
Use", Proceedings of the 30th IEEE Real-Time Systems Symposium (RTSS 2009), pp. 437-446, 2009.

--
Andrea Bastoni
Visiting Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.sprg.uniroma2.it/home/bastoni/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Bjoern Brandenburg on

On Aug 3, 2010, at 5:46 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Would making the thing homogeneous by assuming the worst for all cpus
> make the analysis easier? That is, in the above example, only allow the
> G-EDF scheduler to run for 100 out of 1000 ms on both cpus.

It would, but that severely limits your non-G-EDF tasks. What if you want to provision a P-EDF task with a period of ten milliseconds? If you allow a guaranteed G-EDF slice of 100 ms, then your P-EDF task might miss a deadline if it arrives at the wrong time. If you let it preempt the G-EDF slice, then you get "out of sync" and the analysis becomes tricky. If you scale the G-EDF time slice down to less than ten milliseconds, then overheads increase needlessly on all processors. EDF-HSB specifically addresses this problem (for the case where G-EDF is only used for soft tasks):

B. Brandenburg and J. Anderson, “Integrating Hard/Soft Real-Time Tasks and Best-Effort Jobs on Multiprocessors”, Proceedings of the 19th Euromicro Conference on Real-Time Systems (ECRTS 2007), pp. 61-70. IEEE, July 2007. http://www.cs.unc.edu/~anderson/papers/ecrts07b.pdf

- Björn
From: Bjoern Brandenburg on
On Aug 3, 2010, at 5:41 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>>
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Without having looked at the refs, won't the soft case still have
> bounded tardiness? Since the boundedness property mostly depends on
> u<=1, that is, as long as we can always run everything within the
> available time we won't start drifting.

No, not in all cases. Suppose you have a soft task with a utilization of 0.2 and two G-EDF reservations of utilization 0.15 each (the rest of the CPU time is allocated to partitioned hard tasks), reservation periods and task period are equal, and no CPU is overutilized. If the reservations become available at different times during each period, then the soft task has bounded tardiness, but if they become available in parallel, then tardiness can grow unboundedly because the task can only execute on one CPU at a time.

Hennadiy Leontyev has analyzed this extensively in the bounded-tardiness context and provides some counterexamples in his dissertation. For hard real-time, the MPR papers that Andrea pointed out are the state of the art afaik. (I stand by my opinion that global hard real-time is less pressing for the Linux kernel.)

>
>> As far as I know there is no existing analysis for "almost G-EDF",
>> i.e., the case where each task may only migrate among a subset of the
>> processors (= affinity masks), except for the special case of
>> clustered EDF (C-EDF), wherein the subsets of processors are
>> non-overlapping.
>
> Right, affinity masks are a pain, hence I proposed to limit that to
> either 1 cpu (yielding fully paritioned) or the full cluster.

Yes, I agree.

> That will leave us with only having to stack a partitioned and global
> scheduler on top of each other, and per the previous point, I think that
> ought to work out trivially for soft, hard otoh will get 'interesting'.

As I mentioned in my other reply, this is exactly what EDF-HSB (http://www.cs.unc.edu/~anderson/papers/ecrts07b.pdf) does.

- Björn

From: Bjoern Brandenburg on
On Aug 2, 2010, at 3:34 PM, Peter Zijlstra <peterz(a)infradead.org> wrote:

> On Sun, 2010-07-11 at 09:32 +0200, Bjoern Brandenburg wrote:
>
>> Trying to infer whether a task is "hard" or "soft" from task
>> parameters is not a good idea, IMO. It's much better to make this an
>> explicit part of the task model that is configured via sched_setparam.
>> By default, tasks should be marked "soft" (this leaves more wiggle
>> room to the kernel); users who care can change the flag to "hard".
>
> I think we're in violent agreement here ;-) and I was convinced that was
> what we were talking about. The question was only how to represent that
> in the sched_param_ex structure, the options were:
>
> struct sched_param_ex params;
>
> params.flags |= SF_SOFT;
> sched_setscheduler_ex( .policy = SCHED_DEADLINE, .param = &params);
>
> vs
>
> sched_setscheduler_ex( .policy = SCHED_DEADLINE_{SOFT,HARD},
> .param = &params);

From my point of view I don't really see a difference—the first approach probably lends itself better to adding additional variants but makes it harder to tweak a particular feature once it's used in userspace.

>
>> Taking a step back, I think the problem here is that we are trying to
>> shove too many concepts and properties into a single scheduler. Hard
>> (no tardiness) is different from soft (bounded tardiness) is different
>> from global is different from partitioned.
>>
>> From my point of view, it makes no sense to support hard deadlines
>> under G-EDF (this is backed up by our schedulability studies [1]).
>> Hard deadlines are best served by a P-EDF implementation (that only
>> migrates on task creation/admission).
>>
> The problem is more that we need to support things like cpu affinity and
> cpusets within the context of a 'global' scheduler.
>
> Using cpusets we can partition the 'load-balancer' and create clusters
> (root-domains in linux scheduler speak).
>
> Using cpu affinity we can limit tasks to a subset of their cluster's
> cpus.

Neither makes much sense in the context of a truly global scheduler. Creating non-overlapping clusters, however, does make sense: it yields clustered schedulers.

>
> Esp. the latter is very hard to do, and I think we can get away with
> only allowing a single cpu or the full cluster (its a new policy, so
> there is no existing userspace to break).

It is probably a good idea to keep it as limited as possible in the beginning. There is no supporting theory afaik for overlapping clusters or non-uniform CPU affinities (i.e., nobody's looked at a case in which CPU affinities can vary from task to task).

>
> This ends up meaning we need to support both P-EDF and G-EDF for soft,
> and since we want to re-use pretty much all the code and only have a
> different admission test for hard (initially), it would end up also
> being P/G-EDF for hard

Couldn't you simply reject tasks that call setscheduler() with a combination of 'hard' and a CPU affinity mask with more than one CPU allowed?

> (even though as you rightly point out, hard G-EDF
> is pretty pointless -- but since the policy doesn't promise EDF, we
> could later improve it to be PD^2 or whatever, at which point global
> hard does start to make sense).
>
> (which I guess would suggest we use different policies instead of a
> flag, since that would make most sense if we end up replacing the hard
> part with another policy)

I don't think the implemented policy is just a hidden implementation detail that can be transparently changed in a future kernel version. Sure, other sporadic schedulers exist, but a lot of things in real-time systems depend on the scheduling policy (e.g., retry bounds for lock-free data structures, locking protocols, overhead accounting, etc.) so that changing it without revalidating/testing everything is not really an option. (At least not for applications that could be considered 'almost hard real-time').

>
> So what I want to have is a sporadic task scheduler, not an EDF
> scheduler (hence also the request to s/SCHED_EDF/SCHED_DEADLINE/ --
> avoiding the obvious SCHED_SPORADIC in order to avoid confusion with the
> POSIX thing).
>
> EDF is just the easiest of the many different ways to schedule a
> sporadic task set.

In theory I fully agree, but I strongly suspect that applications are going to assume a specific implementation anyway in practice. Changing the implemented scheduling policy for sporadic tasks would essentially force all existing embedded apps that make use of this API back into testing, which likely means that newer kernel versions could not be adopted for older projects (which may be the case anyway). In light of this, it may not be unreasonable to promise a specific policy.

However, a generic SCHED_DEADLINE does not preclude the option of supporting specific policies later, so this might not be a big deal in the end.
- Björn
From: Bjoern Brandenburg on
On Jul 12, 2010, at 6:21 AM, Harald Gustafsson <hgu1972(a)gmail.com> wrote:

> 2010/7/11 Bjoern Brandenburg <bbb(a)email.unc.edu>:
>> Trying to infer whether a task is "hard" or "soft" from task parameters is not a good idea, IMO. It's much better to make this an explicit part of the task model that is configured via sched_setparam. By default, tasks should be marked "soft" (this leaves more wiggle room to the kernel); users who care can change the flag to "hard".
>
> Yes, this was along the lines of my suggestions also.
>
>> Taking a step back, I think the problem here is that we are trying to shove too many concepts and properties into a single scheduler. Hard (no tardiness) is different from soft (bounded tardiness) is different from global is different from partitioned.
>>
>> From my point of view, it makes no sense to support hard deadlines under G-EDF (this is backed up by our schedulability studies [1]). Hard deadlines are best served by a P-EDF implementation (that only migrates on task creation/admission).
>>
>> Also, it makes little sense to worry about supporting both hard and soft in a P-EDF implementation. Just schedule everything by its deadline and both hard and soft tasks will be fine (if the admissions test was passed).
>>
>> In my opinion, it is much more useful to have a two-level scheduling framework: a truly partitioned EDF implementation at the top level to support "hard" tasks, and a truly global EDF implementation at the second level for "soft" tasks, with the property that the second level implementation is only invoked if the first level is idle. This nicely matches Linux's scheduling class hierarchy.
>
> Would this prevent soft RT tasks to be pinned to a specific CPU, i.e.
> have an affinity set of one CPU? I'm asking since we would like to use
> soft deadline RT (bounded tardiness, e.g. with relative deadlines
> short than periods), but with threads set to run on a specific CPU.
> This is due to that we do an analysis on data dependencies etc in user
> space applications (or offline) and distribute jobs among threads on
> each CPU. At the same time we would like the isolation between
> parallel running such applications and other CPU activities.

I think this is asking the wrong question. In the context of a global scheduler, it doesn't make sense to pin threads to individual CPUs. However, nobody is forcing you to use a global scheduler—just use partitioned EDF for your soft tasks, too, and you get bounded tardiness (the bound happens to be zero for implicit-deadline tasks). The appeal of using a global scheduler at all is that some workloads have bounded tardiness under global scheduling but not under partitioned scheduling, but the reverse is not true.

>
> I'm not saying that all soft RT needs to be pinned or that some such
> tasks can't be made to meet hard admission control but we would need
> the possibility. If I would need to priorities between soft deadline
> RT and CPU affinity, soft deadline RT would win, since it is still
> quite likely that the tasks get scheduled on individual cores when
> needed.
>
> When you say truly global EDF implementation does that mean the
> scheduler only considers relative deadline and may "migrate" a thread
> among all the CPUs at each scheduling instance, disregarding cache
> effects, power management (e.g. waking an idle core), etc?

Global EDF does not consider cache affinity (but a reasonable implementation will avoid needless migrations). Applications that require strong cache affinity are better served by partitioned schedulers if it is possible to partition the workload.

Power management is likely an area that will require some additional thought. However, at ECRTS, Thomas was strongly advocating race-to-idle, and this is what G-EDF does anyway: use all cores to get everything out of the way, asap.

> Or can
> these things be factored into the relative deadlines (maybe they
> already are by the user)?

I'm not sure what you mean by that. Relative deadlines represent an application's timeliness requirements and nothing else. I guess the user could tweak parameters to force certain scheduling decisions, but I wouldn't recommend encouraging that in the kernel API.

- Björn

