From: Peter Zijlstra
On Sun, 2010-07-11 at 09:32 +0200, Bjoern Brandenburg wrote:

> Trying to infer whether a task is "hard" or "soft" from task
> parameters is not a good idea, IMO. It's much better to make this an
> explicit part of the task model that is configured via sched_setparam.
> By default, tasks should be marked "soft" (this leaves more wiggle
> room to the kernel); users who care can change the flag to "hard".

I think we're in violent agreement here ;-) and I was convinced that was
what we were talking about. The question was only how to represent that
in the sched_param_ex structure; the options were:

struct sched_param_ex params;

params.flags |= SF_SOFT;
sched_setscheduler_ex( .policy = SCHED_DEADLINE, .param = &params);

vs

sched_setscheduler_ex( .policy = SCHED_DEADLINE_{SOFT,HARD},
.param = &params);
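
For concreteness, the flag-based variant might look roughly like this from
userspace. This is purely a sketch: the sched_runtime/sched_deadline/
sched_period field names, the SF_SOFT constant and the
sched_setscheduler_ex() prototype are illustrative stand-ins, not whatever
ABI the patch set finally defines.

/*
 * Illustrative sketch only: struct layout, SF_SOFT and the syscall
 * wrapper prototype are assumed, not the final ABI.
 */
struct sched_param_ex params = {
	.sched_runtime  = { .tv_nsec =  10 * 1000 * 1000 },	/*  10 ms budget   */
	.sched_deadline = { .tv_nsec = 100 * 1000 * 1000 },	/* 100 ms deadline */
	.sched_period   = { .tv_nsec = 100 * 1000 * 1000 },	/* 100 ms period   */
	.flags          = SF_SOFT,	/* proposed default; clear it for "hard" */
};

if (sched_setscheduler_ex(0 /* self */, SCHED_DEADLINE, &params) < 0)
	perror("sched_setscheduler_ex");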

> Taking a step back, I think the problem here is that we are trying to
> shove too many concepts and properties into a single scheduler. Hard
> (no tardiness) is different from soft (bounded tardiness) is different
> from global is different from partitioned.
>
> From my point of view, it makes no sense to support hard deadlines
> under G-EDF (this is backed up by our schedulability studies [1]).
> Hard deadlines are best served by a P-EDF implementation (that only
> migrates on task creation/admission).
>
The problem is more that we need to support things like cpu affinity and
cpusets within the context of a 'global' scheduler.

Using cpusets we can partition the 'load-balancer' and create clusters
(root-domains in linux scheduler speak).

Using cpu affinity we can limit tasks to a subset of their cluster's
cpus.

Especially the latter is very hard to do, and I think we can get away with
only allowing a single cpu or the full cluster (it's a new policy, so
there is no existing userspace to break).
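
The "single cpu" end of that is already expressible with the existing
affinity interface; a minimal userspace sketch (the cpu number is of
course just an example):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the calling task to one cpu of its cluster; the "full cluster"
 * case simply leaves the default affinity (all cpus of the cpuset)
 * untouched. */
static int pin_to_cpu(int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	if (sched_setaffinity(0 /* self */, sizeof(mask), &mask) < 0) {
		perror("sched_setaffinity");
		return -1;
	}
	return 0;
}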

This ends up meaning we need to support both P-EDF and G-EDF for soft,
and since we want to re-use pretty much all the code and only have a
different admission test for hard (initially), it would end up also
being P/G-EDF for hard (even though as you rightly point out, hard G-EDF
is pretty pointless -- but since the policy doesn't promise EDF, we
could later improve it to be PD^2 or whatever, at which point global
hard does start to make sense).

(which I guess would suggest we use different policies instead of a
flag, since that would make most sense if we end up replacing the hard
part with another policy)

So what I want to have is a sporadic task scheduler, not an EDF
scheduler (hence also the request to s/SCHED_EDF/SCHED_DEADLINE/ --
avoiding the obvious SCHED_SPORADIC in order to avoid confusion with the
POSIX thing).

EDF is just the easiest of the many different ways to schedule a
sporadic task set.

From: Peter Zijlstra
On Sun, 2010-07-11 at 08:46 +0200, Bjoern Brandenburg wrote:
> I'd be hesitant to just assume that it "approximates G-EDF"
> sufficiently well to apply any of the published G-EDF tests.

OK, suppose that for each cpu we keep the earliest and next-earliest
deadline in a table. Then on wakeup (job release) we pick the cpu whose
current deadline is the latest and preempt it (we push the task).

On sleep (job completion) we look for the earliest among all
next-earliest deadlines to select the next runnable task (we pull the
task).

If we serialize all this using one big lock around this [ {earliest,
next-earliest} ] table, we've basically implemented G-EDF, agreed?
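
Just to make that serialized version concrete, a toy sketch (not kernel
code: one pthread mutex stands in for the big lock, the table holds
absolute deadlines, UINT64_MAX marks an idle slot, and in reality the
two cached deadlines would be rebuilt from each cpu's local runqueue):

#include <pthread.h>
#include <stdint.h>

#define NR_CPUS	8
#define NO_DL	UINT64_MAX	/* idle slot: later than any real deadline */

/* Cached top-two absolute deadlines per cpu. */
struct dl_slot {
	uint64_t earliest;
	uint64_t next;
};

static struct dl_slot dl_table[NR_CPUS] = {
	[0 ... NR_CPUS - 1] = { NO_DL, NO_DL }
};
static pthread_mutex_t dl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Job release: preempt the cpu whose earliest deadline is the latest,
 * provided the new deadline actually beats it.  Returns the target cpu
 * or -1 if the task just stays queued. */
static int push_task(uint64_t deadline)
{
	int cpu, target = 0;

	pthread_mutex_lock(&dl_lock);
	for (cpu = 1; cpu < NR_CPUS; cpu++)
		if (dl_table[cpu].earliest > dl_table[target].earliest)
			target = cpu;

	if (deadline < dl_table[target].earliest) {
		dl_table[target].next = dl_table[target].earliest;
		dl_table[target].earliest = deadline;
	} else {
		target = -1;
	}
	pthread_mutex_unlock(&dl_lock);
	return target;
}

/* Job completion on @cpu: run the earliest of all next-earliest
 * deadlines here (the pull). */
static void pull_task(int cpu)
{
	int src, best = 0;

	pthread_mutex_lock(&dl_lock);
	for (src = 1; src < NR_CPUS; src++)
		if (dl_table[src].next < dl_table[best].next)
			best = src;

	dl_table[cpu].earliest = dl_table[best].next;
	dl_table[best].next = NO_DL;	/* refilled from best's local queue */
	pthread_mutex_unlock(&dl_lock);
}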

Now replace that global lock with an algorithm that looks at the table,
finds the last-earliest or earliest-next-earliest in a lock-less
fashion, then locks the target cpu's rq->lock, verifies the result and
either continues or tries again.

So we replace the global lock with cmpxchg-like loops using 2 per-cpu
locks. Our current SCHED_FIFO balancer does just this and is found to be
a very good approximation of global-fifo (iirc there's one funny case,
but I can't remember, maybe Steve or Gregory can remember the details).
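
A rough sketch of that retry loop, reusing the table from the sketch
above (per-cpu mutexes stand in for rq->lock; the unlocked scan is racy
on purpose and the locked re-check catches stale results -- again
illustrative, not the actual SCHED_FIFO balancing code):

static pthread_mutex_t rq_lock[NR_CPUS] = {
	[0 ... NR_CPUS - 1] = PTHREAD_MUTEX_INITIALIZER
};

/* Push without the global lock: scan the table unlocked, lock only the
 * chosen target's "rq" lock, re-verify what we saw, retry if it changed
 * underneath us.  (In the real scheduler the pushing cpu also holds its
 * own rq->lock, hence "2 per-cpu locks".) */
static int push_task_lockless(uint64_t deadline)
{
	uint64_t seen;
	int cpu, target;

retry:
	target = 0;
	for (cpu = 1; cpu < NR_CPUS; cpu++)	/* unlocked, possibly stale */
		if (dl_table[cpu].earliest > dl_table[target].earliest)
			target = cpu;
	seen = dl_table[target].earliest;

	if (deadline >= seen)
		return -1;			/* nobody worth preempting */

	pthread_mutex_lock(&rq_lock[target]);
	if (dl_table[target].earliest != seen) {
		/* verification failed: someone beat us to it, try again */
		pthread_mutex_unlock(&rq_lock[target]);
		goto retry;
	}
	dl_table[target].next = seen;
	dl_table[target].earliest = deadline;
	pthread_mutex_unlock(&rq_lock[target]);
	return target;
}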


From: Peter Zijlstra
On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
> If you want to do G-EDF with limited and different budgets on each CPU
> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
> restricted-supply scheduling, which is significantly more complicated
> to analyze (see [1,2]).

Would making the thing homogeneous by assuming the worst for all cpus
make the analysis easier? That is, in the above example, only allow the
G-EDF scheduler to run for 100 out of 1000 ms on both cpus.
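
For the example above, the price of that simplification is easy to
quantify (just the arithmetic of the numbers already given):

\[
  2 \times \tfrac{100}{1000} = 0.2 \ \text{(assumed supply)}
  \qquad\text{vs.}\qquad
  \tfrac{100}{1000} + \tfrac{400}{1000} = 0.5 \ \text{(actual supply)},
\]

i.e. a test run under the homogeneous worst-case assumption only gets to
account for 0.2 cpus' worth of bandwidth even though 0.5 is provisioned.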


From: Gregory Haskins
Hello Peter, Steven, Thomas, et al.!

It's been a while.

>>> On 8/3/2010 at 04:16 AM, in message <1280823360.1923.419.camel@laptop>, Peter
Zijlstra <peterz@infradead.org> wrote:

[snip]

>
> So we replace the global lock with cmpxchg-like loops using 2 per-cpu
> locks. Our current SCHED_FIFO balancer does just this and is found to be
> a very good approximation of global-fifo (iirc there's one funny case,
> but I can't remember, maybe Steve or Gregory can remember the details).

The only thing I remember at the moment is that Steven and I found one last
remaining bug in an extreme corner case where priority order may be violated.
Sadly, I can no longer recall the exact specifics and would have to dig into
the code to remember. The good news is I believe the issue is fixable. It's
just a matter of implementation and testing, to which neither of us has ever
managed to dedicate the time. Perhaps this will be the catalyst ;)

-Greg


From: Andrea Bastoni
On 08/03/2010 05:41 AM, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 08:42 +0200, Bjoern Brandenburg wrote:
>>
>> If you want to do G-EDF with limited and different budgets on each CPU
>> (e.g., G-EDF tasks may only run for 100 out of 1000 ms on CPU 0, but
>> for 400 out of 1000 ms on CPU 1), then you are entering the domain of
>> restricted-supply scheduling, which is significantly more complicated
>> to analyze (see [1,2]).
>
> Without having looked at the refs, won't the soft case still have
> bounded tardiness? Since the boundedness property mostly depends on
> u<=1, that is, as long as we can always run everything within the
> available time we won't start drifting.

Yes, the soft case will still have bounded tardiness (see [2]), although the reason is more
related to the fact that priorities are defined by deadlines than to U <= 1.

Anyway, both hard and soft real-time cases become very difficult to analyze if limited/different
budgets are allowed on each CPU.

> As far as I know there is no existing analysis for "almost G-EDF",
>> i.e., the case where each task may only migrate among a subset of the
>> processors (= affinity masks), except for the special case of
>> clustered EDF (C-EDF), wherein the subsets of processors are
>> non-overlapping.
>
> Right, affinity masks are a pain, hence I proposed to limit that to
> either 1 cpu (yielding fully partitioned) or the full cluster.

I'm not sure I get what you mean by "full cluster". With G-EDF-like scheduling policies it only
makes sense to cluster cores around some memory level (cache Lx, NUMA node...), as the idea is
to reduce the cost of a task migration among cores. Depending on the workload, a higher (lower)
level of clustering may perform better.

A "full cluster" therefore should be created around some memory level. But if a socket has, for
example, two level of caches (L2 + L3) and a "full cluster" forces to select all cores in the
socket (first hierarchy level in cpusets), we miss the possibility to cluster the cores that
shares the L2 (and this configuration may lead better performance). In these clusters the
processors are _non-overlapping_.
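
For concreteness, carving out such a non-overlapping L2-level cluster with the existing cpuset
interface could look roughly like this (userspace sketch; the mount point, cpu list and memory
node are only illustrative, and the legacy "cpus"/"mems"/"tasks" file names are assumed):

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a cpuset containing only the cores that share one L2 and move
 * the current task into it.  Assumes a cpuset filesystem mounted at
 * /dev/cpuset; cpu and node numbers are made up for illustration. */
static int make_l2_cluster(void)
{
	FILE *f;

	if (mkdir("/dev/cpuset/l2_cluster0", 0755) == -1 && errno != EEXIST)
		return -1;

	if (!(f = fopen("/dev/cpuset/l2_cluster0/cpus", "w")))
		return -1;
	fprintf(f, "0-3\n");		/* cores sharing L2 #0 */
	fclose(f);

	if (!(f = fopen("/dev/cpuset/l2_cluster0/mems", "w")))
		return -1;
	fprintf(f, "0\n");		/* local memory node */
	fclose(f);

	if (!(f = fopen("/dev/cpuset/l2_cluster0/tasks", "w")))
		return -1;
	fprintf(f, "%d\n", getpid());	/* move ourselves into the cluster */
	fclose(f);

	return 0;
}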

Instead, if you want to use cpusets + affinity to define possibly _overlapping_ clusters (or
containers, or servers) to support different budgets on each CPU (something similar to cgroups,
see [1,3]), forcing only two configurations (single cpu/full cluster) may be restrictive.

Thanks,

- Andrea

[3] H. Leontyev and J. Anderson, "A Hierarchical Multiprocessor Bandwidth Reservation Scheme
with Timing Guarantees", Real-Time Systems, Volume 43, Number 1, pp. 60-92, September 2009.
http://www.cs.unc.edu/~anderson/papers/rtj09b.pdf

--
Andrea Bastoni
Visiting Ph.D. Student
Dept. of Computer Science
University of North Carolina at Chapel Hill
http://www.sprg.uniroma2.it/home/bastoni/