From: Luca Abeni on
Hi all,

first of all, thanks for including me in these emails, and sorry for the
delay...

On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> Hi all,
>
> So, talking to Peter and Thomas at the OSPERT workshop in Brussels [1],
> the so called "sporadic task model" came out many many times!
I assume here you are simply talking about tasks with relative deadline
different from period, right? (the term "sporadic" is more often
associated with non-periodic activation patterns).

[...]
> - do you think it could be useful to have a different syscall to deal
> with the period parameter (if it's different from deadline), e.g.,
> something useful to make the task periodic as you have (if I remember
> well) in Xenomai or RTAI?
Maybe I am confused because I missed the initial part of the discussion,
but here I think there is the risk to mix two different concepts: the
"reservation period" (that is, the period P used to postpone the
scheduling deadline when the budget reaches 0), and the "task
period" (which has to do with the periodicity of task activations). To
implement a periodic behaviour in the task (this is, AFAIK, what RTAI and
similar APIs provide), no new syscall is needed: clock_nanosleep() is
enough. See http://www.disi.unitn.it/~abeni/RTOS/rtapi.pdf for a
(stupid) example.
The reservation period, on the other hand, is a scheduling parameter,
and I think that setting it with extended versions of sched_setparam(),
sched_setscheduler() and similar is ok.


> If you think it's worth doing that, do you think the
> task_wait_interval() syscall that we already added could/should do
> the job?
I do not remember what task_wait_interval() does :)
Is it the syscall you added to indicate the end of a job?


> Basically, from the scheduling point of view, what it could happen is
> that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> for scheduling but the passing of the simple in-kernel admission test
> Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> jobs before D.
I think if you want a different P_i and D_i you can use D_i for
generating new scheduling deadlines on task arrivals as "d = t + D_i",
and P_i to postpone the scheduling deadlines as "d = d + P_i" when the
budget is 0.
Depending on the replenishment amount you use, you might need to modify
the admission test as "Sum_i(Q_i/min{P_i,D_i}) < 1" or not (if you
always replenish to Q_i, then you need a modified admission test;
otherwise, you can compute the replenishment amount so that the
admission test is unaffected).
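For the replenish-to-Q_i case, the modified test above can be sketched in
fixed point as follows (the struct layout, helper name, and scale are made
up for illustration, not the in-kernel representation):

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20

struct dl_params {
	uint64_t runtime;	/* Q_i, maximum budget */
	uint64_t deadline;	/* D_i, relative deadline */
	uint64_t period;	/* P_i, reservation period */
};

/* Sketch of the modified admission test Sum_i(Q_i / min(P_i, D_i)) < 1,
 * needed when the budget is always replenished to Q_i. */
static bool dl_admit(const struct dl_params *t, int n)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		uint64_t span = t[i].period < t[i].deadline ?
				t[i].period : t[i].deadline;
		/* accumulate Q_i / min(P_i, D_i) in fixed point */
		sum += (t[i].runtime << BW_SHIFT) / span;
	}
	return sum < (1ULL << BW_SHIFT);
}
```

With D_i < P_i this is strictly more pessimistic than Sum_i(Q_i/P_i) < 1,
which matches the point above: the plain test alone does not guarantee the
deadlines.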


Thanks,
Luca

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Luca Abeni on
On Fri, 2010-07-09 at 16:24 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > Basically, from the scheduling point of view, what it could happen is
> > that I'm still _NOT_ going to allow a task with runtime Q_i, deadline
> > D_i and period P_i to use more bandwidth than Q_i/P_i, I'm still using D
> > for scheduling but the passing of the simple in-kernel admission test
> > Sum_i(Q_i/P_i)<1 won't guarantee that the task will always finish its
> > jobs before D.
>
> But the tardiness would still be bounded, right? So its a valid Soft-RT
> model?
I think that if Sum_i(Q_i/P_i)<1 but Sum_i(Q_i/min{P_i,D_i})>=1 then you
can have sporadic deadline misses, but it should still be possible to
compute an upper bound for the tardiness.
But this is just a feeling, I have no proof... :)


Luca

From: Raistlin on
On Fri, 2010-07-09 at 16:32 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> > - do you think it could be useful to have a different syscall to deal
> > with the period parameter (if it's different from deadline), e.g.,
> > something useful to make the task periodic as you have (if I remember
> > well) in Xenomai or RTAI?
> > If you think it's worth doing that, do you think the
> > task_wait_interval() syscall that we already added could/should do
> > the job?
>
> Again, I'm afraid I need some extra education here in order to make a
> sensible comment.
>
Hey, fine, where's the problem? :-P

> What are the exact semantics of this extra proposed syscall?
>
Right now, it is:
task_wait_interval(t) --> "wake me up at the first instant after t when
you can give me my full runtime"

> What exactly are the benefits over not having it, and simply rely on the
> task to not wake up more often, but if it does have it run into the lack
> of budget and sort it that way?
>
What you're saying will obviously always work, and it is actually quite
a common usage pattern (we use it like that a lot! :-)).

The new syscall might help when it is important for a task to
synchronize with the budget provisioning mechanism. It might be
uncommon, but there could be situations --more in hard than in soft
scenarios-- where you want to be sure that your next job (and all the
subsequent ones, if you behave well) will get its full runtime, even if
this means waiting a little bit.

What I was wondering is whether these semantics should be modified by the
introduction of the "period", but I also agree with Luca that we must do
our best to avoid confusion!

Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)

http://blog.linux.it/raistlin / raistlin(a)ekiga.net /
dario.faggioli(a)jabber.org
From: Raistlin on
On Fri, 2010-07-09 at 18:35 +0200, Peter Zijlstra wrote:
> I think the easiest path for now would indeed be to split between hard
> and soft rt tasks, and limit hard to d==p, and later worry about
> supporting d<p for hard.
>
Mmm... I see... Are you thinking of another scheduling class? Or maybe
just another queue with "higher priority" inside the same scheduling
class (sched_dl.c)?

I mean I can do both, so I prefer to do what you like most in the first
place, instead of having to do it twice!! :-P

Maybe having two policies inside the same class (maybe handled in
separate queues/rb-trees) might save a lot of code duplication.
If we want to go this way, suggestions on the name(s) of the new (or of
both) policy(-ies) are more than welcome. :-D

> It will very much depend on how we're going to go about doing things
> with that 'load-balancer' thingy anyway.
>
Agree. The "load-balancer" right now pushes/pulls tasks to/from the
various runqueues --just as happens in sched-rt-- to,
say, approximate G-EDF. Code is on the git... I just need some time to
clean up a little bit more and post the patches, but it's already
working at least... :-)

> The idea is that we approximate G-EDF by moving tasks around, but Dario
> told me the admission tests are still assuming P-EDF.
>
Yep, as said above, that's what we've done up to now. Regarding
"partitioned admission", let me try to explain this.

You asked me to use sched_dl_runtime_us/sched_dl_period_us to let people
decide how much bandwidth should be devoted to EDF tasks. This obviously
yields _only_one_ bandwidth value, which is then used as the
utilization cap on *each* CPU, mainly for consistency with
sched_rt_{runtime,period}_us. At the time I was using that value as the
"overall EDF bandwidth", but I switched to your suggested semantics.

Obviously this works perfectly as long as tasks stay on the CPU where
they are created, and if they're manually migrated (by explicitly
changing their affinity) I can easily check if there's enough bandwidth
on the target CPU, and if yes move the task and its bandwidth there.
That's how things were before the 'load-balancer' (and still are, if
you set task affinities so as to have a fully partitioned setup).

With global scheduling in place, we have this new situation. A task is
forked on a CPU (say 0), and I allow that if there's enough bandwidth
for it on that processor (and obviously, if yes, I also consume such
amount of bw). When the task is dynamically migrated to CPU 1 I have two
choices:
(a) I move the bandwidth it occupies from 0 to 1 or,
(b) I leave it (the bw, not the task) where it is, on 0.

If I go for (b) and the scheduler wants to move a 0.2 task from CPU 0
(loaded up to 1.0) to CPU 1 (loaded up to 0.9), I'm having a "transitory"
situation with load 0.7 on CPU 0 and load 1.1 on CPU 1 --which I really
don't like--, but at least I'm still enforcing Sum_i(EDFtask_i)<1.9.
Moreover, if a new 0.1 task is forked by someone on CPU 1
(independently of whether it finds 1.1 or 1.0 load there), admission will fail,
even if there is room for it in the system (on CPU 1) --which I really
don't like!! This is what, as I said to Peter in Brussels, I mean with
"still partitioned" admission test.

If I go for (a), and again the scheduler tries to move a 0.2 task from
CPU 0 (loaded up to 1) to CPU 1 (loaded up to 0.9), I again have two
choices, failing or permitting this. Failing would mean another
limitation to global scheduling --which I really don't like-- but
allowing that would mean that another task of 0.2 can be created on CPU
0 so that I end up in bw(CPU0)+bw(CPU1)=1+1.1=2.1 --which I really don't
like!! :-O :-O

More-moreover, if my bw limit is 0.7 on each of the 2 CPUs I have,
keeping the bandwidths separated prevents me from creating a 0.2 task if
both CPUs are loaded up to 0.6, while it probably could be scheduled,
since we have global EDF! :-O

If you look at the code, you'll find (b) implemented right now, but, as
you might have understood, it's something I really don't like! :-(

If we want something better, I cannot think of anything that doesn't
involve having a global (per-domain should be fine as well) mechanism
for bandwidth accounting...

> Add to that the interesting problems of task affinity and we might soon
> all have a head-ache ;-)
>
Right now we support affinity, i.e., tasks will be pushed/pulled to/by
CPUs where they can run. I'm not aware of any academic work that
analyzes such a situation, but this doesn't mean we can't figure
something out... Just to give people an example of "why real-time
scheduling theory still matters"!! ;-P ;-P

> One thing we can do is limit the task affinity to either 1 cpu or all
> cpus in the load-balance domain. Since there don't yet exist any
> applications we can disallow things to make life easier.
>
> If we only allow pinned tasks and free tasks, splitting the admission
> test in two would suffice I think, if keep one per-cpu utilization
> measure and use the maximum of these over all cpus to start the global
> utilization measure, things ought to work out.
>
Ok, that seems possible to me, but since I have to write the code you
must tell me what you want the semantics of the (system-wide and per-group)
sched_dl_{runtime,period} knobs to become and how I should treat them! :-)
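One possible reading of the pinned/free split quoted above (per-CPU
accounting for pinned tasks, one global measure for free tasks seeded with
the worst per-CPU pinned load) could be sketched as follows; all names and
the bound itself are assumptions, not a proven admission test:

```c
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20
#define BW_CAP   (1ULL << BW_SHIFT)	/* utilization cap of 1.0 per CPU */

/* Admit a free (globally schedulable) task: start the global utilization
 * measure from the maximum pinned utilization over all CPUs, then check
 * the total against ncpus * cap. This is only one plausible reading of
 * the scheme quoted above. */
static bool admit_free(const uint64_t *pinned, int ncpus,
		       uint64_t free_used, uint64_t new_bw)
{
	uint64_t max_pinned = 0;
	int i;

	for (i = 0; i < ncpus; i++)
		if (pinned[i] > max_pinned)
			max_pinned = pinned[i];

	return max_pinned + free_used + new_bw <= (uint64_t)ncpus * BW_CAP;
}
```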

Regards,
Dario

From: Raistlin on
On Fri, 2010-07-09 at 16:30 +0200, Peter Zijlstra wrote:
> On Fri, 2010-07-09 at 15:38 +0200, Raistlin wrote:
> >
> > Therefore, here it is my proposal:
> > - if the programmer specify both period and deadline, I act as above,
> > running the _only_necessary_ test in the kernel, assuming that
> > userspace performed some other kind of (correct!) admission test by
> > its own, or that it is cool with missing some deadlines... What do
> > you think?
>
> It would be good to provide room in the interface to improve upon this
> situation.
>
Well, room is already there; the problem is deciding how to use it.

> One thing we could do, although this would make the proposed scheduler a
> wee bit more complex, is split between hard and soft realtime. Only
> accept P==rel_D for hard, and schedule the soft tasks in the hard slack
> or something like that.
>
As said in the other mail, this could be done; just let me know, as far
as you can, how you think it should happen (scheduling class, scheduling
policy, two (or more) separate queues, etc.).

Btw, Dhaval introduced me to the obscure world of IRC... I am (if
everything worked! :-P) dariof on #sched and/or #sched-rt, so feel free
to contact me there as well; if I'm at the PC, I'll be glad to discuss
there too! :-D

Thanks and regards,
Dario
