From: Vlad Yasevich on


Yaogong Wang wrote:

> There is an important design issue here: when should the user set this
> socket option?
>
> My current assumption is that the user chooses the scheduling algorithm
> after declaring the socket but before the establishment of any
> association. Therefore, the granularity of control is socket-based
> rather than association-based. In the one-to-many style socket case, all
> associations inherit the scheduling algorithm of the socket. The
> problems with this approach are:
> 1. We cannot specify a different scheduling algorithm for each association
> under the same socket.
> 2. Since the option is set before the establishment of any
> association, we don't know how many streams will be negotiated. We
> can only consult initmsg for the intended number of outgoing
> streams.
>
> If we go with the association-based approach, the above problems can
> be solved. But the question is: in this case, when should the user set
> this option? In a one-to-many style socket, associations are implicitly
> established. There isn't a point where an association is established but
> data transfer hasn't started yet, at which the user could choose the
> scheduling algorithm. Any suggestions on this dilemma?

I've been thinking about this and trying to come up with scenarios where
performing this function at the association level may be useful.

The most compelling example is really the 1-1 or peeled-off case. Currently,
one cannot change the scheduling on a peeled-off association, and that
might be a rather useful feature. Typically, associations are peeled
off just so that they can provide additional performance. Such an association
may want the additional benefit of stream prioritization.

I can, however, see the issues on both sides of this. A good counter-argument
is that a server should provide the same level of service for a given port.
In the end, I think I'll leave it up to you. My initial question was from the
API perspective.

Now there are some technical challenges in allowing a per-association option:
a) How do we deal with changing to algorithms that require additional storage?
b) What do we do with DATA that's been queued before the algorithm is chosen?
There may be other issues I haven't thought of yet.

The simplest answer to a) is that such operations are forbidden until the
queue is drained. That's the simplest approach to implement and may not be
that bad for the application, considering we have a SENDER_DRY event that
can allow the application to control when the call is made.
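
A rough userspace sketch of that idea, assuming the SENDER_DRY notification
is exposed through SCTP_EVENTS with the names used in the draft sockets API
(treat it as an illustration, not as the final interface):

/*
 * Wait until the association's send queue is empty, so that a
 * scheduling change can be requested safely afterwards.
 */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int wait_until_sender_dry(int fd)
{
	struct sctp_event_subscribe ev;
	union {
		char raw[1024];
		union sctp_notification sn;
	} buf;

	memset(&ev, 0, sizeof(ev));
	ev.sctp_sender_dry_event = 1;
	if (setsockopt(fd, IPPROTO_SCTP, SCTP_EVENTS, &ev, sizeof(ev)) < 0)
		return -1;

	for (;;) {
		struct iovec iov = { buf.raw, sizeof(buf.raw) };
		struct msghdr msg;

		memset(&msg, 0, sizeof(msg));
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;

		if (recvmsg(fd, &msg, 0) < 0)
			return -1;
		if (!(msg.msg_flags & MSG_NOTIFICATION))
			continue;	/* regular data; keep waiting */
		if (buf.sn.sn_header.sn_type == SCTP_SENDER_DRY_EVENT)
			return 0;	/* queue drained: safe to change the algorithm */
	}
}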

As for b), we might have to go with a lobby or a two-level queue approach.
For this, we can probably re-use the socket send queue, which can be drained
when the association is able to send DATA. This might be a good thing to have
anyway, so that we do not waste space for streams that haven't been negotiated.

-vlad
From: Vlad Yasevich on


Yaogong Wang wrote:
> With regard to sched_priv and sched_internal, here are some explanations:
>
> sched_priv: Parameters of a scheduling algorithm that should be
> configured by the user. For example, the weight of each stream in WFQ
> or the priority of each stream in priority queue.
>
> sched_internal: Internal state used by a scheduling algorithm. Not
> all algorithms need it. For example, memoryless algorithms such as the
> priority queue don't need it: a priority queue only looks at the current
> state and picks the highest-priority chunk available. But for WFQ, we
> need to store the remaining tokens of each stream in sched_internal. We
> also need to remember which stream was scheduled last.
>
> In the current design, sched_priv is specific to a socket. Even if two
> sockets both use WFQ, they may set different weights for different
> streams. sched_internal is specific to each association/queue. Even if
> two associations use the same scheduling algorithm with exactly the
> same configuration, the internal states are separate.
>
> I cannot move these states into sched_ops since sched_ops is specific
> to each scheduling algorithm.

OK, but you can have an opaque pointer in the association that can point
to a dedicated structure that's specific to the algorithm.

That way, when/if other algorithms are introduced, they don't pollute the
association and/or outqueue structure. They can just define their
own private data structure and, during the initialization phase, they can
construct the proper data.
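
Something along these lines, purely as an illustration (none of the names
below are from the patch; it's just a sketch of the opaque-pointer idea):

/* Per-algorithm private state, e.g. for WFQ. */
struct sctp_wfq_state {
	__u16	 nstreams;	/* streams negotiated for this association */
	__u16	 last_stream;	/* stream picked on the previous pass */
	__u32	*tokens;	/* remaining tokens, one entry per stream */
};

/*
 * The association (or outqueue) would then only carry an opaque hook,
 * something like:
 *
 *	const struct sctp_sched_ops *sched_ops;
 *	void *sched_state;	points to e.g. a struct sctp_wfq_state,
 *				allocated and released by the scheduler's
 *				own init/free callbacks
 */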

Remember that the interfaces to this data structure need to be explained/defined
in the API, so that users can tune stream priorities.
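
For the user-facing side, the read-back path might look like the sketch
below (assuming the SCTP_SCHED option and struct sctp_sched from the patch
are in the installed headers; the cap of 64 streams is only illustrative):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int show_sched(int fd)
{
	socklen_t len = sizeof(struct sctp_sched) + 64 * sizeof(uint16_t);
	struct sctp_sched *ssched = malloc(len);
	uint32_t i;

	if (!ssched)
		return -1;
	if (getsockopt(fd, IPPROTO_SCTP, SCTP_SCHED, ssched, &len) < 0) {
		free(ssched);
		return -1;
	}
	printf("scheduler: %.*s\n", SCTP_SCHED_NAME_MAX, ssched->ssched_name);
	for (i = 0; i < ssched->ssched_priv_len / sizeof(uint16_t); i++)
		printf("  stream %u: %u\n", i, (unsigned)ssched->ssched_priv[i]);
	free(ssched);
	return 0;
}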

-vlad


>
> Yaogong
>
> On Thu, Jun 3, 2010 at 7:43 AM, Vlad Yasevich <vladislav.yasevich(a)hp.com> wrote:
>>
>> Wei Yongjun wrote:
>>>> Augment SCTP socket API with a new socket option to choose and
>>>> configure scheduling algorithm.
>>>>
>>>> Signed-off-by: Yaogong Wang <ywang15(a)ncsu.edu>
>>>> ---
>>>> diff -uprN -X linux-2.6.32.8/Documentation/dontdiff
>>>> p3/include/net/sctp/structs.h p4/include/net/sctp/structs.h
>>>> --- p3/include/net/sctp/structs.h 2010-06-02 12:57:24.000000000 -0700
>>>> +++ p4/include/net/sctp/structs.h 2010-06-02 12:58:11.000000000 -0700
>>>> @@ -326,6 +326,8 @@ struct sctp_sock {
>>>>
>>>> /* Multistream scheduling */
>>>> const struct sctp_sched_ops *sched_ops;
>>>> + __u32 sched_priv_len;
>>>> + __u16 *sched_priv;
>>>>
>>>> struct sctp_initmsg initmsg;
>>>> struct sctp_rtoinfo rtoinfo;
>>>> @@ -1691,6 +1693,8 @@ struct sctp_association {
>>>>
>>>> /* Multistream scheduling */
>>>> const struct sctp_sched_ops *sched_ops;
>>>> + __u32 sched_priv_len;
>>>> + __u16 *sched_priv;
>>>>
>>> I also see that you add "__u16 *sched_internal;" to the asoc; those are
>>> the scheduler's base values, and they differ between schedulers.
>>> The socket and assoc may not want to know the details; raw data
>>> is enough for them, and if we want to add a new scheduler, we should
>>> not have to change struct sctp_sock or struct sctp_association every
>>> time.
>> I agree. These should probably go into sched_ops.
>>
>> -vlad
>>
>>>> /* Heartbeat interval: The endpoint sends out a Heartbeat chunk to
>>>> * the destination address every heartbeat interval. This value
>>>> diff -uprN -X linux-2.6.32.8/Documentation/dontdiff
>>>> p3/include/net/sctp/user.h p4/include/net/sctp/user.h
>>>> --- p3/include/net/sctp/user.h 2010-05-28 10:59:23.000000000 -0700
>>>> +++ p4/include/net/sctp/user.h 2010-05-28 11:54:47.000000000 -0700
>>>> @@ -67,6 +67,8 @@ enum sctp_optname {
>>>> #define SCTP_ASSOCINFO SCTP_ASSOCINFO
>>>> SCTP_INITMSG,
>>>> #define SCTP_INITMSG SCTP_INITMSG
>>>> + SCTP_SCHED,
>>>> +#define SCTP_SCHED SCTP_SCHED
>>>> SCTP_NODELAY, /* Get/set nodelay option. */
>>>> #define SCTP_NODELAY SCTP_NODELAY
>>>> SCTP_AUTOCLOSE,
>>>> @@ -171,8 +173,22 @@ struct sctp_initmsg {
>>>> __u16 sinit_max_init_timeo;
>>>> };
>>>>
>>>> +/*
>>>> + * SCTP Scheduling Structure (SCTP_SCHED)
>>>> + *
>>>> + * cmsg_level cmsg_type cmsg_data[]
>>>> + * ------------ ------------ ----------------------
>>>> + * IPPROTO_SCTP SCTP_SCHED struct sctp_sched
>>>> + *
>>>> + */
>>>> #define SCTP_SCHED_NAME_MAX 16
>>>>
>>>> +struct sctp_sched {
>>>> + char ssched_name[SCTP_SCHED_NAME_MAX];
>>>> + __u32 ssched_priv_len;
>>>> + __u16 ssched_priv[0];
>>>> +};
>>>> +
>>>> /*
>>>> * 5.2.2 SCTP Header Information Structure (SCTP_SNDRCV)
>>>> *
>>>> diff -uprN -X linux-2.6.32.8/Documentation/dontdiff
>>>> p3/net/sctp/associola.c p4/net/sctp/associola.c
>>>> --- p3/net/sctp/associola.c 2010-06-02 12:57:06.000000000 -0700
>>>> +++ p4/net/sctp/associola.c 2010-06-02 12:57:57.000000000 -0700
>>>> @@ -187,6 +187,14 @@ static struct sctp_association *sctp_ass
>>>>
>>>> /* Multistream scheduling */
>>>> asoc->sched_ops = sp->sched_ops;
>>>> + asoc->sched_priv_len = sp->sched_priv_len;
>>>> + if (asoc->sched_priv_len) {
>>>> + asoc->sched_priv = kmalloc(asoc->sched_priv_len, gfp);
>>>> + if (!asoc->sched_priv)
>>>> + goto fail_init;
>>>> + memcpy(asoc->sched_priv, sp->sched_priv, asoc->sched_priv_len);
>>>> + } else
>>>> + asoc->sched_priv = NULL;
>>>>
>>>> /* Allocate storage for the ssnmap after the inbound and outbound
>>>> * streams have been negotiated during Init.
>>>> @@ -464,6 +472,8 @@ static void sctp_association_destroy(str
>>>> {
>>>> SCTP_ASSERT(asoc->base.dead, "Assoc is not dead", return);
>>>>
>>>> + kfree(asoc->sched_priv);
>>>> +
>>>> sctp_endpoint_put(asoc->ep);
>>>> sock_put(asoc->base.sk);
>>>>
>>>> diff -uprN -X linux-2.6.32.8/Documentation/dontdiff
>>>> p3/net/sctp/socket.c p4/net/sctp/socket.c
>>>> --- p3/net/sctp/socket.c 2010-05-28 12:38:09.000000000 -0700
>>>> +++ p4/net/sctp/socket.c 2010-05-28 12:36:37.000000000 -0700
>>>> @@ -2580,6 +2580,50 @@ static int sctp_setsockopt_initmsg(struc
>>>> return 0;
>>>> }
>>>>
>>>> +/* Set the multistream scheduling algorithm*/
>>>> +static int sctp_setsockopt_sched(struct sock *sk, char __user *optval,
>>>> + unsigned int optlen)
>>>> +{
>>>> + struct sctp_sched *ssched = NULL;
>>>> + struct sctp_sock *sp = sctp_sk(sk);
>>>> + int ret = 0;
>>>> +
>>>> + if (optlen < sizeof(struct sctp_sched))
>>>> + return -EINVAL;
>>>> +
>>>> + ssched = kmalloc(optlen, GFP_KERNEL);
>>>> + if (!ssched)
>>>> + return -ENOMEM;
>>>> +
>>>> + if (copy_from_user(ssched, optval, optlen)) {
>>>> + ret = -EFAULT;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + if (optlen != sizeof(struct sctp_sched) + ssched->ssched_priv_len) {
>>>> + ret = -EINVAL;
>>>> + goto out;
>>>> + }
>>>> +
>>>> + ret = sctp_set_sched(sk, ssched->ssched_name);
>>>> + if (ret)
>>>> + goto out;
>>>> + sp->sched_priv_len = ssched->ssched_priv_len;
>>>> + kfree(sp->sched_priv);
>>>> + if (sp->sched_priv_len) {
>>>> + sp->sched_priv = kmalloc(sp->sched_priv_len, GFP_KERNEL);
>>>> + if (!sp->sched_priv) {
>>>> + ret = -ENOMEM;
>>>> + goto out;
>>>> + }
>>>> + memcpy(sp->sched_priv, ssched->ssched_priv, sp->sched_priv_len);
>>>> + }
>>>> +
>>>> +out:
>>>> + kfree(ssched);
>>>> + return ret;
>>>> +}
>>>> +
>>>> /*
>>>> * 7.1.14 Set default send parameters (SCTP_DEFAULT_SEND_PARAM)
>>>> *
>>>> @@ -3417,6 +3461,9 @@ SCTP_STATIC int sctp_setsockopt(struct s
>>>> retval = sctp_setsockopt_partial_delivery_point(sk, optval, optlen);
>>>> break;
>>>>
>>>> + case SCTP_SCHED:
>>>> + retval = sctp_setsockopt_sched(sk, optval, optlen);
>>>> + break;
>>>> case SCTP_INITMSG:
>>>> retval = sctp_setsockopt_initmsg(sk, optval, optlen);
>>>> break;
>>>> @@ -3642,7 +3689,10 @@ SCTP_STATIC int sctp_init_sock(struct so
>>>> sp->initmsg.sinit_max_attempts = sctp_max_retrans_init;
>>>> sp->initmsg.sinit_max_init_timeo = sctp_rto_max;
>>>>
>>>> + /* Initialize default scheduling algorithm */
>>>> sp->sched_ops = sctp_default_sched_ops;
>>>> + sp->sched_priv_len = 0;
>>>> + sp->sched_priv = NULL;
>>>>
>>>> /* Initialize default RTO related parameters. These parameters can
>>>> * be modified for with the SCTP_RTOINFO socket option.
>>>> @@ -3735,6 +3785,9 @@ SCTP_STATIC void sctp_destroy_sock(struc
>>>>
>>>> SCTP_DEBUG_PRINTK("sctp_destroy_sock(sk: %p)\n", sk);
>>>>
>>>> + sctp_cleanup_sched(sk);
>>>> + kfree(sctp_sk(sk)->sched_priv);
>>>> +
>>>> /* Release our hold on the endpoint. */
>>>> ep = sctp_sk(sk)->ep;
>>>> sctp_endpoint_free(ep);
>>>> @@ -4351,6 +4404,35 @@ static int sctp_getsockopt_initmsg(struc
>>>> return 0;
>>>> }
>>>>
>>>> +/* Get the multistream scheduling algorithm*/
>>>> +static int sctp_getsockopt_sched(struct sock *sk, int len, char __user *optval,
>>>> + int __user *optlen)
>>>> +{
>>>> + struct sctp_sched *ssched;
>>>> + int sz = sizeof(struct sctp_sched) + sctp_sk(sk)->sched_priv_len;
>>>> + int ret = 0;
>>>> +
>>>> + if (len < sz)
>>>> + return -EINVAL;
>>>> + if (put_user(sz, optlen))
>>>> + return -EFAULT;
>>>> +
>>>> + ssched = kmalloc(sz, GFP_KERNEL);
>>>> + if (!ssched)
>>>> + return -ENOMEM;
>>>> + memcpy(ssched->ssched_name, sctp_sk(sk)->sched_ops->name,
>>>> + SCTP_SCHED_NAME_MAX);
>>>> + ssched->ssched_priv_len = sctp_sk(sk)->sched_priv_len;
>>>> + memcpy(ssched->ssched_priv, sctp_sk(sk)->sched_priv,
>>>> + ssched->ssched_priv_len);
>>>> +
>>>> + if (copy_to_user(optval, ssched, sz))
>>>> + ret = -EFAULT;
>>>> +
>>>> + kfree(ssched);
>>>> + return ret;
>>>> +}
>>>> +
>>>> static int sctp_getsockopt_peer_addrs_num_old(struct sock *sk, int len,
>>>> char __user *optval,
>>>> int __user *optlen)
>>>> @@ -5605,6 +5687,9 @@ SCTP_STATIC int sctp_getsockopt(struct s
>>>> case SCTP_INITMSG:
>>>> retval = sctp_getsockopt_initmsg(sk, len, optval, optlen);
>>>> break;
>>>> + case SCTP_SCHED:
>>>> + retval = sctp_getsockopt_sched(sk, len, optval, optlen);
>>>> + break;
>>>> case SCTP_GET_PEER_ADDRS_NUM_OLD:
>>>> retval = sctp_getsockopt_peer_addrs_num_old(sk, len, optval,
>>>> optlen);
>>>>
>>>>
>>>>
>
>
>
From: Yaogong Wang on
I'd like to keep the socket-based design but refine it to avoid
wasting resources on streams that are not negotiated. The modified
version will work as follows:

The choice and configuration of the scheduling algorithm is still
socket-based. Users are supposed to set this socket option after
declaring the socket but before association establishment. When the
socket option is set, the configuration is stored in sctp_sock.
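
For example, a one-to-many server could do something like the following
before the first association comes up. This is only a sketch: it assumes
the SCTP_SCHED option and struct sctp_sched from this patch set are in the
installed headers, and the scheduler name "wfq" is an assumption of mine:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

static int set_wfq_weights(int fd)
{
	uint16_t weights[4] = { 1024, 1024, 2048, 2048 };	/* streams 0-3 */
	size_t len = sizeof(struct sctp_sched) + sizeof(weights);
	struct sctp_sched *ssched = calloc(1, len);
	int ret;

	if (!ssched)
		return -1;
	strncpy(ssched->ssched_name, "wfq", SCTP_SCHED_NAME_MAX - 1);
	ssched->ssched_priv_len = sizeof(weights);	/* bytes of per-stream data */
	memcpy(ssched->ssched_priv, weights, sizeof(weights));

	ret = setsockopt(fd, IPPROTO_SCTP, SCTP_SCHED, ssched, len);
	free(ssched);
	return ret;
}

The call has to be made before listen()/connect(), exactly as described
above, so that every association created on the socket inherits it.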

In sctp_association_init, we don't immediately let the user-specified
scheduling algorithm take effect. Instead, the default FCFS is used.
The user-specified scheduling algorithm takes effect when the
association turns into the ESTABLISHED state. At that point, we already
know how many streams were actually negotiated, so we can allocate
resources accordingly. Any remaining data chunks in the initial FCFS
queue will be moved to the new queue(s).

If the negotiated number of streams is smaller than what the user
requested, the scheduling algorithm should still work, but in a
truncated version. For example, suppose the user originally wants 4
streams and chooses weighted fair queuing. The weights for streams 0, 1,
2, and 3 are 1024, 1024, 2048 and 2048 respectively. If only 3 streams are
negotiated in the end, WFQ will still be used, but with 3 streams whose
weights are 1024, 1024, 2048.
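
In kernel terms, the truncation could be as simple as the sketch below
(the helper name, the gfp argument and the source of the negotiated stream
count are assumptions on my part; sched_priv/sched_priv_len are the fields
from the patch, with the length kept in bytes):

/* Sketch: keep only the weights for the streams that were negotiated. */
static int sctp_sched_truncate_priv(struct sctp_association *asoc,
				    const struct sctp_sock *sp,
				    __u16 negotiated_streams, gfp_t gfp)
{
	__u16 requested = sp->sched_priv_len / sizeof(__u16);
	__u16 n = min_t(__u16, requested, negotiated_streams);

	asoc->sched_priv = kmalloc(n * sizeof(__u16), gfp);
	if (!asoc->sched_priv)
		return -ENOMEM;
	memcpy(asoc->sched_priv, sp->sched_priv, n * sizeof(__u16));
	asoc->sched_priv_len = n * sizeof(__u16);
	return 0;
}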

Users are not supposed to change the scheduling algorithm on the fly.
If they do, on a TCP-style socket it won't take effect: the
configuration is only stored in sctp_sock, not in the association
or the outq. For a UDP-style socket, changing the scheduling configuration
on the fly will only affect new associations established after the
change. This simplifies the implementation, and I think users typically
don't want to change the scheduling algorithm of an ongoing
association.

Yaogong

On Mon, Jun 14, 2010 at 9:36 AM, Vlad Yasevich
<vladislav.yasevich(a)hp.com> wrote:
>
>
> Yaogong Wang wrote:
>
>> There is an important design issue here: when should the user set this
>> socket option?
>>
>> My current assumption is that the user chooses the scheduling algorithm
>> after declaring the socket but before the establishment of any
>> association. Therefore, the granularity of control is socket-based
>> rather than association-based. In the one-to-many style socket case, all
>> associations inherit the scheduling algorithm of the socket. The
>> problems with this approach are:
>> 1. We cannot specify a different scheduling algorithm for each association
>> under the same socket.
>> 2. Since the option is set before the establishment of any
>> association, we don't know how many streams will be negotiated. We
>> can only consult initmsg for the intended number of outgoing
>> streams.
>>
>> If we go with the association-based approach, the above problems can
>> be solved. But the question is: in this case, when should the user set
>> this option? In a one-to-many style socket, associations are implicitly
>> established. There isn't a point where an association is established but
>> data transfer hasn't started yet, at which the user could choose the
>> scheduling algorithm. Any suggestions on this dilemma?
>
> I've been thinking about this and trying to come up with scenarios where
> performing this function at the association level may be useful.
>
> The most compelling example is really the 1-1 or peeled-off case. Currently,
> one cannot change the scheduling on a peeled-off association, and that
> might be a rather useful feature. Typically, associations are peeled
> off just so that they can provide additional performance. Such an association
> may want the additional benefit of stream prioritization.
>
> I can, however, see the issues on both sides of this. A good counter-argument
> is that a server should provide the same level of service for a given port.
> In the end, I think I'll leave it up to you. My initial question was from the
> API perspective.
>
> Now there are some technical challenges in allowing a per-association option:
> a) How do we deal with changing to algorithms that require additional storage?
> b) What do we do with DATA that's been queued before the algorithm is chosen?
> There may be other issues I haven't thought of yet.
>
> The simplest answer to a) is that such operations are forbidden until the
> queue is drained. That's the simplest approach to implement and may not be
> that bad for the application, considering we have a SENDER_DRY event that
> can allow the application to control when the call is made.
>
> As for b), we might have to go with a lobby or a two-level queue approach.
> For this, we can probably re-use the socket send queue, which can be drained
> when the association is able to send DATA. This might be a good thing to have
> anyway, so that we do not waste space for streams that haven't been negotiated.
>
> -vlad
>



--
========================
Yaogong Wang, PhD candidate
Department of Computer Science
North Carolina State University
http://www4.ncsu.edu/~ywang15/
========================
From: Vlad Yasevich on


Yaogong Wang wrote:
> I'd like to keep the socket-based design but refine it to avoid
> wasting resources on streams that are not negotiated. The modified
> version will work as follows:
>
> The choice and configuration of the scheduling algorithm is still
> socket-based. Users are supposed to set this socket option after
> declaring the socket but before association establishment. When the
> socket option is set, the configuration is stored in sctp_sock.
>
> In sctp_association_init, we don't immediately let the user-specified
> scheduling algorithm take effect. Instead, the default FCFS is used.

Actually, at this stage, you don't need any algorithms as no data is
permitted out. Yes, we need the ability to queue the data, but we can
just as easily queue it to the socket as to the association.

> The user-specified scheduling algorithm takes effect when the
> association turns into the ESTABLISHED state. At that point, we already

Well, COOKIE-ECHOED state is probably better, since we are allowed
to send 1 DATA with a COOKIE-ECHO chunk, as well as with COOKIE-ACK.

> know how many streams were actually negotiated, so we can allocate
> resources accordingly. Any remaining data chunks in the initial FCFS
> queue will be moved to the new queue(s).
>
> If the negotiated number of streams is smaller than what the user
> requested, the scheduling algorithm should still work, but in a
> truncated version. For example, suppose the user originally wants 4
> streams and chooses weighted fair queuing. The weights for streams 0, 1,
> 2, and 3 are 1024, 1024, 2048 and 2048 respectively. If only 3 streams are
> negotiated in the end, WFQ will still be used, but with 3 streams whose
> weights are 1024, 1024, 2048.

Agreed.

>
> Users are not supposed to change the scheduling algorithm on the fly.
> If they do, on a TCP-style socket it won't take effect: the
> configuration is only stored in sctp_sock, not in the association
> or the outq. For a UDP-style socket, changing the scheduling configuration
> on the fly will only affect new associations established after the
> change. This simplifies the implementation, and I think users typically
> don't want to change the scheduling algorithm of an ongoing
> association.

Ok. Looking forward to the next version.

-vlad

>
> Yaogong
>
> On Mon, Jun 14, 2010 at 9:36 AM, Vlad Yasevich
> <vladislav.yasevich(a)hp.com> wrote:
>>
>> Yaogong Wang wrote:
>>
>>> There is an important design issue here: when should the user set this
>>> socket option?
>>>
>>> My current assumption is that the user chooses the scheduling algorithm
>>> after declaring the socket but before the establishment of any
>>> association. Therefore, the granularity of control is socket-based
>>> rather than association-based. In the one-to-many style socket case, all
>>> associations inherit the scheduling algorithm of the socket. The
>>> problems with this approach are:
>>> 1. We cannot specify a different scheduling algorithm for each association
>>> under the same socket.
>>> 2. Since the option is set before the establishment of any
>>> association, we don't know how many streams will be negotiated. We
>>> can only consult initmsg for the intended number of outgoing
>>> streams.
>>>
>>> If we go with the association-based approach, the above problems can
>>> be solved. But the question is: in this case, when should the user set
>>> this option? In a one-to-many style socket, associations are implicitly
>>> established. There isn't a point where an association is established but
>>> data transfer hasn't started yet, at which the user could choose the
>>> scheduling algorithm. Any suggestions on this dilemma?
>> I've been thinking about this and trying to come up with scenarios where
>> performing this function at the association level may be useful.
>>
>> The most compelling example is really the 1-1 or peeled-off case. Currently,
>> one cannot change the scheduling on a peeled-off association, and that
>> might be a rather useful feature. Typically, associations are peeled
>> off just so that they can provide additional performance. Such an association
>> may want the additional benefit of stream prioritization.
>>
>> I can, however, see the issues on both sides of this. A good counter-argument
>> is that a server should provide the same level of service for a given port.
>> In the end, I think I'll leave it up to you. My initial question was from the
>> API perspective.
>>
>> Now there are some technical challenges in allowing a per-association option:
>> a) How do we deal with changing to algorithms that require additional storage?
>> b) What do we do with DATA that's been queued before the algorithm is chosen?
>> There may be other issues I haven't thought of yet.
>>
>> The simplest answer to a) is that such operations are forbidden until the
>> queue is drained. That's the simplest approach to implement and may not be
>> that bad for the application, considering we have a SENDER_DRY event that
>> can allow the application to control when the call is made.
>>
>> As for b), we might have to go with a lobby or a two-level queue approach.
>> For this, we can probably re-use the socket send queue, which can be drained
>> when the association is able to send DATA. This might be a good thing to have
>> anyway, so that we do not waste space for streams that haven't been negotiated.
>>
>> -vlad
>>
>
>
>