From: Cong Wang on
Eric Dumazet wrote:
> Le mardi 16 février 2010 à 21:06 +0800, Cong Wang a écrit :
>> Octavian Purdila wrote:
>>> On Tuesday 16 February 2010 11:37:04 you wrote:
>>>>> BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));
>>>>>
>>>>> + sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL);
>>>>> + if (!sysctl_local_reserved_ports)
>>>>> + goto out;
>>>>> +
>>>> I think we should also consider the ports in ip_local_port_range,
>>>> since we can only reserve the ports in that range.
>>>>
>>> That is subject to changes at runtime, which means we will have to readjust
>>> the bitmap at runtime which introduces the need for additional synchronization
>>> operations which I would rather avoid.
>> Why? As long as the bitmap is global, this will not be hard.
>>
>> Consider that if one user writes a port number which is beyond
>> the ip_local_port_range into ip_local_reserved_ports, we should
>> not accept this, because it doesn't make any sense. But with your
>> patch, we do.
>
> I disagree with you. This is perfectly OK.
>
> A port not being flagged in ip_local_reserved_ports doesnt mean it can
> be used for allocation.
>
> If you want to really block ports from being used at boot, you could for
> example :
>
> # temporarly reduce the ip_local_port_range
> echo "61000 61001" >/proc/sys/net/ipv4/ip_local_port_range
> # Build our bitmap (could be slow, if a remote database is read)
> for port in $LIST_RESERVED_PORT
> do
> echo $port >/proc/sys/net/ipv4/ip_local_reserved_ports
> done
> echo "10000 61000" >/proc/sys/net/ipv4/ip_local_port_range
>
>

I don't think so, if you want to avoid race condition, you just need to
write the reserved ports before any networking application starts, IOW,
as early as possible during boot.

Thanks.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cong Wang on
Eric Dumazet wrote:
> Le jeudi 18 février 2010 à 00:13 +0800, Cong Wang a écrit :
>
>> I don't think so, if you want to avoid race condition, you just need to
>> write the reserved ports before any networking application starts, IOW,
>> as early as possible during boot.
>>
>
> Sure, but I was thinking retrieving the list of reserved port by a
> database query, using network :)
>
> Anyway, I just feel your argument is not applicable.
>
> Our kernel is capable of doing an intersection for us, we dont need
> to forbid user to mark a port as 'reserved' if this port is already
> blacklisted by another mechanism (for example, if this port is already
> in use)
>

Oh, I see your points.

But this still could make people confused, like me. I think we'd better
document this.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cong Wang on
Octavian Purdila wrote:
> This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
> allows users to reserve ports for third-party applications.
>
> The reserved ports will not be used by automatic port assignments
> (e.g. when calling connect() or bind() with port number 0). Explicit
> port allocation behavior is unchanged.
>
> Signed-off-by: Octavian Purdila <opurdila(a)ixiacom.com>
> Signed-off-by: WANG Cong <amwang(a)redhat.com>
> Cc: Neil Horman <nhorman(a)tuxdriver.com>
> Cc: Eric Dumazet <eric.dumazet(a)gmail.com>
> Cc: Eric W. Biederman <ebiederm(a)xmission.com>
> ---
> Documentation/networking/ip-sysctl.txt | 14 ++++++++++++++
> drivers/infiniband/core/cma.c | 7 ++++++-
> include/net/ip.h | 6 ++++++
> net/ipv4/af_inet.c | 8 +++++++-
> net/ipv4/inet_connection_sock.c | 6 ++++++
> net/ipv4/inet_hashtables.c | 2 ++
> net/ipv4/sysctl_net_ipv4.c | 17 +++++++++++++++++
> net/ipv4/udp.c | 3 ++-
> net/sctp/socket.c | 2 ++
> 9 files changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 2dc7a1d..6534ee7 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -564,6 +564,20 @@ ip_local_port_range - 2 INTEGERS
> (i.e. by default) range 1024-4999 is enough to issue up to
> 2000 connections per second to systems supporting timestamps.
>
> +ip_local_reserved_ports - list of comma separated ranges
> + Specify the ports which are reserved for known third-party
> + applications. These ports will not be used by automatic port
> + assignments (e.g. when calling connect() or bind() with port
> + number 0). Explicit port allocation behavior is unchanged.
> +
> + The format used for both input and output is a comma separated
> + list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
> + 10). Writing to the file will clear all previously reserved
> + ports and update the current list with the one given in the
> + input.
> +
> + Default: Empty
> +
> ip_nonlocal_bind - BOOLEAN
> If set, allows processes to bind() to non-local IP addresses,
> which can be quite useful - but may break some applications.
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 875e34e..06c9fa5 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -1979,6 +1979,8 @@ retry:
> /* FIXME: add proper port randomization per like inet_csk_get_port */
> do {
> ret = idr_get_new_above(ps, bind_list, next_port, &port);
> + if (inet_is_reserved_local_port(port))
> + ret = -EAGAIN;
> } while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>
> if (ret)
> @@ -2995,10 +2997,13 @@ static int __init cma_init(void)
> {
> int ret, low, high, remaining;
>
> - get_random_bytes(&next_port, sizeof next_port);
> inet_get_local_port_range(&low, &high);
> +again:
> + get_random_bytes(&next_port, sizeof next_port);
> remaining = (high - low) + 1;
> next_port = ((unsigned int) next_port % remaining) + low;
> + if (inet_is_reserved_local_port(next_port))
> + goto again;
>
> cma_wq = create_singlethread_workqueue("rdma_cm");
> if (!cma_wq)
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 503994a..3da9004 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -184,6 +184,12 @@ extern struct local_ports {
> } sysctl_local_ports;
> extern void inet_get_local_port_range(int *low, int *high);
>
> +extern unsigned long *sysctl_local_reserved_ports;
> +static inline int inet_is_reserved_local_port(int port)
> +{
> + return test_bit(port, sysctl_local_reserved_ports);
> +}
> +
> extern int sysctl_ip_default_ttl;
> extern int sysctl_ip_nonlocal_bind;
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 33b7dff..e283fbe 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1546,9 +1546,13 @@ static int __init inet_init(void)
>
> BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));
>
> + sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL);
> + if (!sysctl_local_reserved_ports)
> + goto out;
> +
> rc = proto_register(&tcp_prot, 1);
> if (rc)
> - goto out;
> + goto out_free_reserved_ports;
>
> rc = proto_register(&udp_prot, 1);
> if (rc)
> @@ -1647,6 +1651,8 @@ out_unregister_udp_proto:
> proto_unregister(&udp_prot);
> out_unregister_tcp_proto:
> proto_unregister(&tcp_prot);
> +out_free_reserved_ports:
> + kfree(sysctl_local_reserved_ports);
> goto out;
> }
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 8da6429..1acb462 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -37,6 +37,9 @@ struct local_ports sysctl_local_ports __read_mostly = {
> .range = { 32768, 61000 },
> };
>
> +unsigned long *sysctl_local_reserved_ports;
> +EXPORT_SYMBOL(sysctl_local_reserved_ports);
> +

Sorry, this looks somewhat weird, why not just export
inet_is_reserved_local_port()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cong Wang on
Octavian Purdila wrote:
> This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
> allows users to reserve ports for third-party applications.
>
> The reserved ports will not be used by automatic port assignments
> (e.g. when calling connect() or bind() with port number 0). Explicit
> port allocation behavior is unchanged.
>
> Changes from the previous version:
> - switch the /proc entry format to coma separated list of range ports
> - treat -EFAULT just like any other error and acknowledge written values
> - use isdigit() in proc_get_ulong
>
> Octavian Purdila (3):
> sysctl: refactor integer handling proc code
> sysctl: add proc_do_large_bitmap
> net: reserve ports for applications using fixed port numbers

Hi,

This version looks fine for me, but I need to give them a test, and
I will put feedbacks asap. Thanks for your work!

Still two things:

1) bitops are always atomic on every arch, right? If yes, then ok.
2) I hope you could add some documentation to show the relations
between ip_local_port_range and ip_local_reserved_ports.

Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cong Wang on
Octavian Purdila wrote:
> On Saturday 20 February 2010 10:11:40 you wrote:
>
>> 2) I hope you could add some documentation to show the relations
>> between ip_local_port_range and ip_local_reserved_ports.
>>
>
> How does this sound:
>
> ip_local_reserved_ports - list of comma separated ranges
> Specify the ports which are reserved for known third-party
> applications. These ports will not be used by automatic port
> assignments (e.g. when calling connect() or bind() with port
> number 0). Explicit port allocation behavior is unchanged.
>
> The format used for both input and output is a comma separated
> list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
> 10). Writing to the file will clear all previously reserved
> ports and update the current list with the one given in the
> input.
>
> Note that ip_local_port_range and ip_local_port_range settings
> are independent and both are considered by the kernel when
> determining which ports are available for automatic port
> assignments.
>
> You can reserve ports which are not in the current
> ip_local_port_range, e.g.:
>
> $ cat /proc/sys/net/ipv4/ip_local_port_range
> 32000 61000
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 8080,9148
>
> although this is redundant. However such a setting is useful
> if later the port range is changed to a value that will
> include the reserved ports.

This looks fine for me.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/