From: Michal Simek on
Hi All,

I am running several network benchmarks on a Microblaze CPU with an MMU.
I am seeing a strange issue and I would like to know where the
problem is.
I am using the same hardware design and the same Linux kernel. The
only change is the memory size (in the DTS).

32MB: 18.3Mb/s
64MB: 15.2Mb/s
128MB: 10.6Mb/s
256MB: 3.8Mb/s

There is a huge difference between the systems with 32MB and 256MB of RAM.

I am running iperf TCP tests with these commands.
On x86: iperf -c 192.168.0.105 -i 5 -t 50
On microblaze: iperf -s

I looked at PTE misses, which are the same on all configurations. That
means the number of do_page_fault exceptions is the same on all
configurations.
I added some hooks to the low-level kernel code to count the number of
TLB misses. There is a big difference in the number of misses between the
systems with 256MB and 32MB. I measured two kernel settings. The first
column is the kernel with asm-optimized memcpy/memmove functions and the
second is the kernel without that optimization. (The kernel with
asm-optimized lib functions is 30% faster than the one without.)

RAM      asm-optimized   non-optimized
32MB:            12703           13641
64MB:          1021750          655644
128MB:         1031644          531879
256MB:         1011322          430027
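
To make the size of the jump easier to see, here is a small sketch that divides each row of the TLB miss table by the 32MB row. The numbers are the ones quoted above; the function and variable names are only illustrative.

```python
# TLB miss counts quoted above, keyed by RAM size in MB:
# (asm-optimized memcpy/memmove kernel, non-optimized kernel)
tlb_misses = {
    32: (12703, 13641),
    64: (1021750, 655644),
    128: (1031644, 531879),
    256: (1011322, 430027),
}


def growth_vs_baseline(table, baseline_key):
    """Return each row divided column-wise by the baseline row."""
    base = table[baseline_key]
    return {
        key: tuple(value / ref for value, ref in zip(row, base))
        for key, row in table.items()
    }


growth = growth_vs_baseline(tlb_misses, 32)
for ram_mb in sorted(growth):
    optimized, plain = growth[ram_mb]
    print(f"{ram_mb:>3} MB: {optimized:6.1f}x  {plain:6.1f}x")
```

With these figures the asm-optimized kernel takes roughly 80 times more TLB misses at 64MB than at 32MB, and the count then stays about flat up to 256MB, so the step happens between 32MB and 64MB.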

Most of them are data TLB misses. The Microblaze MMU has no LRU
mechanism for choosing a TLB victim, so we use a naive TLB
replacement strategy based on an incrementing counter. We use 2 TLB
entries for the kernel itself, and these are never replaced, so "only"
62 of the 64 TLB entries are available.
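
As an illustration of why counter-based replacement can be fragile, here is a minimal simulation sketch. It is not the kernel code: it assumes one TLB entry per page, 62 replaceable entries as described above, and a purely cyclic access pattern, which is a worst case chosen for illustration.

```python
def simulate_round_robin_tlb(page_accesses, usable_entries=62):
    """Count misses for a TLB whose victim is picked by an incrementing counter."""
    slots = [None] * usable_entries  # page currently held by each TLB entry
    resident = set()                 # pages currently mapped, for fast lookup
    next_victim = 0                  # the incrementing replacement counter
    misses = 0
    for page in page_accesses:
        if page in resident:
            continue                 # TLB hit, nothing to do
        misses += 1
        evicted = slots[next_victim]
        if evicted is not None:
            resident.discard(evicted)
        slots[next_victim] = page
        resident.add(page)
        next_victim = (next_victim + 1) % usable_entries
    return misses


def cyclic_sweep(num_pages, passes):
    """Touch pages 0..num_pages-1 in order, repeated `passes` times."""
    return [page for _ in range(passes) for page in range(num_pages)]


fits = simulate_round_robin_tlb(cyclic_sweep(62, 10))
overflows = simulate_round_robin_tlb(cyclic_sweep(63, 10))
print(fits, overflows)
```

In this model a working set of 62 pages misses only on the first pass, while a working set of 63 pages misses on every access, because the counter always evicts the page that will be needed next. Real access patterns are less regular, so this only shows the cliff, not the actual miss counts.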

I am using two LL_TEMAC drivers, both of which use DMA, and I observe the
same results with both, so I think the problem is in the kernel itself.

It could be connected with memory management or with cache behavior.

Have you ever seen this kind of system behavior?
Do you know of any tests I can run?



I have also done several tests via Qemu to identify weak places in the
kernel, and these are the most frequently called functions.

The "unknown" label means functions outside the kernel. Numbers are in %.

TCP
31.47 - memcpy
15.00 - do_csum
11.93 - unknown
5.62 - __copy_tofrom_user
2.94 - memset
2.49 - default_idle
1.66 - __invalidate_dcache_range
1.57 - __kmalloc
1.32 - skb_copy_bits
1.23 - __alloc_skb

UDP
51.86 - unknown
9.31 - default_idle
6.01 - __copy_tofrom_user
4.00 - do_csum
2.05 - schedule
1.92 - __muldi3
1.39 - update_curr
1.20 - __invalidate_dcache_range
1.12 - __enqueue_entity

I optimized the copy_tofrom_user function to support word copying. (It
only covers aligned cases, because most copies are aligned.) The uaccess
unification was also done.

Do you have any idea how to improve TCP/UDP performance in general?
Or tests which can point me to the weak places?

I am using microblaze-next branch. The same code is in linux-next tree.

Thanks,
Michal

--
Michal Simek, Ing. (M.Eng)
PetaLogix - Linux Solutions for a Reconfigurable World
w: www.petalogix.com p: +61-7-30090663,+42-0-721842854 f: +61-7-30090663
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
On Monday, 29 March 2010 at 13:33 +0200, Michal Simek wrote:

> Do you have any idea howto improve TCP/UDP performance in general?
> Or tests which can point me on weak places.

Could you post "netstat -s" from your receiver, after a fresh boot and your
iperf session, for the 32 MB and 256 MB RAM cases?


From: Michal Simek on
Eric Dumazet wrote:
> On Monday, 29 March 2010 at 13:33 +0200, Michal Simek wrote:
>
>> Do you have any idea howto improve TCP/UDP performance in general?
>> Or tests which can point me on weak places.
>
> Could you post "netstat -s" on your receiver, after fresh boot and your
> iperf session, for 32 MB and 256 MB ram case ?
>

I am not sure if it is helpful, but see below.

Thanks,
Michal

~ # ./netstat -s
Ip:
0 total packets received
0 forwarded
0 incoming packets discarded
0 incoming packets delivered
0 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
0 active connections openings
0 passive connection openings
0 failed connection attempts
0 connection resets received
0 connections established
0 segments received
0 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
0 packets received
0 packets to unknown port received.
0 packet receive errors
0 packets sent
RcvbufErrors: 0
SndbufErrors: 0
UdpLite:
InDatagrams: 0
NoPorts: 0
InErrors: 0
OutDatagrams: 0
RcvbufErrors: 0
SndbufErrors: 0
error parsing /proc/net/snmp: Success



From: Michal Simek on
Michal Simek wrote:
> Eric Dumazet wrote:
>> On Monday, 29 March 2010 at 13:33 +0200, Michal Simek wrote:
>>
>>> Do you have any idea howto improve TCP/UDP performance in general?
>>> Or tests which can point me on weak places.
>>
>> Could you post "netstat -s" on your receiver, after fresh boot and your
>> iperf session, for 32 MB and 256 MB ram case ?
>>
>
> I am not sure if is helpful but look below.
>
Sorry, I forgot to copy and paste the second part. :-(

Look below.
Michal

32MB

~ # cat /proc/meminfo | head -n 1
MemTotal: 30024 kB
~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.0.10 port 5001 connected with 192.168.0.101 port 43577
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-50.0 sec 78.0 MBytes 13.1 Mbits/sec
~ # ./netstat -s
Ip:
56596 total packets received
0 forwarded
0 incoming packets discarded
56596 incoming packets delivered
15752 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
0 active connections openings
1 passive connection openings
0 failed connection attempts
0 connection resets received
0 connections established
56596 segments received
15752 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
0 packets received
0 packets to unknown port received.
0 packet receive errors
0 packets sent
RcvbufErrors: 0
SndbufErrors: 0
UdpLite:
InDatagrams: 0
NoPorts: 0
InErrors: 0
OutDatagrams: 0
RcvbufErrors: 0
SndbufErrors: 0
error parsing /proc/net/snmp: Success



256MB


~ # cat /proc/meminfo | head -n 1
MemTotal: 257212 kB
~ # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.0.10 port 5001 connected with 192.168.0.101 port 46069
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-50.2 sec 19.5 MBytes 3.26 Mbits/sec
~ # ./netstat -s
Ip:
14163 total packets received
0 forwarded
0 incoming packets discarded
14163 incoming packets delivered
5209 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
0 active connections openings
1 passive connection openings
0 failed connection attempts
0 connection resets received
0 connections established
14163 segments received
5209 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
0 packets received
0 packets to unknown port received.
0 packet receive errors
0 packets sent
RcvbufErrors: 0
SndbufErrors: 0
UdpLite:
InDatagrams: 0
NoPorts: 0
InErrors: 0
OutDatagrams: 0
RcvbufErrors: 0
SndbufErrors: 0
error parsing /proc/net/snmp: Success
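
For comparing the two runs, here is a small sketch that derives a few ratios from the iperf and netstat figures pasted above. It assumes iperf's "MBytes" means 2**20 bytes, and it treats every received segment as carrying data, which slightly underestimates the payload per segment because the handshake and teardown segments are counted too.

```python
MIB = 1024 * 1024  # assumption: iperf reports "MBytes" in units of 2**20 bytes

# Figures copied from the iperf and "netstat -s" output above.
runs = {
    "32MB": {"transfer_mbytes": 78.0, "seconds": 50.0,
             "segs_in": 56596, "segs_out": 15752},
    "256MB": {"transfer_mbytes": 19.5, "seconds": 50.2,
              "segs_in": 14163, "segs_out": 5209},
}


def summarize(run):
    """Derive per-segment and per-second figures for one iperf run."""
    payload_bytes = run["transfer_mbytes"] * MIB
    return {
        "bytes_per_segment_in": payload_bytes / run["segs_in"],
        "segments_in_per_segment_out": run["segs_in"] / run["segs_out"],
        "mbit_per_sec": payload_bytes * 8 / run["seconds"] / 1e6,
    }


summary = {name: summarize(run) for name, run in runs.items()}
for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Both runs come out at a bit over 1440 bytes per received segment, so the segment size looks the same in both cases. What differs is the number of segments handled in the 50 seconds and the ratio of received to sent segments.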



From: Rick Jones on
I don't know how to set fixed socket buffer sizes in iperf. If you were running
netperf, though, I would suggest fixing the socket buffer sizes with the
test-specific -s (affects local) and -S (affects remote) options:

netperf -t TCP_STREAM -H <remote> -l 30 -- -s 32K -S 32K -m 32K

to test the hypothesis that the autotuning of the socket buffers/window size is
allowing the windows to grow in the larger memory cases beyond what the TLB in
your processor is comfortable with.

Particularly if you didn't see much degradation as RAM is increased on something
like:

netperf -t TCP_RR -H <remote> -l 30 -- -r 1

which is a simple request/response test that will never try to have more than
one packet in flight at a time, regardless of how large the window gets.
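
A back-of-the-envelope sketch of the TLB-reach side of this hypothesis follows. It assumes 4 KB pages and one TLB entry per page (neither is stated in this thread), uses the 62 replaceable entries mentioned earlier, and takes 87380 bytes as the default receive buffer, which matches the "85.3 KByte" window that iperf printed.

```python
import math

PAGE_SIZE = 4096         # assumption: 4 KB pages, one TLB entry per page
USABLE_TLB_ENTRIES = 62  # from the thread: 64 entries minus 2 pinned for the kernel


def tlb_reach_bytes(entries, page_size):
    """Memory that can be mapped at once without taking a TLB miss."""
    return entries * page_size


def pages_needed(buffer_bytes, page_size):
    """Minimum number of pages a contiguous, page-aligned buffer spans."""
    return math.ceil(buffer_bytes / page_size)


reach = tlb_reach_bytes(USABLE_TLB_ENTRIES, PAGE_SIZE)
print("TLB reach:", reach, "bytes")
for label, size in [("32K socket buffer", 32 * 1024),
                    ("85.3 KByte default window", 87380)]:
    print(label, "->", pages_needed(size, PAGE_SIZE), "pages of",
          USABLE_TLB_ENTRIES)
```

Under these assumptions the TLB covers 248 KB at a time, so a window that autotuning has grown past a couple of hundred KB could not stay mapped all at once even before counting kernel data structures, while a fixed 32K buffer takes only 8 entries.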

happy benchmarking,

rick jones
http://www.netperf.org/