From: Eric Dumazet
On Thursday, 08 April 2010 at 15:54 +0800, Zhang, Yanmin wrote:

> If there are 2 nodes in the machine, processes on node 0 will contact the MCH
> of node 1 to access memory on node 1. I suspect the MCH of node 1 might enter
> a power-saving mode when all the cpus of node 1 are idle, so the transactions
> from MCH 1 to MCH 0 have a larger latency.
>

Hmm, thanks for the hint, I will investigate this.


From: Zhang, Yanmin
On Thu, 2010-04-08 at 09:00 +0200, Eric Dumazet wrote:
> On Thursday, 08 April 2010 at 07:39 +0200, Eric Dumazet wrote:
> > I suspect NUMA is completely out of order on the current kernel, or
> > NUMA support on my Nehalem machine is a joke
> >
> > # numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 size: 3071 MB
> > node 0 free: 2637 MB
> > node 1 size: 3062 MB
> > node 1 free: 2909 MB
> >
> >
> > # cat try.sh
> > hackbench 50 process 5000
> > numactl --cpubind=0 --membind=0 hackbench 25 process 5000 >RES0 &
> > numactl --cpubind=1 --membind=1 hackbench 25 process 5000 >RES1 &
> > wait
> > echo node0 results
> > cat RES0
> > echo node1 results
> > cat RES1
> >
> > numactl --cpubind=0 --membind=1 hackbench 25 process 5000 >RES0_1 &
> > numactl --cpubind=1 --membind=0 hackbench 25 process 5000 >RES1_0 &
> > wait
> > echo node0 on mem1 results
> > cat RES0_1
> > echo node1 on mem0 results
> > cat RES1_0
> >
> > # ./try.sh
> > Running with 50*40 (== 2000) tasks.
> > Time: 16.865
> > node0 results
> > Running with 25*40 (== 1000) tasks.
> > Time: 16.767
> > node1 results
> > Running with 25*40 (== 1000) tasks.
> > Time: 16.564
> > node0 on mem1 results
> > Running with 25*40 (== 1000) tasks.
> > Time: 16.814
> > node1 on mem0 results
> > Running with 25*40 (== 1000) tasks.
> > Time: 16.896
>
> If run individually, the test results are more what we would expect
> (slow), but if the machine runs the two sets of processes concurrently,
> each group runs much faster...
If there are 2 nodes in the machine, processes on node 0 will contact the MCH
of node 1 to access memory on node 1. I suspect the MCH of node 1 might enter
a power-saving mode when all the cpus of node 1 are idle, so the transactions
from MCH 1 to MCH 0 have a larger latency.
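
One way to probe this directly (just a sketch, nothing that was actually run
in this thread; it assumes lmbench's lat_mem_rd is installed, whose arguments
are array size in MB and stride in bytes) is to compare dependent-load latency
from node 0 into node 1's memory while node 1 is idle and while it is loaded:

# local vs remote load latency, node 1 otherwise idle
numactl --cpubind=0 --membind=0 lat_mem_rd 64 128
numactl --cpubind=0 --membind=1 lat_mem_rd 64 128

# the same remote measurement while node 1 is kept busy
numactl --cpubind=1 --membind=1 hackbench 25 process 5000 >/dev/null &
numactl --cpubind=0 --membind=1 lat_mem_rd 64 128
wait

If the remote latency drops while node 1 is busy, that would point at some
uncore/MCH power saving on the idle node.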

>
>
> # numactl --cpubind=0 --membind=1 hackbench 25 process 5000
> Running with 25*40 (== 1000) tasks.
> Time: 21.810
>
> # numactl --cpubind=1 --membind=0 hackbench 25 process 5000
> Running with 25*40 (== 1000) tasks.
> Time: 20.679
>
> # numactl --cpubind=0 --membind=1 hackbench 25 process 5000 >RES0_1 &
> [1] 9177
> # numactl --cpubind=1 --membind=0 hackbench 25 process 5000 >RES1_0 &
> [2] 9196
> # wait
> [1]- Done numactl --cpubind=0 --membind=1 hackbench
> 25 process 5000 >RES0_1
> [2]+ Done numactl --cpubind=1 --membind=0 hackbench
> 25 process 5000 >RES1_0
> # echo node0 on mem1 results
> node0 on mem1 results
> # cat RES0_1
> Running with 25*40 (== 1000) tasks.
> Time: 13.818
> # echo node1 on mem0 results
> node1 on mem0 results
> # cat RES1_0
> Running with 25*40 (== 1000) tasks.
> Time: 11.633
>
> Oh well...
>
>


From: Eric Dumazet
On Thursday, 08 April 2010 at 09:54 +0200, Eric Dumazet wrote:
> On Thursday, 08 April 2010 at 15:54 +0800, Zhang, Yanmin wrote:
>
> > If there are 2 nodes in the machine, processes on node 0 will contact the MCH
> > of node 1 to access memory on node 1. I suspect the MCH of node 1 might enter
> > a power-saving mode when all the cpus of node 1 are idle, so the transactions
> > from MCH 1 to MCH 0 have a larger latency.
> >
>
> Hmm, thanks for the hint, I will investigate this.

Oh well,

perf timechart record &

Instant crash

Call Trace:
perf_trace_sched_switch+0xd5/0x120
schedule+0x6b5/0x860
retint_careful+0xd/0x21

RIP ffffffff81010955 perf_arch_fetch_caller_regs+0x15/0x40
CR2: 00000000d21f1422


From: Christoph Lameter
On Thu, 8 Apr 2010, Eric Dumazet wrote:

> I suspect NUMA is completely out of order on the current kernel, or
> NUMA support on my Nehalem machine is a joke
>
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 size: 3071 MB
> node 0 free: 2637 MB
> node 1 size: 3062 MB
> node 1 free: 2909 MB

How do the cpus map to the nodes? Are cpus 0 and 1 both on the same node?

> # ./try.sh
> Running with 50*40 (== 2000) tasks.
> Time: 16.865
> node0 results
> Running with 25*40 (== 1000) tasks.
> Time: 16.767
> node1 results
> Running with 25*40 (== 1000) tasks.
> Time: 16.564
> node0 on mem1 results
> Running with 25*40 (== 1000) tasks.
> Time: 16.814
> node1 on mem0 results
> Running with 25*40 (== 1000) tasks.
> Time: 16.896
>
>
>
From: Eric Dumazet
On Thursday, 08 April 2010 at 10:34 -0500, Christoph Lameter wrote:
> On Thu, 8 Apr 2010, Eric Dumazet wrote:
>
> > I suspect NUMA is completely out of order on the current kernel, or
> > NUMA support on my Nehalem machine is a joke
> >
> > # numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 size: 3071 MB
> > node 0 free: 2637 MB
> > node 1 size: 3062 MB
> > node 1 free: 2909 MB
>
> How do the cpus map to the nodes? Are cpus 0 and 1 both on the same node?

one socket maps to 0 2 4 6 8 10 12 14 (Node 0)
one socket maps to 1 3 5 7 9 11 13 15 (Node 1)

# numactl --cpubind=0 --membind=0 numactl --show
policy: bind
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0
membind: 0
cpubind: 1 3 5 7 9 11 13 15 1024

(strange 1024 report...)

# numactl --cpubind=1 --membind=1 numactl --show
policy: bind
preferred node: 1
interleavemask:
interleavenode: 0
nodebind:
membind: 1
cpubind: 0 2 4 6 8 10 12 14



[ 0.161170] Booting Node 0, Processors #1
[ 0.248995] CPU 1 MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
[ 0.269177] Ok.
[ 0.269453] Booting Node 1, Processors #2
[ 0.356965] CPU 2 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[ 0.377207] Ok.
[ 0.377485] Booting Node 0, Processors #3
[ 0.464935] CPU 3 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[ 0.485065] Ok.
[ 0.485217] Booting Node 1, Processors #4
[ 0.572906] CPU 4 MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
[ 0.593044] Ok.
....
grep "physical id" /proc/cpuinfo
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
physical id : 1
physical id : 0
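
The same mapping can be cross-checked from sysfs (a quick sketch, using the
standard NUMA/topology files):

for n in /sys/devices/system/node/node[0-9]*; do
        echo "$n cpus: $(cat $n/cpulist)"
done
grep . /sys/devices/system/cpu/cpu[0-9]*/topology/physical_package_id

The loop prints each node's cpulist and the grep prints the socket (physical
package) id of every cpu, so the two views can be compared with the
numactl --show output above.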

