From: Rahul on
Rick Jones <rick.jones2(a)hp.com> wrote in
news:i1fvvl$adq$1@usenet01.boi.hp.com:

>> Why is the cache maintained on a time basis?
>
> It helps to bound "fail-over" time when an IP is migrated from being
> associated with one MAC address to another.
>

My other concern is that a lot of my codes are latency sensitive. Thus,
whenever an IP is not found in the cache, an additional ARP lookup is
needed, and I am afraid that this will degrade my effective latency.

That's why I am trying to keep all my MAC<->IP pairs cached, if that is a
reasonable strategy.

--
Rahul
From: Pascal Hambourg on
Hello,

Rahul wrote:
>
> I didn't realize the switches have an ARP cache timeout too.

They don't. At least pure layer-2 switches don't, because they don't care
about ARP or any other protocol above Ethernet.

>> Something is fucked with your capture data. Example - the first line
>> shows Dull 58:ec:29 _broadcasting_ an ARP reply. That should be a
>> unicast from Dull 58:ec:29 to the MAC of the querying system.

That could be some kind of gratuitous ARP.
From: Rick Jones on
Rahul <nospam(a)nospam.invalid> wrote:
> My other concern is that a lot of my codes are latency
> sensitive. Thus, whenever an IP is not found in the cache, an
> additional ARP lookup is needed, and I am afraid that this will
> degrade my effective latency.

Just *how* latency sensitive? If one request/response pair out of N,
where N could be quite large depending on how fast you are running,
has an extra RTT added to it, is that really going to cause problems?
Is a LAN RTT even a non-trivial fraction of the service time of your
application?

> That's why I am trying to keep all my MAC<->IP pairs cached, if that
> is a reasonable strategy.

Reasonable is subjective.

Some platforms allow the addition of "permanent" entries in the ARP
cache via the likes of the arp command. If one does add a permanent
entry, one becomes responsible for dealing with the case of an IP
moving from one MAC to another oneself.
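
As an illustration only (the interface name and addresses below are
made up), a static entry on Linux can be added with either the classic
arp tool or iproute2:

  # net-tools
  arp -s 192.168.1.42 00:11:22:33:44:55
  # iproute2 equivalent
  ip neigh replace 192.168.1.42 lladdr 00:11:22:33:44:55 dev eth0 nud permanent

Either way, if that host's NIC is ever swapped, the now-stale entry
has to be corrected by hand.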

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
From: Moe Trin on
On Mon, 12 Jul 2010, in the Usenet newsgroup comp.os.linux.networking, in
article <Xns9DB393D05BB736650A1FC0D7811DDBC81(a)85.214.73.210>, Rahul wrote:

>(Moe Trin) wrote

>> For the Linux kernel, this is NORMALLY a compile-time setting. You
>> may be able to increase the timeout.

>Why is the cache maintained on a time basis?

Because that's what RFC1122 suggests ;-). Weeding out unused entries
(which translates to efficiencies) and fail-over, basically. See the
first sentence in section 2.3.2.1 of the RFC.

>Isn't it more logical to specify the max number of ARP cache entries?
>Or are the two approaches identical?

John Madden (co-)wrote a book in 1987 titled "One Size Doesn't
Fit All". The title also applies to network design/operation.

>> How "busy" is the network - how many hosts talking to how many
>> hosts how often?

>But I have no way to quantify it right now. In fact, what tool does
>one use to answer the question you raised: "How "busy" is the network?"

[compton ~]$ whatis ntop ngrep tcpdump
ntop (8) - display top network users
ngrep (8) - network grep
tcpdump (8) - dump traffic on a network
[compton ~]$

Though with the switches, you'll have to do some thinking about where
to do the sniffing. A separate box attached to the monitor port on
the switches? With tcpdump (or wireshark, or any of the twenty-odd
other packet sniffer applications), I'd probably try to capture N
seconds of network traffic from X representative locations and analyze
it "off-line".

>I don't have access to the switches so can't get any switch-side
>stats, unfortunately. All monitoring will have to be server-side.

That makes it more difficult, and confusing. To get "the whole
picture", you'll need to sniff in more places, then attempt to patch
the results/observations together.

>> Overlaying networks rarely serves any useful purpose other than
>> to increase overhead. Are you sure this is needed?

>I am not sure. Maybe my design decision was wrong. The situation is
>that we have normal traffic as well as IPMI (maintenance mode)
>traffic piggybacking over the same physical wire and adapters.
>Conceptually I thought it made sense to keep those separate? But I
>am open to suggestions if this was a bad idea.

Well, it does make sense, but piggybacking isn't separating things
(and is actually increasing the traffic slightly). You might look at
policy based routing (man tc), though a lot depends on how much IPMI
traffic there is, and how it conflicts with the normal traffic.
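
As a hedged sketch of one way to use tc here (the interface name is a
placeholder, and this assumes IPMI-over-LAN/RMCP on its usual UDP port
623), you could push the IPMI traffic into the lowest-priority band so
it never queues ahead of the latency-sensitive traffic:

  # simple 3-band priority qdisc on the shared interface
  tc qdisc add dev eth0 root handle 1: prio
  # send UDP port 623 (RMCP/IPMI) to the lowest band, 1:3
  tc filter add dev eth0 parent 1:0 protocol ip u32 \
      match ip protocol 17 0xff match ip dport 623 0xffff flowid 1:3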

>> It's bad enough with 265 hosts in one collision domain, never
>> mind 530.

>But that is only relevant for broadcast traffic, correct? Unicast
>traffic will be intelligently handled by the switch so that the
>collision domain is only equal to the number of switch ports?
>Pardon my networking ignorance if this is wrong.

You may want to check the specifications on those switches, but the
last time I looked, most can't handle full port speed on every port
at the same time - by a long shot. Again - who is talking to who
and how does that traffic get from Host A to Host N or what-ever.
How much traffic is there on each "network"?

>> Doing a traffic analysis (who is talking to who) could be a real
>> eye-opener, suggesting a more efficient layout.

>Is tcpdump the tool of choice for this? Or wireshark? Or something else?

Those are packet capture tools - once you've captured the traffic, you
would want to analyze what you've got. That may be ntop, ngrep, or
even tcpdump piped to grep or similar. Looking at "from", "to", "time"
and "packet size" may provide enough clues.

>Is there a downside to having a larger ARP cache? I mean sure, it
>takes more memory but these days RAM is cheap and anyways a 1000
>row IP<->MAC lookup table is not a big size.

Your network stack doesn't use CPU cycles? ;-) No, the lookup is
a fairly light burden. Where it matters is the idea of having all of
those hosts on a single wire - most of us don't put that many on a
single network, and thus the larger cache/table is wasted space/CPU
cycles. Again - one size doesn't fit all.

>> If the network is using switches, _broadcast_ packets are heard by all

>The network is switched. Each switch takes around 48 hosts so we have
>6 Cisco-Catalyst switches interconnected with 10GigE fiber links.

OK, I think.

>> If using switches, you need also look at the timeouts in the
>> individual switches as well.

>Ah! Thanks! I didn't realize the switches have an ARP cache timeout
>too. Makes sense. I'll ask my networking folks about that.

They're not really ARP caches, so much as a lookup table of which host
is connected to which port. When the switch loses it, or when it
can't figure out which port a host is on, it will often broadcast the
packet to all ports - not good for efficiency.

>> Something is fucked with your capture data. Example - the first line
>> shows Dull 58:ec:29 _broadcasting_ an ARP reply.

Pascal suggests that could be gratuitous ARP - a system asking if
anyone else is using this IP address. I'd hope that your setup is NOT
using DHCP, and your application isn't addressing other hosts in the
cluster by name. Both would add overhead packets that are not needed.

>Wow! You are right. I never noticed this. I will definitely dig deeper
>into this. Something is not right.

If absolute network speed is required/desired, I'd look at ways to
minimize unneeded traffic. If that means a second independent network
for maintenance/admin traffic - so be it. Ethernet is a MAC-oriented
protocol, and overhead can be minimized by using fixed (not DHCP) IP
addresses for all, using a 'fixed' /etc/ethers file in place of
over-the-wire ARP[1], and seeing that the application[s] are using IP
addresses rather than node/host names - if that can't be avoided, then
put those names into an /etc/hosts file. They may seem like little
things, but they add up.
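
For illustration, both files are one entry per line (the addresses and
names below are made up):

  # /etc/ethers - MAC <-> host, loaded with '/sbin/arp -f /etc/ethers'
  00:11:22:33:44:55   node001
  00:11:22:33:44:56   node002

  # /etc/hosts - IP <-> name, consulted without touching DNS
  10.0.0.1   node001
  10.0.0.2   node002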

Old guy

[1] 'man ifconfig' and you may discover the '-arp' option. I do NOT
recommend this, but DO recommend '/sbin/arp -f /etc/ethers' or similar.
From: Rahul on
Rick Jones <rick.jones2(a)hp.com> wrote in
news:i1ia16$7sa$2(a)usenet01.boi.hp.com:

> Is a LAN RTT even a non-trivial fraction of the service time of your
> application?

Yes, I think it is. It is an MPI application (computational chemistry:
VASP) using distributed memory that does a fair amount of small-packet
traffic.

> Just *how* latency sensitive?

It is hard to say since I don't know of a way to vary latency on demand (is
there a way? I'd be eager to know!) to test response. These are the data
points I have:

Using 6 servers with 8 cores each.
RT Latency    Job Runtime (normalised)
130 usec      10x
 18 usec      1.5x
  7 usec      1x

18 usec is my current network.
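
(A possible way to vary latency on demand, if it helps: Linux netem can
inject artificial delay on an interface. The interface name and delay
value below are placeholders, and sub-millisecond resolution depends on
the kernel/iproute2 version:

  tc qdisc add dev eth0 root netem delay 100us   # add ~100 us one-way
  tc qdisc del dev eth0 root netem               # remove it again
)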

>If one request/response pair out of N,
> where N could be quite large depending on how fast you are running,
> has an extra RTT added to it, is that really going to cause problems?

You are probably right, it won't matter; it depends on how large N is.

--
Rahul