From: Doug Freyburger on
Has anyone seen ping/route delays across a crossover Ethernet cable?

I've got two clustered pairs of Oracle hosts where every couple of
months one side gets ejected from the cluster because of a heartbeat
timeout. I've traced it to a delay over the private LAN connection used
as a heartbeat. It happens on two different clusters that are set up
the same way. One is production, the other the QA systems. There are
both ASM and OCFS2.

At layer two the connection on eth2 is clean -

$ netstat --interfaces=eth2
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth2       1500   0 844008521      0      0      0 1160340279      0      0      0 BMRU

The arp table remains stable with the two hosts seeing each other -

$ arp -an | grep eth2
? (192.168.2.111) at 00:18:FE:83:C3:12 [ether] on eth2

Switching my focus to layer 3 from here on in ...

The default route goes out eth1. There is a 192.168.2 route out of eth2
and also a route to 169.254 having to do with DHCP that "should" not
have any effect -

$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.10.3.0       0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
10.10.0.0       0.0.0.0         255.255.0.0     U         0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 eth1

On the QA cluster I've turned off ZEROCONF just in case and that
change will go into effect on their next reboots -

$ grep ZERO /etc/sysconfig/network
NOZEROCONF=yes

There's no routed or gated running. It's just the default route as far
as I can tell. Oracle ASM appears to run "rdisc" when it joins the
cluster but it leaves the default route in place.
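
For the record, here is roughly how I'm double checking that on both
nodes; the exact commands are illustrative rather than copied from my
notes:

$ ps -ef | egrep 'routed|gated|rdisc' | grep -v egrep   # no routing daemons running
$ chkconfig --list | egrep 'routed|gated'               # none enabled at boot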

What I see is most of the time a traceroute from one node in the
clustered pair to the other returns fast -

Mon Aug 2 17:43:58 CDT 2010
traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
1 oracleprod01-priv (192.168.2.111) 0.100 ms 0.082 ms 0.071 ms
-----------------
Mon Aug 2 17:44:08 CDT 2010
traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
1 oracleprod01-priv (192.168.2.111) 0.107 ms 0.087 ms 0.072 ms

But every once in a while it goes slowly -

Tue Jul 27 13:14:09 CDT 2010
traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
1 * oracleprod01-priv (192.168.2.111) 0.125 ms 0.084 ms
-----------------
Tue Jul 27 13:14:24 CDT 2010
traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
1 * oracleprod01-priv (192.168.2.111) 0.100 ms 0.081 ms
-----------------
Tue Jul 27 13:14:39 CDT 2010
traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
1 * oracleprod01-priv (192.168.2.111) 0.117 ms 0.084 ms
-----------------

The star represents a 5 second timeout in the response. That's a delay
across a twisted pair connection with no change in the routing table.

One thing I noticed is the star lasts an even multiple of 90 seconds.
Usually it goes away after 90 seconds, sometimes after 180 or 270
seconds. On occasion it switches to a different number of stars. I
think the cluster ejection and panic happen when the delay reaches 15
seconds, or 3 stars.

When I saw the 90 second timeframe of the delays my first thought was
the RIP route calculation cycle. That's when I checked that neither
gated nor routed were running and I confirmed the routing table does not
change. I also switched from a switch to a crossover cable to eliminate
the possibility that the switch is doing a routing calculation.

Has anyone seen a delay like this over a dedicated link?

The link is gigabit ethernet that typically runs under 1% capacity so I
don't think it's overwhelmed with other traffic. The 90 second time scale of
the delays doesn't fit and the times when the star appears in my trace
log don't match peak database loads.

Thanks in advance if anyone has seen an unexplained delay across an
ethernet connection.

The hosts are in each other's /etc/hosts file and my traceroute loop
uses the IP number so it isn't DNS -

$ grep hosts /etc/nsswitch.conf
#hosts: db files ldap nis dns
hosts: files dns

With DNS the delay would happen before the traceroute tries to send the
packet anyway.
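
For completeness, the loop that produces those log entries is nothing
more elaborate than roughly this; the interval and the log path are
approximations, not the exact script:

# timestamped traceroute to the peer's private IP, repeated forever
while true ; do
    date
    traceroute 192.168.2.111
    echo -----------------
    sleep 10
done >> /var/tmp/eth2-trace.log 2>&1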
From: The Natural Philosopher on
Doug Freyburger wrote:
>
> Has anyone seen a delay like this over a dedicated link?
>

Guesswork. Are the interfaces running duplex?
have you tried using a switch rather than crossover?

> The link is gigabit ethernet that typically runs under 1% capacity so I
> don't think it's overwhelmed with other traffic. The 90 second time scale of
> the delays doesn't fit and the times when the star appears in my trace
> log don't match peak database loads.
>
> Thanks in advance if anyone has seen an unexplained delay across an
> ethernet connection.
>
Surely that's a packet LOSS not a delay.
From: Nico Kadel-Garcia on
On Aug 2, 6:59 pm, Doug Freyburger <dfrey...(a)yahoo.com> wrote:
> Has anyone seen ping/route delays across a crossover Ethernet cable?

Not from well-made cables. Is it possible that you have a bad cable or
interface? And if you have GigE ports on each end, you don't need a
crossover cable: copper GigE ports are hermaphroditic, and can use
standard or crossover cables. If the cable was handmade by someone
who's not very good at it (as many hurried or casual cable makers
are), then it's a reasonable suspect and should be cheap to replace.

> I've got two clustered pairs of Oracle hosts where every couple of
> months one side gets ejected from the cluster because of a heartbeat
> timeout.  I've traced it to a delay over the private LAN connection used
> as a heartbeat.  It happens on two different clusters that are set up
> the same way.  One is production, the other the QA systems.  There are
> both ASM and OCFS2.
>
> At layer two the connection on eth2 is clean -
>
> $ netstat --interfaces=eth2
> Kernel Interface table
> Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> eth2       1500   0 844008521      0      0      0 1160340279      0      0      0 BMRU
>
> The arp table remains stable with the two hosts seeing each other -
>
> $ arp -an | grep eth2
> ? (192.168.2.111) at 00:18:FE:83:C3:12 [ether] on eth2
>
> Switching my focus to layer 3 from here on in ...
>
> The default route goes out eth1.  There is a 192.168.2 route out of eth2
> and also a route to 169.254 having to do with DHCP that "should" not
> have any effect -

Yeah, the 169.254 is irritating but fairly normal.

> $ netstat -rn            
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 10.10.3.0       0.0.0.0         255.255.255.0   U         0 0          0 eth1
> 192.168.2.0     0.0.0.0         255.255.255.0   U         0 0          0 eth2
> 10.10.0.0       0.0.0.0         255.255.0.0     U         0 0          0 eth1
> 169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth2
> 0.0.0.0         10.10.1.1       0.0.0.0         UG        0 0          0 eth1
>
> On the QA cluster I've turned off ZEROCONF just in case and that
> change will go into effect on their next reboots -
>
> $ grep ZERO /etc/sysconfig/network
> NOZEROCONF=yes
>
> There's no routed or gated running.  It's just the default route as far
> as I can tell.  Oracle ASM appears to run "rdisc" when it joins the
> cluster but it leaves the default route in place.

Well, injecting any sophisticated clustering into the system is often
begging for pain. The number of times expensive, sophisticated,
"high-availability" tools have proceeded to screw up my
environments is legion, and I've gathered a very strong distrust of
all of them.

> What I see is most of the time a traceroute from one node in the
> clustered pair to the other returns fast -
>
> Mon Aug  2 17:43:58 CDT 2010
> traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
>  1  oracleprod01-priv (192.168.2.111)  0.100 ms  0.082 ms  0.071 ms
> -----------------
> Mon Aug  2 17:44:08 CDT 2010
> traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
>  1  oracleprod01-priv (192.168.2.111)  0.107 ms  0.087 ms  0.072 ms
>
> But every once in a while it goes slowly -
>
> Tue Jul 27 13:14:09 CDT 2010
> traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
>  1  * oracleprod01-priv (192.168.2.111)  0.125 ms  0.084 ms
> -----------------
> Tue Jul 27 13:14:24 CDT 2010
> traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
>  1  * oracleprod01-priv (192.168.2.111)  0.100 ms  0.081 ms
> -----------------
> Tue Jul 27 13:14:39 CDT 2010
> traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
>  1  * oracleprod01-priv (192.168.2.111)  0.117 ms  0.084 ms
> -----------------
>
> The star represents a 5 second timeout in the response.  That's a delay
> across a twisted pair connection with no change in the routing table.

It's an overall delay. The cable is a suspect: so are your NICs, and
so is your system.

> One thing I noticed is the star lasts an even multiple of 90 seconds.
> Usually it goes away after 90 seconds, sometimes after 180 or 270
> seconds.  On occasion it switches to a different number of stars.  I
> think the cluster ejection and panic happen when the delay reaches 15
> seconds, or 3 stars.

That seems reasonable.


> When I saw the 90 second timeframe of the delays my first thought was
> the RIP route calculation cycle.  That's when I checked that neither
> gated nor routed were running and I confirmed the routing table does not
> change.  I also switched from a switch to a crossover cable to eliminate
> the possibility that the switch is doing a routing calculation.
>
> Has anyone seen a delay like this over a dedicated link?

Not without some actual problem going on. Is it associated with heavy
load?

> The link is gigabit ethernet that typically runs under 1% capacity so I
> don't think it's overwhelmed with other traffic.  The 90 second time scale of
> the delays doesn't fit and the times when the star appears in my trace
> log don't match peak database loads.

Is the traffic heavy in harsh bursts?

> Thanks in advance if anyone has seen an unexplained delay across an
> ethernet connection.
>
> The hosts are in each other's /etc/hosts file and my traceroute loop
> uses the IP number so it isn't DNS -
>
> $ grep hosts /etc/nsswitch.conf
> #hosts:     db files ldap nis dns
> hosts:      files dns
>
> With DNS the delay would happen before the traceroute tries to send the
> packet anyway.

From: Marc Haber on
Doug Freyburger <dfreybur(a)yahoo.com> wrote:
>Has anyone seen ping/route delays across a crossover Ethernet cable?
>
>I've got two clustered pairs of Oracle hosts where every couple of
>months one side gets ejected from the cluster because of a heartbeat
>timeout. I've traced it to a delay over the private LAN connection used
>as a heartbeat. It happens on two different clusters that are set up
>the same way. One is production, the other the QA systems. There are
>both ASM and OCFS2.

When do these disconnects happen? At random times? Or always at the
same time of day? When the cleaning woman enters the server room?

Using a different patch cable is a good idea.

>
>At layer two the connection on eth2 is clean -
>
>$ netstat --interfaces=eth2
>Kernel Interface table
>Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
>eth2 1500 0 844008521 0 0 0 1160340279 0 0 0 BMRU
>
>The arp table remains stable with the two hosts seeing each other -
>
>$ arp -an | grep eth2
>? (192.168.2.111) at 00:18:FE:83:C3:12 [ether] on eth2

That doesn't mean that your layer 2 is OK; it only means that it
_was_ OK at some point within the ARP cache timeout.
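
If you want to know how fresh that entry really is, something along
these lines shows its state and when the kernel last confirmed it
(eth2 taken from your output):

$ ip -s neighbour show dev eth2

A REACHABLE entry confirmed a few seconds ago tells you something; a
STALE one only tells you about the past.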

>Switching my focus to layer 3 from here on in ...
>
>The default route goes out eth1. There is a 192.168.2 route out of eth2
>and also a route to 169.254 having to do with DHCP that "should" not
>have any effect -

169.254.0.0/16 is the APIPA range, which a DHCP-enabled client falls
back to in the _absence_ of a DHCP server.

>But every once in a while it goes slowly -
>
>Tue Jul 27 13:14:09 CDT 2010
>traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
> 1 * oracleprod01-priv (192.168.2.111) 0.125 ms 0.084 ms
>-----------------
>Tue Jul 27 13:14:24 CDT 2010
>traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
> 1 * oracleprod01-priv (192.168.2.111) 0.100 ms 0.081 ms
>-----------------
>Tue Jul 27 13:14:39 CDT 2010
>traceroute to oracleprod01-priv (192.168.2.111), 30 hops max, 38 byte packets
> 1 * oracleprod01-priv (192.168.2.111) 0.117 ms 0.084 ms
>-----------------

That's a loss, and a layer 2 issue. Just leave a ping running and see.

No offense intended, but you seem to be using the wrong tools to
diagnose your issue, drawing the wrong conclusions from their
misinterpreted results, and yet you are still running a database cluster. How
about asking somebody who knows his way around the systems to fix your
issues?

Greetings
Marc
--
-------------------------------------- !! No courtesy copies, please !! -----
Marc Haber | " Questions are the | Mailadresse im Header
Mannheim, Germany | Beginning of Wisdom " | http://www.zugschlus.de/
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 621 72739834
From: Doug Freyburger on
The Natural Philosopher <t...(a)invalid.invalid> wrote:
> Doug Freyburger wrote:
>
> > Has anyone seen a delay like this over a dedicated link?
>
> Guesswork. Are the interfaces running duplex?

ethtool says the interface is running 1 Gb/s full duplex on both hosts.
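
For reference this is the check on each node, trimmed to the lines
that matter:

$ ethtool eth2 | egrep 'Speed|Duplex|Auto-negotiation|Link detected'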

> have you tried using a switch rather than crossover?

The switch was the initial state. Switching to a crossover cable did
not change the behavior.

> > Thanks in advance if anyone has seen an unexplained delay across an
> > ethernet connection.
>
> Surely that's a packet LOSS not a delay.

I'll run a ping into a log file in parallel to try to measure that.
I'll also take another look at TCP and UDP error and retransmit rates.
They looked low to me and that's why I moved on to the routing theory.
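
Something along these lines is what I have in mind for the parallel
ping log; the interval and the path are illustrative:

# one timestamped ping per second on the private link; a line with no
# "bytes from" part marks a lost or badly delayed reply
while true ; do
    echo "$(date '+%F %T') $(ping -c 1 -w 5 192.168.2.111 | grep 'bytes from')"
    sleep 1
done >> /var/tmp/eth2-ping.log 2>&1

Plus a "netstat -s" snapshot before and after one of the delay windows
to compare the TCP retransmit counters.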

Nico Kadel-Garcia wrote:
>
> Well, injecting any sophisticated clustering into the system is often
> begging for pain. The number of times expensive, sophisticated,
> "high-availability" tools have proceeded to screw up my
> environments is legion, and I've gathered a very strong distrust of
> all of them.

Exactly. Managers tend to want clustering as a way to get around
the MTBF of the hardware. It's a strategy that ignores the MTBF of
the software and the MTBF of the staff. My first experience with
clustering was about 1993 and at that time the software failed so
often I pulled it to improve reliability.

Not that I can be positive it isn't a hardware problem. The interfaces
are in full duplex, as any gigabit Ethernet has to be, but full duplex
mode turns off a lot of layer 2 error reporting.

> the 169.254 is irritating but fairly normal

Reading up on it taught me that. May as well turn it off at the next
reboot anyway, but it's not likely to have anything to do with it.

Marc Haber wrote:
>
> When do these disconnects happen? At random times? Or always at the
> same time of day?

Unfortunately the timing of the delays does not map to Oracle high
traffic times. That's one of the first things Oracle asked about a
year ago after the first panic reboot. Oracle keeps saying it's a
network problem and I agreed with that early on. But the reboots have
been too infrequent to get good data until the more frequent
traceroute delays were measured.

> When the cleaning woman enters the server room?

This actually happened to me about 1981. A VAX-11/750 kept
crashing and I was the last one in the office so I kept getting
quizzed about it. Then one day it happened when I had left early.
I watched where the janitors plugged in their vacuum cleaners
and asked about what plugs were on what circuits.

> Using a different patch cable is a good idea.

Two different cables have been swapped in from known-good connections
to double-check that.

> >$ arp -an | grep eth2
> >? (192.168.2.111) at 00:18:FE:83:C3:12 [ether] on eth2
>
> That doesn't mean that your layer 2 is OK; it only means that it
> _was_ OK at some point within the ARP cache timeout.

With the further caveat that in full duplex mode the zeros from
"netstat -in" also don't mean layer 2 is okay. That's my best lead.