From: Sukadev Bhattiprolu on


I am unable to get networking to work with 2.6.18-mm1 on my system.

The 2.6.18 kernel works fine on the same system. Below is some info about
the system and my debug attempts. The lspci output and config are attached.

I would appreciate any help. Please let me know if you need more info.

Suka

System info:

x326, 2 CPU (AMD Opteron Processor 250)

Kernel info:

$ uname -a
Linux elm3b166 2.6.18-mm1 #4 SMP PREEMPT Tue Sep 26 18:11:58 PDT 2006
x86_64 GNU/Linux

Config tokens differing between the 2.6.18 kernel that works and
the 2.6.18-mm1 that does not are:

Tokens in 2.6.18 but not in 2.6.18-mm1 config

CONFIG_SCSI_FC_ATTRS=y
CONFIG_SCSI_SATA_SIL=y
CONFIG_SCSI_SATA=y

Tokens in 2.6.18-mm1 but not in 2.6.18 config

CONFIG_PROC_SYSCTL=y
CONFIG_SATA_SIL=y
CONFIG_ATA=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_BLOCK=y
CONFIG_VIDEO_V4L1_COMPAT=y
CONFIG_ZONE_DMA=y
CONFIG_FB_DDC=y

All drivers compiled into kernel in both cases.
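
For anyone wanting to reproduce this kind of comparison, here is a minimal Python sketch that lists the enabled tokens present in one config but not the other. The function names and the two tiny config excerpts are made up for illustration; they are not the real config files from this report. It compares whole `CONFIG_FOO=value` lines, so a token whose value changed would show up on both sides.

```python
def config_tokens(text):
    """Return the set of enabled 'CONFIG_FOO=value' lines in a .config."""
    tokens = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as '# CONFIG_FOO is not set'.
        if not line or line.startswith("#"):
            continue
        tokens.add(line)
    return tokens


def diff_configs(old_text, new_text):
    """Return (only_in_old, only_in_new) as sorted lists of config lines."""
    old, new = config_tokens(old_text), config_tokens(new_text)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    # Hypothetical excerpts, only to exercise the functions.
    old_cfg = "CONFIG_E1000=y\nCONFIG_SCSI_SATA=y\n# CONFIG_ATA is not set\n"
    new_cfg = "CONFIG_E1000=y\nCONFIG_ATA=y\n"
    only_old, only_new = diff_configs(old_cfg, new_cfg)
    print("only in old:", only_old)
    print("only in new:", only_new)
```

With real files, the two strings would simply be the contents of each kernel's .config.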

Debug info:

Checked hardware connections :-)
(Rebooting on 2.6.18 kernel works - consistently)

$ ethtool -i eth0
driver: e1000
version: 7.2.7-k2
firmware-version: N/A

$ ip addr
seems fine (up, broadcasting etc)

$ ip -s link
shows no errors/drops/overruns

$ ip route
shows the correct gw

$ ethtool -S eth0

shows non-zero tx/rx packets/bytes, but *rx_missed_errors* is
quite large (~138K) and increasing over time

$ ping <own-ip-addr>
works fine

$ ping <gateway>
no response.

$ tcpdump -i eth0 host <broken-host>

while pinging gateway, tcpdump shows messages like:

18:03:45.936161 arp who-has <gateway> tell <broken-host>

(Config file and lspci output are attached)
From: Auke Kok on
Sukadev Bhattiprolu wrote:
>
> I am unable to get networking to work with 2.6.18-mm1 on my system.

How about dmesg? Perhaps it shows some valuable information.

Also, since this is a networking problem, please include the output of
`ifconfig eth0` and the full output of `ethtool eth0` and `ethtool -S eth0`.

Cheers,

Auke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Sukadev Bhattiprolu on
Thanks. See below for additional info

Auke Kok [auke-jan.h.kok(a)intel.com] wrote:
| Sukadev Bhattiprolu wrote:
| >
| >I am unable to get networking to work with 2.6.18-mm1 on my system.
|
| how about dmesg? Perhaps it shows some valuable information.

I am attaching dmesg.out.

|
| also, since this is a networking problem, please include `ifconfig eth0`
| and the full output of `ethtool eth0` and `ethtool -S eth0`

$ ifconfig eth0

eth0      Link encap:Ethernet  HWaddr 00:02:B3:9D:D4:D7
          inet addr:10.0.67.166  Bcast:10.0.67.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:564 errors:0 dropped:5927 overruns:0 frame:0
          TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:81803 (79.8 KiB)  TX bytes:6720 (6.5 KiB)
          Base address:0x3400 Memory:e8240000-e8260000


$ ethtool eth0

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes

$ ethtool -S eth0

NIC statistics:
rx_packets: 564
tx_packets: 105
rx_bytes: 81803
tx_bytes: 6720
rx_errors: 0
tx_errors: 0
tx_dropped: 0
multicast: 11
collisions: 0
rx_length_errors: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
rx_no_buffer_count: 310
rx_missed_errors: 5865
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_window_errors: 0
tx_abort_late_coll: 0
tx_deferred_ok: 0
tx_single_coll_ok: 0
tx_multi_coll_ok: 0
tx_timeout_count: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
rx_align_errors: 0
tx_tcp_seg_good: 0
tx_tcp_seg_failed: 0
rx_flow_control_xon: 0
rx_flow_control_xoff: 0
tx_flow_control_xon: 0
tx_flow_control_xoff: 0
rx_long_byte_count: 81803
rx_csum_offload_good: 0
rx_csum_offload_errors: 0
rx_header_split: 0
alloc_rx_buff_failed: 0

Hope this helps. Let me know if you need more info.

Suka
From: Jesse Brandeburg on
On 9/28/06, Sukadev Bhattiprolu <sukadev(a)us.ibm.com> wrote:
> Thanks. See below for additional info
>
> Auke Kok [auke-jan.h.kok(a)intel.com] wrote:
> | Sukadev Bhattiprolu wrote:
> | >
> | >I am unable to get networking to work with 2.6.18-mm1 on my system.
> | >
> | >But 2.6.18 kernel on same system works fine. Here is some info about
> | >the system/debug attempts. Attached are the lspci output and config.
> | >
> | >Appreciate any help. Please let me know if you need more info.

It seems you're having interrupt delivery problems, or interrupts are
getting lost.

rx_missed_errors counts frames that were dropped because the e1000
adapter's FIFO filled up and overflowed.
> rx_no_buffer_count: 310
> rx_missed_errors: 5865
rx_no_buffer_count indicates that the driver didn't return buffers to
the hardware soon enough, but the hardware was still able to hold the
packet in the FIFO (at the time of reception) and try again.

Both of these indicate to me that something is wrong with
interrupts, possibly interrupt sharing.
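
As a rough illustration (not part of the e1000 driver or ethtool), the two counters discussed above can be compared across two `ethtool -S` samples with a small Python sketch. The function names are made up for this example. Only the first sample's values (310 and 5865) come from the report; the second sample is invented purely to show the arithmetic.

```python
def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S` output into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # The 'NIC statistics:' header has no numeric value and is skipped.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_deltas(before, after,
                   names=("rx_no_buffer_count", "rx_missed_errors")):
    """Return how much each named counter grew between two samples."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


if __name__ == "__main__":
    # First sample: values taken from the report.
    first = parse_ethtool_stats(
        "NIC statistics:\n"
        "     rx_no_buffer_count: 310\n"
        "     rx_missed_errors: 5865\n"
    )
    # Second sample: HYPOTHETICAL values, only to demonstrate the delta.
    second = parse_ethtool_stats(
        "NIC statistics:\n"
        "     rx_no_buffer_count: 310\n"
        "     rx_missed_errors: 6000\n"
    )
    print(counter_deltas(first, second))
```

A steadily growing rx_missed_errors delta between samples would match the "increasing over time" observation in the original report.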

Can you possibly try a back-to-back connection with another Linux box,
run tcpdump on both ends, and then ping? That will tell us whether traffic
is truly getting out and coming in okay.

Also, please send the output of lspci -vv and cat /proc/interrupts.

Jesse
From: Sukadev Bhattiprolu on
Jesse Brandeburg [jesse.brandeburg(a)gmail.com] wrote:
| On 9/28/06, Sukadev Bhattiprolu <sukadev(a)us.ibm.com> wrote:
| >Thanks. See below for additional info
| >
| >Auke Kok [auke-jan.h.kok(a)intel.com] wrote:
| >| Sukadev Bhattiprolu wrote:
| >| >
| >| >I am unable to get networking to work with 2.6.18-mm1 on my system.
| >| >
| >| >But 2.6.18 kernel on same system works fine. Here is some info about
| >| >the system/debug attempts. Attached are the lspci output and config.
| >| >
| >| >Appreciate any help. Please let me know if you need more info.
|
| It seems you're having interrupt delivery problems, or interrupts are
| getting lost.
|
| rx_missed_errors counts frames that were dropped because the e1000
| adapter's FIFO filled up and overflowed.
| >rx_no_buffer_count: 310
| >rx_missed_errors: 5865
| rx_no_buffer_count indicates that the driver didn't return buffers to
| the hardware soon enough, but the hardware was still able to hold the
| packet in the FIFO (at the time of reception) and try again.
|
| Both of these indicate to me that something is wrong with
| interrupts, possibly interrupt sharing.
|
| Can you possibly try a back-to-back connection with another Linux box,
| run tcpdump on both ends, and then ping? That will tell us whether traffic
| is truly getting out and coming in okay.

Unfortunately, I can't try this week, but can try it early next week.

|
| also please send output of lspci -vv and cat /proc/interrupts

lspci-vv.out is attached. Here is the /proc/interrupts:

$ cat /proc/interrupts

           CPU0       CPU1
  0:      18316          0    IO-APIC-edge     timer
  2:          0          0    XT-PIC-level     cascade
  4:       1023          0    IO-APIC-edge     serial
  8:          0          0    IO-APIC-edge     rtc
 17:       3380          0    IO-APIC-fasteoi  libata
 19:        174          0    IO-APIC-fasteoi  ohci_hcd:usb1, ohci_hcd:usb2
 28:          0          0    IO-APIC-fasteoi  eth0
NMI:         96         35
LOC:      18251      18524
ERR:          0
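
For reference, a minimal Python sketch for pulling one device's per-CPU interrupt counts out of /proc/interrupts text. The function names are made up for this example, and the parsing assumes the simple layout shown above (a CPU header line, then "label: counts... type devices"). The sample lines are copied from the output above; there the eth0 line (IRQ 28) shows zero on both CPUs, which would seem consistent with the interrupt theory.

```python
# A few lines copied from the /proc/interrupts output in this report.
SAMPLE = """\
           CPU0       CPU1
  0:      18316          0    IO-APIC-edge     timer
 19:        174          0    IO-APIC-fasteoi  ohci_hcd:usb1, ohci_hcd:usb2
 28:          0          0    IO-APIC-fasteoi  eth0
NMI:         96         35
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header line: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Take up to ncpus leading numeric fields as the per-CPU counts.
        while fields and len(counts) < ncpus and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        table[label.strip()] = (counts, " ".join(fields))
    return table


def counts_for_device(table, device):
    """Return {label: per_cpu_counts} for lines whose description names device."""
    return {label: counts
            for label, (counts, desc) in table.items()
            if device in desc.replace(",", " ").split()}


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print(counts_for_device(table, "eth0"))
```

Running this against the same file on the working 2.6.18 kernel would give a direct comparison of the eth0 counts.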