From: Christophe Jelger on
Hello,

I am currently "resurrecting" a Linux module (called LUNAR) which I
co-developed in 2007 and I'm having a weird kernel crash. This code
basically used to work fine up to 2.6.18, which was the last version
before we stopped our development. I quickly ported it to 2.6.{31,32}:
it compiles fine and loads fine, but it crashes/hangs the kernel when
it's really being used.

The module is a virtual device used for MANET routing: with the current
version, it basically "captures" DNS requests sent to the virtual
interface --> this triggers the sending of a fake DNS reply (see below)
and the creation of an ARP table entry for the destination (the MANET
route is built at the same time). Packets can then be sent to the
destination.

The problem I'm having is that the kernel quickly hangs after I create a
new ARP entry (actually only if it's being used). If the entry I create
is set to NUD_PERMANENT, then everything works fine! I use
__neigh_lookup_errno to lookup/create the entry and neigh_lookup to
set/update the MAC address. Note that the ARP entry is created without
problem, but typically even just doing a userspace "arp -a" command can
crash the kernel (it also hangs the userspace command!). Doing "arp -na"
usually does NOT crash the kernel!
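
One possible reading of the "arp -a" versus "arp -na" difference, offered as an assumption and not as a diagnosis: without -n, arp tries to display host names, which means a reverse name lookup for each entry, while with -n it only prints numeric addresses. If those lookups reach the module's fake DNS server, merely listing the table would exercise the same DNS + ARP path. The userspace sketch below illustrates only the resolver side, using getnameinfo(); format_peer is a hypothetical helper written for this illustration, not code from net-tools or from LUNAR.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (assumption, not taken from net-tools): render an
 * IPv4 address for display.
 *
 * numeric != 0: NI_NUMERICHOST is set, so the address is only formatted
 *               and the resolver is not asked for a name (like "arp -n").
 * numeric == 0: the resolver may issue a reverse lookup for the address
 *               (like plain "arp -a").
 *
 * Returns 0 on success and a nonzero value on failure.
 */
int format_peer(const char *ip, int numeric, char *out, size_t outlen)
{
    struct sockaddr_in sa;

    assert(ip != NULL && out != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;

    /* inet_pton() returns 1 only for a valid dotted-quad string. */
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;

    return getnameinfo((struct sockaddr *)&sa, sizeof(sa),
                       out, (socklen_t)outlen,
                       NULL, 0,
                       numeric ? NI_NUMERICHOST : 0);
}
```

Calling format_peer("127.0.0.1", 1, buf, sizeof(buf)) fills buf with the numeric string and generates no name-service traffic; calling it with numeric set to 0 hands the address to the system resolver, which is the step that could end up at a DNS server. Whether this is what happens on the affected machine is untested here.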

I guess the problem comes from a combination of ARP + DNS
lookups/replies. Note that my kernel module has its own internal fake
DNS server which captures lookups and sends replies directly back to the
stack. What is amazing: if the ARP entry I create is set to
NUD_PERMANENT, then I don't get any crash (however I cannot develop my
module with permanent ARP entries).

I'm wondering whether any major changes to the neighbor and ARP code
(between 2.6.18 and 2.6.31) could somehow be causing this problem.

Any hint is very welcome.

thanks in advance,
Christophe

PS: I can easily reproduce the problem, and I have been trying to debug
it with qemu and a gdb server, but so far without success in clearly
identifying it. Last point: it seems the kernel does not really "crash"
but rather ends up in some unstable state, maybe in a loop.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Dumazet on
On Monday 07 June 2010 at 12:21 +0200, Christophe Jelger wrote:

Hi Christophe

You should ask this kind of question on netdev instead of lkml.

And of course, post your patch, or send us a crystal ball ;)

Yes, many things changed between 2.6.18 and 2.6.34.

