From: Jamie Lokier on
Russell King - ARM Linux wrote:
> On Sat, Mar 06, 2010 at 09:24:49PM +1100, dave b wrote:
> > I had already reported it to debian -
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=572653
> >
> > I have cc'ed linux-arm-kernel into this email.
>
> I think most of the points have already been covered, but just for
> completeness,
>
> What is the history of the hardware you're running these builds on?
> Has it proven itself on previous kernel versions running much the same
> tests?
>
> Another point to consider: how are you running the compiler - is it
> over NFS from a PC?
>
> The reason I ask is that you can suffer from very weird corruption
> issues - I have a nice illustration of one which takes a copy of 1GB
> worth of data each day, and every once in a while, bytes 8-15 of a
> naturally aligned 16 byte block in the data become corrupted somewhere
> between the network and disk. The probability of corruption happening
> is around 0.0000001%, but it still happens... and that makes it
> extremely difficult to track down.

We had annoying corruption in some totally different hardware a few
years ago, but not quite as rare as that.

It was only on ext3 filesystems, not vfat as was supplied with the
SDK. It turned out that the chip's IDE driver started DMA like this:

1. Write to DMA address and count registers.
2. Flush D-cache.
3. Write start DMA command to DMA controller.

We found step 1 preloaded the first 128 bytes into a chip FIFO
(undocumented of course), although the DMA didn't start until step 3.
Swapping steps 1 and 2 fixed it.
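
To make the ordering problem concrete, here is a small user-space toy
model in C. It is only a sketch of the behaviour described above, not
the real driver: the struct, the function names, the 512-byte buffer
and the single all-or-nothing "cache" are invented for illustration.
The only details taken from the description are the 128-byte FIFO
preload on the address/count write and the two possible orderings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE  512  /* size of the simulated transfer (assumption) */
#define FIFO_SIZE 128  /* bytes preloaded by the controller, as described */

/* Toy model: 'ram' is what the device sees; 'cache' holds CPU writes
 * that have not been written back to RAM yet. */
struct sim {
    uint8_t ram[BUF_SIZE];
    uint8_t cache[BUF_SIZE];
    int     cache_dirty;
    uint8_t fifo[FIFO_SIZE];
    size_t  dma_len;
    uint8_t disk[BUF_SIZE];
};

/* CPU fills the buffer; the data stays in the (write-back) cache. */
static void cpu_write(struct sim *s, const uint8_t *src, size_t len)
{
    assert(len <= BUF_SIZE);
    memcpy(s->cache, src, len);
    s->cache_dirty = 1;
}

/* Write dirty cache contents back to RAM. */
static void flush_dcache(struct sim *s)
{
    if (s->cache_dirty) {
        memcpy(s->ram, s->cache, BUF_SIZE);
        s->cache_dirty = 0;
    }
}

/* Program address and count. Side effect: the controller immediately
 * preloads the first FIFO_SIZE bytes from RAM into its FIFO. */
static void dma_set_addr_count(struct sim *s, size_t len)
{
    assert(len >= FIFO_SIZE && len <= BUF_SIZE);
    s->dma_len = len;
    memcpy(s->fifo, s->ram, FIFO_SIZE);
}

/* Start the transfer: FIFO contents go out first, the rest from RAM. */
static void dma_start(struct sim *s)
{
    memcpy(s->disk, s->fifo, FIFO_SIZE);
    memcpy(s->disk + FIFO_SIZE, s->ram + FIFO_SIZE,
           s->dma_len - FIFO_SIZE);
}

/* Run one transfer and return the number of corrupted bytes on "disk".
 * flush_first == 0: program registers, then flush (the buggy order).
 * flush_first != 0: flush, then program registers (the fixed order). */
size_t run_transfer(int flush_first)
{
    struct sim s;
    uint8_t data[BUF_SIZE];
    size_t i, bad = 0;

    memset(&s, 0, sizeof(s));
    for (i = 0; i < BUF_SIZE; i++)
        data[i] = (uint8_t)(i % 251 + 1);  /* non-zero test pattern */

    cpu_write(&s, data, BUF_SIZE);

    if (flush_first) {
        flush_dcache(&s);
        dma_set_addr_count(&s, BUF_SIZE);
    } else {
        dma_set_addr_count(&s, BUF_SIZE);
        flush_dcache(&s);
    }
    dma_start(&s);

    for (i = 0; i < BUF_SIZE; i++)
        if (s.disk[i] != data[i])
            bad++;
    return bad;
}
```

In this model, run_transfer(0) reports 128 corrupted bytes (the stale
FIFO contents) and run_transfer(1) reports none. Real hardware is
messier, since only the cachelines that happen to still be dirty are
affected, which is why the corruption we saw was intermittent.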

The chip supplier hadn't encountered corruption because the code path
from vfat down always had the side effect of flushing those cachelines.

With the cache handling complexity that some ARMs now seem to require,
I wonder if you're seeing a similarly missed cache flush? Adding
cache flushes at strategic points throughout the kernel was very
helpful in narrowing down the one we saw.

-- Jamie
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: dave b on
I believe that the 2.6.32.7 kernel I compiled and was using on the
device while compiling the 2.6.33 kernel had *issues* (although most
likely not kernel related). In particular, I saw various problems (such
as apt-get not working), including on one piece of hardware running a
kernel binary produced by others.

Thank you.
From: Pavel Machek on
On Fri 2010-03-26 21:12:34, dave b wrote:
> I believe that the 2.6.32.7 kernel I compiled and was using on the
> device while compiling the 2.6.33 kernel had *issues* (although most
> likely not kernel related). In particular, I saw various problems (such
> as apt-get not working), including on one piece of hardware running a
> kernel binary produced by others.

Interesting. I remember apt-get failing recently; I thought my
databases went corrupt, but then it started to work magically...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html