From: Svend Olaf Mikkelsen on
On 27 Aug 2007 14:50:38 GMT, Arno Wagner <me(a)privacy.net> wrote:

>Come to think of it, it may be that Linux typically reads (and
>writes?) 1kB or 2kB aligned on an address divisible by 2 or 4
>respectively. Maybe only on SCSI, maybe on USB storage, maybe
>generally. On a fast browse through the sources of 2.6.18.8 I did not
>find anything relevant.
>
>This may mean that testing the presence of the problem under Linux
>could need a single-sector write (if Linux does that). If Linux
>always does at least 1kB accesses aligned on an even address, then
>the problem would not manifest itself. If it only does this on
>reading, the problem could well be present for a single-sector
>write.
>
>Can you overwrite the critical sector with dd and then see
>whether it changed?
>
>Arno

So far I made sector 268435454 on /dev/hdc a bad sector, and tried
this:

dd if=/dev/hdc of=sector.bin bs=512 count=1 skip=268435455

I currently do not know how to capture the Linux console, but dd could
not read the sector, and I had this in /var/log/messages:


Aug 27 22:32:24 localhost kernel: hdc: dma_intr: status=0x51 {
DriveReady SeekComplete Error }
Aug 27 22:32:24 localhost kernel: hdc: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=268435454, high=15, low=16777214,
sector=268435448
Aug 27 22:32:24 localhost kernel: ide: failed opcode was: unknown
Aug 27 22:32:24 localhost kernel: end_request: I/O error, dev hdc,
sector 268435448
Aug 27 22:32:24 localhost kernel: Buffer I/O error on device hdc,
logical block 33554431
Aug 27 22:32:28 localhost kernel: hdc: dma_intr: status=0x51 {
DriveReady SeekComplete Error }
Aug 27 22:32:28 localhost kernel: hdc: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=268435454, high=15, low=16777214,
sector=268435448
Aug 27 22:32:28 localhost kernel: ide: failed opcode was: unknown
Aug 27 22:32:28 localhost kernel: end_request: I/O error, dev hdc,
sector 268435448
Aug 27 22:32:28 localhost kernel: Buffer I/O error on device hdc,
logical block 33554431


It may not matter, but this is the Hitachi disk.

This could give some indication that your read alignment theory is
correct, although it was not a USB disk.
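
A rough check of the alignment idea against the figures in the log above. This is only a sketch: it assumes the "logical block" number counts fixed-size blocks from the start of the device, which is my reading of the log and not something taken from the kernel source. The function names are made up for illustration.

```python
SECTOR_BYTES = 512

def sectors_per_block(first_sector: int, logical_block: int) -> int:
    """Infer how many sectors form one logical block from a log entry."""
    if logical_block <= 0 or first_sector % logical_block != 0:
        raise ValueError("sector is not a whole multiple of the block number")
    return first_sector // logical_block

def block_span(sector: int, per_block: int) -> range:
    """Sectors covered by the aligned block that contains `sector`."""
    start = (sector // per_block) * per_block
    return range(start, start + per_block)

# Figures from /var/log/messages: sector 268435448, logical block 33554431.
per_block = sectors_per_block(268435448, 33554431)
# The sector dd was asked to read.
span = block_span(268435455, per_block)

print(per_block * SECTOR_BYTES)   # apparent block size in bytes
print(span.start, span[-1])       # first and last sector of that block
print(268435454 in span)          # is the bad sector inside the same block?
```

If the assumption holds, the block that contains sector 268435455 starts at the sector number the kernel reported and also contains the bad sector 268435454, which would explain why dd failed.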

In Windows sector 268435455 on the same disk with the bad sector can
be read without problems:

C:\>findpart getsect 2 16709 85 16 1 sector.bin noheader
OK
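
For reference, the CHS address given to findpart can be checked against the sector number. A minimal sketch, assuming the 255 heads, 63 sectors geometry used for this disk in the thread; the helper names are only for illustration and are not part of Findpart.

```python
def lba_to_chs(lba: int, heads: int = 255, sectors_per_track: int = 63):
    """Convert an LBA sector number to (cylinder, head, sector); sector is 1-based."""
    cylinder, rest = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1

def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int = 255, sectors_per_track: int = 63) -> int:
    """Inverse conversion, again with a 1-based sector number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)

print(lba_to_chs(268435455))       # (16709, 85, 16)
print(chs_to_lba(16709, 85, 16))   # 268435455
```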

--
Svend Olaf
From: Folkert Rienstra on
Arno Wagner wrote in message news:5jg45uF3uid5pU1(a)mid.individual.net
> Previously Svend Olaf Mikkelsen <svolaf(a)partitionsupport.com> wrote:
> > As example the problem is present with the Prolific PL-2506
> > Hi-Speed USB to IDE Bridge Controller, the version I have, and the
> > Seagate ST3160212A disk with firmware 3.AAJ.
>
> > The problem is not present with the same chip and the Hitachi "Hitachi
> > HDS721616PLAT80" disk with firmware P22OA8BA.

> Ok, so this is either a disk or a controller issue that is triggered
> by use of a mix of LBA 28 and LBA 48 commands in a border situation

> (namely sector 268435455, i.e. 2^28-1).

Which is the 2^28'th sector, babblebot.

> Since the LBA commands are created by the USB-to-ATA
> device, this would then not be OS specific, as USB uses

> SCSI sector numbers (32 or 64 bit)

There is no such thing as a "SCSI sector number", babblebot.

> anyways and users of other OSes would be equally at risk.
>
> > I am not certain who to blame, if any. One question is if a disk should
> > be able to read sector 268435455 using LBA 28-bit commands,
> > according to the ATA specifications.

> I had a look into an ATA-8 Command Set draft (Jan 2006) and it says in
> 4.1 that IDENTIFY DEVICE will return the number of sectors plus one
> which (in 28 bit mode) may not exceed 0xfffffff,

"4.2.1 Definitions and value ranges of IDENTIFY DEVICE data words"

"Words (61:60) shall contain the value one greater than the total number of
user-addressable sectors in 28-bit addressing and shall not exceed 0FFFFFFFh.
The content of words (61:60) shall be greater than or equal to one and less
than or equal to 268,435,455."

That's a complete crock and that has been wrong for a long time now.
The description of words 61:60 in 7.1 however is correct.

Here is what 4.14 says:

"If the value contained in IDENTIFY DEVICE data words (103:100)
is greater than 268,435,455, then the maximum value in words (61:60)
shall be 268,435,455. That is, if the device contains greater than the
capacity addressable with 28-bit commands, words (61:60) shall describe
the maximum capacity that can be addressed by 28-bit commands."

The description of words 61:60 in 7.17

"7.17.1.22 Word (61:60): Total number of user addressable sectors

This field contains a value that is one greater than the maximum user
accessible logical block address (See 4.2).
The maximum value that shall be placed in this field is 0FFFFFFFh."

Again, that last line is questionable as it doesn't make any sense.
That field is 32-bit so 10000000h would have made perfect sense.

> i.e. the number of addressable sectors in 28-bit mode is 268435454 at the most.

Nope, it's 268435455 sectors, sector 268435454 being the last one.

> However for actual sector numbers it seems 268435455 is allowed

Yeah, funny that.
Imagine that you would have a hole in your sector numbers at 268435455.
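
The numbers in question, spelled out. A small sketch that only restates the arithmetic: it treats words (61:60) as a sector count, as the 7.17.1.22 text quoted above describes, and the names are mine, not the specification's.

```python
LBA28_BITS = 28
MAX_WORDS_61_60 = 0x0FFFFFFF   # cap on words (61:60) as quoted from the draft

def last_lba(sector_count: int) -> int:
    """Highest sector number on a device that reports `sector_count` sectors."""
    return sector_count - 1

def fits_in_28_bits(lba: int) -> bool:
    """True if the sector number can be encoded in a 28-bit address field."""
    return 0 <= lba < (1 << LBA28_BITS)

print(MAX_WORDS_61_60)              # 268435455 sectors at most
print(last_lba(MAX_WORDS_61_60))    # 268435454 is then the last sector
print(fits_in_28_bits(268435455))   # True: the address itself still fits
print(fits_in_28_bits(268435456))   # False: this one needs 48-bit commands
```

So sector 268435455 is the one address that a 28-bit command can express but that lies beyond the capacity words (61:60) may report.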

> (but can not happen unless a 48-bit IDENTIFY DEVICE was used).

There is no such thing as "a 48-bit IDENTIFY DEVICE", babblebot.

> My guess is that some HDD manufacturers screwed up and actually kept the
> LBA 28 commands at the limit that

> an LBA 28 IDENTIFY DEVICE imposes,

There is no such thing as "an LBA 28 IDENTIFY DEVICE", babblebot.

> even if that limit is not present with LBA 48. At the same time
> the USB-to-ATA bridge designers were careless and did not either
> use LBA 48 from 268435455 onwards (or generally), which, given
> the not too clear wording in the spec, would have been a good idea.

Whatever.
Get some sleep babblebot. You are obviously raving with lunacy.

>
> Arno
From: Folkert Rienstra on
Arno Wagner wrote in message news:5jg4vfF3uid5pU2(a)mid.individual.net
> Previously Svend Olaf Mikkelsen <svolaf(a)partitionsupport.com> wrote:
> > On Sun, 26 Aug 2007 12:17:21 -0700, "Eric Gisin" <gisin(a)uniserve.com>
> > wrote:
>
> > > But Windows NT only issues LBA-32 sector nums, which IDE drivers translate.
> > > The USB drives always get a SCSI LBA-32 sector, which the drive translates.
> > >
> > > The problem drives should be tested on OSX and Lunix.
>
> > In a system with the problem present in Windows, this command did read
> > the sector correctly in Linux:
>
> > dd if=/dev/sda of=sector.bin bs=512 count=1 skip=268435455
>
> > I do not know why the result is different between Windows and Linux.

> Very interesting. This should not be happening as far as I can tell.

No kidding, babblebot.

> At least if the problem is only an over-optimistic SCSI 32-bit sector
> number to LBA 28 conversion. This probably means that Windows is
> (mis-)configuring something, while Linux is not.

Or maybe you should get some sleep, babblebot.

>
> > With a 255 heads, 63 sectors geometry this is cylinder 16709, head 85,
> > sector 16. I attempted to read the sector in Windows 2000 using
> > Findpart, but it failed. I could read the sector by reading 2 sectors
> > at the previous address,

> That would be consistent with a problem with the sector address
> in the command and no problem in the disk-internal handling
> of sector numbers.

No, it wouldn't.

>
> > and then edit the file using Windows edit,
> > and after that it matched the sector read in Linux:
>
> > C:\>findpart getsect 4 16709 85 16 1 sector.bin noheader
> > Some sectors could not be read.
>
> > C:\>findpart getsect 4 16709 85 15 2 sector.bin noheader
> > File already exists.
>
> > C:\>del sector.bin
>
> > C:\>findpart getsect 4 16709 85 15 2 sector.bin noheader
> > OK
>
> > C:\>edit /64 sector.bin
>
> > C:\>fc /b sector.bin l:\*.*
> > Comparing files SECTOR.BIN and L:\SECTOR.BIN
> > FC: no differences encountered
>
> > C:\>

> I think I have an idea:

Oh whoopteedoo, the sleep-deprived babblebot has an idea.

> Linux may be reading sectors earlier
> because of some peculiarities in its read-ahead strategy.
> For example it may align reads on divisible-by-4 block
> numbers. This would give the behaviour you demonstrate above.
>
> I think without diving into the USB and/or SCSI code this may be
> difficult to find out.
>
> Anyways, valuable information! Thanks!
>
> Arno
From: Arno Wagner on
Previously Svend Olaf Mikkelsen <svolaf(a)partitionsupport.com> wrote:
> On 27 Aug 2007 14:50:38 GMT, Arno Wagner <me(a)privacy.net> wrote:

>>Come to think of it, it may be that Linux typically reads (and
>>writes?) 1kB or 2kB aligned on an address divisible by 2 or 4
>>respectively. Maybe only on SCSI, maybe on USB storage, maybe
>>generally. On a fast browse through the sources of 2.6.18.8 I did not
>>find anything relevant.
>>
>>This may mean that testing the presence of the problem under Linux
>>could need a single-sector write (if Linux does that). If Linux
>>always does at least 1kB accesses aligned on an even address, then
>>the problem would not manifest itself. If it only does this on
>>reading, the problem could well be present for a single-sector
>>write.
>>
>>Can you overwrite the critical sector with dd and then see
>>whether it changed?
>>
>>Arno

> Reply no. 2.

> A variant of the problem can be seen in Linux.

> I made a partition on the USB Seagate disk beginning at sector
> 268435455:

Good thinking!

> Disk: 4 Cylinders: 19457 Heads: 255 Sectors: 63 MB: 152625

> --PCyl N ID -----Rel -----Num ---MB --Start CHS- ---End CHS-- BS CHS
>      0 1 61 268435455 44141250 21553 16709# 85 16 19456*254 63 OK

> Then in Linux I did:

> dd if=/dev/sda of=sda.bin bs=512 count=1 skip=268435455

> and

> dd if=/dev/sda1 of=sda1.bin bs=512 count=1

> The file sda.bin has the correct content, while sda1.bin has wrong
> content, and is different between different attempts. This indicates
> that the sector was actually not read, without any warnings.

Indeed. So this problem is likely the hardware issue I described earlier.
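
Checking whether repeated dumps of the same sector agree can be automated. A minimal sketch, assuming each file holds exactly one 512-byte sector as written by the dd commands above; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sector_digest(path, sector_bytes: int = 512) -> str:
    """SHA-256 of a one-sector dump; rejects files of the wrong size."""
    data = Path(path).read_bytes()
    if len(data) != sector_bytes:
        raise ValueError(f"{path}: expected {sector_bytes} bytes, got {len(data)}")
    return hashlib.sha256(data).hexdigest()

def reads_agree(paths) -> bool:
    """True if every dump in `paths` has identical content."""
    return len({sector_digest(p) for p in paths}) == 1
```

For example, reads_agree(["sda.bin", "sda1.bin"]) would be expected to return False in the situation described, and dumps of /dev/sda1 taken on separate attempts would also disagree with each other.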

Given this, Linux users are probably safe in practice: placing a
partition start at sector 268435455 seems to be the only way to
trigger the problem.

To blame for this mess are the disk manufacturers and the bridge-chip
manufacturers for being careless.

Arno
From: Arno Wagner on
Previously Arno Wagner <me(a)privacy.net> wrote:
> Previously Svend Olaf Mikkelsen <svolaf(a)partitionsupport.com> wrote:

I did some additional digging and it seems that Linux uses 1kB as
default block size in the disk buffer-cache. This would mean that no
odd-numbered disk sector is ever read first, or, due to read-ahead, as
the only sector in a read request. On writes, I am not sure about the
implications. It could well be that all writes are done in 1kB block
sizes, aligned on an even start number, as well.

I am not sure how to change the default block size in the
buffer cache, or if it is possible from userland.

This would also explain another observation I had with disks that
have (visible) defective sectors: Copying them on disk level
with dd_rescue, you sometimes get multiple errors for a single
defective sector.
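
To illustrate the reasoning, here is a small sketch. It assumes that every read is widened to the whole aligned block containing the requested sector, which is the hypothesis above and not something I have confirmed in the kernel; the function names are made up.

```python
def request_start(sector: int, sectors_per_block: int) -> int:
    """First sector of the aligned block that a read of `sector` would fetch."""
    return (sector // sectors_per_block) * sectors_per_block

def failing_reads(bad_sector: int, sectors_per_block: int,
                  first: int, last: int) -> list:
    """Sectors in [first, last] whose read fails because they share a block
    with the single defective sector."""
    bad_block = bad_sector // sectors_per_block
    return [s for s in range(first, last + 1)
            if s // sectors_per_block == bad_block]

# 1kB blocks are 2 sectors: an odd sector is never the first one requested.
print(request_start(268435455, 2))                        # 268435454
# One defective sector, but two sector reads report an error.
print(failing_reads(268435454, 2, 268435450, 268435460))  # [268435454, 268435455]
```

With a larger block size the same single defective sector would produce correspondingly more errors.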

Arno