From: Martin K. Petersen on
>>>>> "hpa" == H Peter Anvin <hpa(a)zytor.com> writes:

>> Huh, what? My homedir is on a 4KiB LBS/PBS drive and has been for ~2
>> years.

hpa> For > 2 TiB drives with 4 KiB logical sectors and MS-DOS partition
hpa> tables, it is.

Ah, that. Already fixed, I believe.


>> With regards to XP compatibility I don't think we should go too much
>> out of our way to accommodate it. XP has been disowned by its master
>> and I think virtualization will take care of the rest.

hpa> I think that is wildly optimistic,

I don't expect XP to go away any time soon. But I do think that the
number of fresh XP installs in combination with Linux will be fairly
limited. And general lack of hardware enablement will eventually kill
off XP on raw metal.

I think it's ok that we have stop-gap solutions in place for
interoperability. But I wouldn't want to waste all our resources on
designing for the past. I'm much more interested in making sure that
single-boot Linux is doing the right thing.


>> FWIW, recent fdisk has a command line flag that will enable/disable
>> DOS compatible layout.

hpa> Yes, unfortunately it is still on by default.

I agree that this is a don't-be-broken option and I would prefer it the
other way around (I know that's the plan for the next release. I just
hope the distributions get things right).
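
For reference, a minimal sketch of why the DOS-compatible layout matters on
a drive with 4 KiB physical blocks behind 512-byte logical sectors. The
start sectors used here (63 for the DOS-compatible layout, 2048 as a
1 MiB-aligned alternative) are assumptions for illustration and are not
taken from this thread; the function names are hypothetical.

```python
# Assumed geometry: 512-byte logical sectors on 4 KiB physical blocks.
LOGICAL = 512
PHYSICAL = 4096


def misalignment(start_lba, logical=LOGICAL, physical=PHYSICAL):
    """Bytes by which a partition start overshoots a physical block boundary."""
    return (start_lba * logical) % physical


def is_aligned(start_lba, logical=LOGICAL, physical=PHYSICAL):
    """True if the partition starts exactly on a physical block boundary."""
    return misalignment(start_lba, logical, physical) == 0


# 63 is the assumed DOS-compatible start; 2048 is an assumed aligned start.
for lba in (63, 2048):
    print(lba, is_aligned(lba), misalignment(lba))
```

With a misaligned start, every 4 KiB filesystem block straddles two
physical blocks, so even filesystem-aligned writes become partial writes
from the drive's point of view.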

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Cláudio Martins on

On Tue, 09 Mar 2010 00:28:25 +0530 James Bottomley <James.Bottomley(a)suse.de> wrote:
>
> There's another problem that afflicts 4k drives emulating 512b: they
> have to do a read modify write for any isolated 512b write ... that
> leads to potential corruption of adjacent 512b blocks if power is lost
> at the moment the write is being done. Since most Linux filesystems are
> 4k sectors, misalignment really hammers this, plus most journal writes
> seem to be done in 512 byte increments. I suppose for USB this could be
> regarded as flakey as usual, though.
>

Most users assume that a single 512B sector write is atomic as far as
power failure is concerned. Hasn't this requirement been carried over
to the new 4k physical sector?

It seems reasonable that if a 512B sector write is atomic in the older
drives, a 4k sector write would also be atomic on the newer drives,
since the time required to write it is negligible when compared to
capacitor voltage decay and inertia of the disk platters.

Anyway, I suppose most of the energy/time required for a sector write
operation is being expended on head assembly positioning and the wait
for the correct sector to pass under the write head. That is, the write
operation itself takes so little time that it should make no difference
whether you write 512B or 4k.

So the question is: what are hard drive makers guaranteeing (if
anything at all)? Was a 512B sector write really atomic? Is a 4k one?
Or was it completely manufacturer-dependent to begin with?

Regards

Cláudio

From: H. Peter Anvin on
On 03/08/2010 07:18 AM, Martin K. Petersen wrote:
>
> I'm Cc:'ing Karel Zak and Jim Meyering who have been doing all the
> alignment work for fdisk and parted respectively. Karel, Jim: The full
> writeup is here:
>
> http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues
>
> It'd be great if you guys could share what you have been doing to the
> tooling.
>

Please correct the following bit in C-3:

"A different partition format - GPT[6] - should be used beyond 2^32
sectors, which could harm compatibility with older BIOSs or other
operating systems which don't recognize the new format."

BIOS does not care about the partition table format. There might be
issues with > 2^32 sectors for BIOSes (e.g. truncating sector counts),
but that would be unrelated.
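
As a rough sketch of where the 2^32-sector boundary comes from: an MS-DOS
partition table entry stores its start LBA and its sector count in 32-bit
fields, so the addressable size scales with the logical sector size. The
helper name below is hypothetical, and the figures are approximate upper
bounds rather than exact limits.

```python
MBR_FIELD_BITS = 32   # width of the LBA and sector-count fields in an entry
TIB = 2 ** 40


def mbr_max_bytes(logical_sector_size):
    """Approximate largest extent a 32-bit sector count can describe."""
    return (2 ** MBR_FIELD_BITS) * logical_sector_size


for size in (512, 4096):
    print(f"{size}-byte logical sectors: {mbr_max_bytes(size) // TIB} TiB")
```

This is a property of the on-disk table format, independent of what the
BIOS itself does with sector counts.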

-hpa
From: Martin K. Petersen on
>>>>> "hpa" == H Peter Anvin <hpa(a)zytor.com> writes:

hpa> On the flipside, though, there really is very little net benefit to
hpa> 4K as opposed to 512 byte logical sectors: the additional protocol
hpa> overhead is relatively minimal, and as long as writes are aligned
hpa> full blocks, there shouldn't be any additional overhead on either
hpa> the OS or the drive side. On the plus side, you get full
hpa> compatibility with the existing software stack. The equation
hpa> really seems rather simple.

4KB sectors are not a win for anybody except the drive vendors.

There is a push in the industry right now to keep the 512-byte logical
blocks forever. The first step would be to report misaligned accesses
or accesses that are not a multiple of the physical block size. Second
step would be to eventually reject any write that's not a properly
aligned multiple of the physical block size.
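
A minimal sketch of the check those two steps imply, assuming a 4 KiB
physical block size. The function name and the wording of the reported
problems are hypothetical; this is not how any particular drive or the
kernel implements it.

```python
def check_write(offset, length, physical=4096):
    """Return the reasons a write is not an aligned full-block write.

    offset and length are in bytes. An empty list means the write starts
    on a physical block boundary and covers whole physical blocks only.
    """
    problems = []
    if offset % physical:
        problems.append("misaligned start")
    if length % physical:
        problems.append("length not a multiple of the physical block size")
    return problems


print(check_write(0, 4096))     # aligned, full block
print(check_write(512, 4096))   # right length, wrong start
print(check_write(0, 512))      # right start, partial block
```

In the first step a non-empty result would only be reported; in the second
step the same result would be grounds for rejecting the write.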

--
Martin K. Petersen Oracle Linux Engineering
From: Martin K. Petersen on
>>>>> "Cláudio" == Cláudio Martins <ctpm(a)ist.utl.pt> writes:

Cláudio> So the question is: what are hard drive makers guaranteeing (if
Cláudio> anything at all)?

No guarantees. Nothing that you can get in writing, anyway.


Cláudio> Was a 512B sector write really atomic?

Sometimes.


Cláudio> Is a 4k one?

Sometimes, maybe.

The problem with 4KB physical blocks is that if you do a partial or
misaligned write you'll end up having to do read-modify-write. And that
introduces a scenario where a subsequent write error will affect
logical blocks that were not part of the I/O request.

However, you also have that with regular drives because they often write
more than the actual block undergoing I/O. For instance to reduce
hotspot bleed to adjacent sectors.
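
To make the read-modify-write exposure concrete, a small sketch that lists
the logical blocks a drive has to rewrite even though they were not part
of the request. It assumes 512-byte logical blocks on 4 KiB physical
blocks with no alignment offset; the function name is hypothetical.

```python
def rmw_collateral(start_lba, count, logical=512, physical=4096):
    """Logical blocks rewritten by the drive but not named in the request."""
    ratio = physical // logical                  # logical blocks per physical block
    first_phys = start_lba // ratio              # first physical block touched
    last_phys = (start_lba + count - 1) // ratio # last physical block touched
    touched = range(first_phys * ratio, (last_phys + 1) * ratio)
    requested = range(start_lba, start_lba + count)
    return [lba for lba in touched if lba not in requested]


print(rmw_collateral(3, 1))   # isolated 512-byte write
print(rmw_collateral(8, 8))   # aligned 4 KiB write
print(rmw_collateral(6, 4))   # write straddling two physical blocks
```

An aligned full-block write yields an empty list; anything else puts
neighbouring logical blocks at risk if the rewrite fails part-way.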

There have been several unsuccessful attempts at nudging the drive
vendors into giving us real guarantees (supercapacitors, NVRAM or
flash-backed write cache). No luck so far. So people that care use
arrays with non-volatile caches.

--
Martin K. Petersen Oracle Linux Engineering