From: Tejun Heo on
Hello, again.

On 03/09/2010 11:34 AM, Tejun Heo wrote:
>> - parted uses 1MiB alignment for disks with unknown topology, disks
>> with topology information are aligned to optimal (or minimum) I/O
>> size (parted >= 2.1)
>
> This will result in incorrect alignment for drives which lie about the
> physical sector size to work around BIOS/drivers issues (C-1). It
> would probably be best to align to at least 1MiB.

I misread it. C-1 would be disks w/o alignment information. Those
report an optimal_io_size of 0, so parted would fall back to 1MiB
alignment. So, this should work, right?
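
A minimal sketch of that fallback, in Python. This is not parted's
actual code: pick_alignment and align_up are hypothetical names, and the
only assumptions are that optimal_io_size is the byte value the kernel
reports (0 when unknown) and that the default is 1MiB.

```python
MIB = 1024 * 1024

def pick_alignment(optimal_io_size, fallback=MIB):
    """Use the reported optimal I/O size, or the fallback if it is 0."""
    return optimal_io_size if optimal_io_size > 0 else fallback

def align_up(offset, alignment):
    """Round a byte offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment

# A drive with no topology information reports optimal_io_size == 0,
# so a partition that would have started at sector 63 (63 * 512 bytes)
# is moved up to the 1MiB boundary.
start = align_up(63 * 512, pick_alignment(0))
```

With these inputs start is 1048576, i.e. sector 2048 on a drive with
512-byte logical sectors.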

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tejun Heo on
Hello,

On 03/09/2010 11:42 AM, Jeff Garzik wrote:
> On 03/08/2010 09:34 PM, Tejun Heo wrote:
>> libata is broken for logical 4KiB ATA devices tho. I'll fix it up.
>
> Does libata-dev.git#sectsize miss any details?

I haven't looked at it yet. I'll review it soon but the thing is
without actual hardware it would be a bit difficult to tell. It's not
only the drivers. I have this mighty unhappy feeling that some
controllers (especially some of the SATA ones with internal state
machine to emulate SFF) would be sniffing the commands and making the
wrong assumption if 4KiB logical sector size is used, so we'll need to
test various controllers. Some PATA-SATA bridge chips will definitely
be having problems too. Then there are the USB and other bridges too
but well those aren't libata's problem at least. :-)

Thanks.

--
tejun
From: Martin K. Petersen on
>>>>> "Tejun" == Tejun Heo <tj(a)kernel.org> writes:

>>> Huh, what? My homedir is on a 4KiB LBS/PBS drive and has been for
>>> ~2 years.

Tejun> By default, they aren't aligned properly, are they?

Single partition. I did the alignment manually.


Tejun> libata is broken for logical 4KiB ATA devices tho. I'll fix it
Tejun> up.

Matthew implemented support for this a while back...


Tejun> I'm just a bit worried that it might generate a lot of frustrated
Tejun> bug reports. Well, maybe we should just advise users to install
Tejun> windows first and then install Linux.

Unfortunately there is no simple solution given that we can't go back in
time and fix legacy DOS/XP behavior.

The 1-alignment jumper (that some drives have) fixes things for the
first partition but will mess up our alignment for subsequent ones
unless the firmware actually reports the shift. So no matter what we do
the user will have to have a bare minimum of knowledge about 512-byte
LBS/4 KB PBS drives. That sucks. But even Windows users are presented
with extra documentation and alignment utilities during the transition.

Having a 1 MB alignment by default and hoping that devices that lie will
be 0-aligned is the best we can do, I think.
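
To illustrate why the jumper helps only the first partition, here is a
small Python sketch. It assumes the jumper shifts every logical sector
by exactly one 512-byte sector on a 512-byte LBS/4 KB PBS drive; the
function name and the shift parameter are made up for the example.

```python
def is_physically_aligned(lba, shift=0, logical=512, physical=4096):
    """True if logical sector `lba` starts on a physical sector boundary.

    `shift` is the number of logical sectors the drive silently adds to
    every LBA (1 with the jumper installed, 0 otherwise).
    """
    return ((lba + shift) * logical) % physical == 0

# Legacy DOS/XP layout: first partition at sector 63.
legacy_plain = is_physically_aligned(63)              # no jumper
legacy_jumper = is_physically_aligned(63, shift=1)    # jumper installed

# 1 MB default: partition at sector 2048.
mib_plain = is_physically_aligned(2048)
mib_jumper = is_physically_aligned(2048, shift=1)
```

Under that assumption the jumper aligns sector 63 but misaligns sector
2048, which is why a 1 MB default only works if the drive is 0-aligned
or the shift is actually reported.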

--
Martin K. Petersen Oracle Linux Engineering
From: Martin K. Petersen on
>>>>> "Tejun" == Tejun Heo <tj(a)kernel.org> writes:

>> http://people.redhat.com/msnitzer/docs/io-limits.txt

Tejun> Ah... this is great. I'll link the doc and shamelessly steal
Tejun> parts of it if that's okay with you.

There's also this one:

http://oss.oracle.com/~mkp/docs/linux-advanced-storage.pdf

It is more aimed at storage vendors than end users, though.

--
Martin K. Petersen Oracle Linux Engineering
From: Daniel Taylor on


-----Original Message-----
From: Tejun Heo [mailto:tj(a)kernel.org]
Sent: Monday, March 08, 2010 6:34 PM
To: Karel Zak
Cc: Martin K. Petersen; linux-ide(a)vger.kernel.org; lkml; Daniel Taylor; Jeff
Garzik; Mark Lord; tytso(a)mit.edu; H. Peter Anvin;
hirofumi(a)mail.parknet.co.jp; Andrew Morton; Alan Cox; irtiger(a)gmail.com;
Matthew Wilcox; aschnell(a)suse.de; knikanth(a)suse.de; jdelvare(a)suse.de; Jim
Meyering
Subject: Re: ATA 4 KiB sector issues.

Hello,

On 03/09/2010 04:58 AM, Karel Zak wrote:
>> Tejun> Reportedly, commonly used partitioners aren't ready to handle
>> Tejun> drives larger than 2 TiB in any configuration and alignment
>> Tejun> isn't
>
> The limit is specific for DOS partition table (with 512-byte log.
> sectors), but for example GPT uses 64-bit LBA. I believe that our
> partitioning tools don't introduce any other restriction.
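
The arithmetic behind that limit, as a short Python sketch. It assumes
the usual reading of the DOS partition table, where start and length are
32-bit sector counts, and treats 2^32 sectors as the addressable range;
mbr_max_bytes is a hypothetical helper name.

```python
TIB = 1024 ** 4

def mbr_max_bytes(logical_sector_size):
    """Capacity addressable with a 32-bit sector count."""
    return (2 ** 32) * logical_sector_size

limit_512 = mbr_max_bytes(512) // TIB     # 512-byte logical sectors
limit_4096 = mbr_max_bytes(4096) // TIB   # 4KiB logical sectors
```

This gives 2 TiB for 512-byte logical sectors and 16 TiB for 4096-byte
logical sectors.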

Hmmm... the 'reportedly' was from Daniel Taylor or maybe I just
misinterpreted the conversation. Daniel, can you please fill in?

DLT> The problem that I see is that the installers and upper level
DLT> applications do not make good choices for partition layout.
DLT> "parted", itself, seems to work OK in the latest version. One of the
DLT> things I've heard since I started this process is that there are some
DLT> libraries associated with the process of partitioning/formatting.
DLT> Perhaps the upper layers and those libraries aren't synced up?

>> Tejun> done properly for drives with 4 KiB physical sectors. 4 KiB
>> Tejun> logical sector support is broken in both the kernel
>>
>> Huh, what? My homedir is on a 4KiB LBS/PBS drive and has been for ~2
>> years.

By default, they aren't aligned properly, are they?

>> Tejun> (need more details and probably a whole section on partitioner
>> Tejun> behaviors)
>>
>> I'm Cc:'ing Karel Zak and Jim Meyering who have been doing all the
>> alignment work for fdisk and parted respectively. Karel, Jim: The
>> full writeup is here:
>>
>> http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues
>>
>> It'd be great if you guys could share what you have been doing to the
>> tooling.
>
> small summary:
>
> - libblkid provides a unified API to topology information, it supports:
>   - ioctls (kernel >= 2.6.32)
>   - sysfs (kernel >= 2.6.31)
>   - stripe chunk size and stripe width for DM, MD, LVM and evms on
>     old kernels
> - libparted and fdisk are linked against libblkid
>
> - fdisk supports 4KiB logical sector size (util-linux-ng >= 2.15)
> - fdisk supports 4KiB physical sector size (util-linux-ng >= 2.17)
> - fdisk uses 1MiB alignment (or more if optimal I/O size is bigger)
>   and alignment_offset for all partitions in non-DOS mode
>   (util-linux-ng >= 2.17.1)
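
For reference, the sysfs side of that topology information can be read
directly. The Python sketch below is only an illustration and does not
use libblkid; read_topology and its sysfs_root parameter are
hypothetical, while the attribute names are the ones the block layer
exports under /sys/block/<dev>/.

```python
from pathlib import Path

# Queue attributes exported by the block layer (kernel >= 2.6.31).
QUEUE_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)

def read_topology(dev, sysfs_root="/sys/block"):
    """Return the I/O topology of block device `dev` as a dict of ints."""
    base = Path(sysfs_root) / dev
    topo = {
        name: int((base / "queue" / name).read_text())
        for name in QUEUE_ATTRS
    }
    # alignment_offset lives one level up, next to the queue directory.
    topo["alignment_offset"] = int((base / "alignment_offset").read_text())
    return topo
```

Calling read_topology("sda") on a running system returns the values a
partitioner would feed into its alignment decision; sysfs_root is only
there so the function can be pointed at a test directory.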

That's great. Daniel, maybe you were testing older versions? Or maybe
those failures came from libata mishandling 4KiB r/w requests.

DLT> As I said, above, it could be libraries. I was not aware that so much
DLT> of the implementation was embedded there.

> - parted supports 4KiB physical sector size
> - parted uses 1MiB alignment for disks with unknown topology, disks
> with topology information are aligned to optimal (or minimum) I/O
> size (parted >= 2.1)

This will result in incorrect alignment for drives which lie about the
physical sector size to work around BIOS/drivers issues (C-1). It would
probably be best to align to at least 1MiB.

DLT> Please.

> - EFI GPT code in the kernel has been updated to work properly with
> 4KiB sectors (kernel >= 2.6.33)

libata is broken for logical 4KiB ATA devices tho. I'll fix it up.

> - mkfs.{ext,xfs,gfs2,ocfs2} have been updated to work properly with
> topology information, mkfs.{ext,xfs} are linked against libblkid
> for compatibility with old kernels (for stripe chunk size / width)
>
> - Fedora-13/RHEL6 installer uses libparted with 4KiB support
>
> - alignment_offset & 4KiB support is planned for LUKS (cryptsetup)
>
>> Tejun> Unfortunately, the transition to 4 KiB sector size, physical
>> Tejun> only or logical too, is looking fairly ugly. Hopefully, a
>> Tejun> reasonable solution can be reached in not too distant future
>> Tejun> but even with all the software side updated, it looks like
>> Tejun> it's gonna cause significant amount of confusion and frustration.
>>
>> With regards to XP compatibility I don't think we should go too much
>> out of our way to accommodate it. XP has been disowned by its master
>> and I think virtualization will take care of the rest.

Yeah, good point. I'm just a bit worried that it might generate a lot of
frustrated bug reports. Well, maybe we should just advise users to install
windows first and then install Linux.

DLT> Simple reality is that XP is "forever". Drives >2TiB, which may be
DLT> USB-attached, used with XP will be MBR-partitioned and use 4096-byte
DLT> sectors. We need to be able to read/write those disks on Linux
DLT> systems.

>> FWIW, recent fdisk has a command line flag that will enable/disable
>> DOS compatible layout.
>
> yes, util-linux-ng 2.17.1, fdisk -c
>
> Note that non-DOS mode will be default in the next major
> util-linux-ng release.

I'll try to merge this information into the ata-4k doc.

Thank you very much.

DLT> One last comment: I just tried to partition and format a >2TiB drive
DLT> on fully updated Ubuntu 9.10 with GParted. I selected not to cylinder
DLT> align, use GPT and ext3, and to put 1 MiB preceding and following.
DLT> libparted failed with "unable to satisfy all constraints of the
DLT> partition". Using "parted", I created the partition, and then GParted
DLT> was able to apply the ext3 file system.
--
tejun