From: Martin K. Petersen on
>>>>> "DLT" == Daniel Taylor <Daniel.Taylor(a)wdc.com> writes:

DLT> Simple reality is that XP is "forever". Drives >2TiB, which may be
DLT> USB-attached, used with XP will be MBR-partitioned and use
DLT> 4096-byte sectors. We need to be able to read/write those disks on
DLT> Linux systems.

Shouldn't be a problem as long as the DOS partition table vs. 4 KiB
sectors thing is fixed.
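
A minimal sketch of the arithmetic behind the 2TiB figure, assuming only that DOS partition table entries hold start and length as 32-bit sector counts. The function name is made up for illustration:

```shell
# A DOS (MBR) partition entry stores start and length as 32-bit sector
# counts, so the reachable size is 2^32 sectors times the sector size.
# $1: logical sector size in bytes
mbr_max_bytes() {
    echo $(( (1 << 32) * $1 ))
}

echo "512-byte sectors:  $(mbr_max_bytes 512) bytes"    # 2 TiB
echo "4096-byte sectors: $(mbr_max_bytes 4096) bytes"   # 16 TiB
```

This is why an MBR-partitioned drive above 2TiB has to present 4096-byte logical sectors.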


DLT> One last comment: I just tried to partition and format a >2TiB
DLT> drive on fully updated Ubuntu 9.10 with GParted. I selected not to
DLT> cylinder align, use GPT and ext3, and to put 1 MiB preceding and
DLT> following. libparted failed with "unable to satisfy all
DLT> constraints of the partition". Using "parted", I created the
DLT> partition, and then GParted was able to apply the ext3 file system.

I don't think Ubuntu has adopted any of the relevant updates yet.

I believe the Fedora 13 Alpha is due to be released this week. That
would be the best test platform because several of the people who have
been actively engaged in the 4 KiB sector enablement process are Fedora
developers.

--
Martin K. Petersen Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Mikael Abrahamsson on
On Mon, 8 Mar 2010, Tejun Heo wrote:

> http://ata.wiki.kernel.org/index.php/ATA_4_KiB_sector_issues

Excellent summary.

> C-2. Windows XP depends on the traditional partition layout.

Is this really true? WD ships their EARS drives with an alignment tool
that, as far as I can understand, moves the partition so that
it's aligned to 4KiB:

http://www.wdc.com/en/products/advancedformat/

So an XP fresh install (including letting XP partition the drive) will be
misaligned, but if you clone XP onto a properly aligned partition (or run
the tool and let it move the partition), it'll be ok. So saying that XP
"depends" on traditional partition layout might be a bit of a stretch?
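
A rough sketch of what "misaligned" means here, assuming the traditional layout puts the first partition at sector 63 (the usual XP default) on a drive with 512-byte logical and 4096-byte physical sectors. The function name is illustrative only:

```shell
# Report whether a partition start falls on a physical-sector boundary.
# $1: start sector, $2: logical sector size, $3: physical sector size (bytes)
alignment_status() {
    if [ $(( ($1 * $2) % $3 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

echo "start 63:   $(alignment_status 63 512 4096)"
echo "start 2048: $(alignment_status 2048 512 4096)"
```

Sector 63 sits at byte offset 32256, which is not a multiple of 4096, while sector 2048 sits at exactly 1MiB.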

--
Mikael Abrahamsson email: swmike(a)swm.pp.se
From: Michael Tokarev on
Mike Snitzer wrote:
[]
> I've been keeping track of all the pieces in play, have coordinated
> with kzak and jim, and have a summary that offers some amount of macro
> detail (at the end I touch on parted and fdisk):
>
> http://people.redhat.com/msnitzer/docs/io-limits.txt

What I don't see in this thread or in this document is any mention
of the Linux md layer. I think it is the first candidate for testing the
whole thing, the easiest and most important one. I mean the alignment,
"recommended I/O size" and all the similar stuff.

Think of a raid5 array: with all the mentioned good stuff in place,
fdisk should figure out how to align partitions on the array stripe
boundary, and should do that automatically. And this should be
easiest to debug/test, since the whole thing is controllable
by the kernel.

But apparently it does not implement anything of this sort.
Adding Neilb to the Cc list.......

Thanks!

/mjt
From: Jim Meyering on
Karel Zak wrote:
> On Mon, Mar 08, 2010 at 10:18:27AM -0500, Martin K. Petersen wrote:
....
>> It'd be great if you guys could share what you have been doing to the
>> tooling.
>
> small summary:
>
> - libblkid provides a unified API to topology information; it supports:
>     - ioctls (kernel >= 2.6.32)
>     - sysfs (kernel >= 2.6.31)
>     - stripe chunk size and stripe width for DM, MD, LVM and evms on
>       old kernels
> - libparted and fdisk are linked against libblkid
>
> - fdisk supports 4KiB logical sector size (util-linux-ng >= 2.15)
> - fdisk supports 4KiB physical sector size (util-linux-ng >= 2.17)
> - fdisk uses 1MiB alignment (or more if optimal I/O size is bigger)
>   and alignment_offset for all partitions in non-DOS mode
>   (util-linux-ng >= 2.17.1)
>
> - parted supports 4KiB physical sector size
> - parted uses 1MiB alignment for disks with unknown topology; disks
>   with topology information are aligned to optimal (or minimum) I/O
>   size (parted >= 2.1)
>
> - EFI GPT code in the kernel has been updated to work properly with
>   4KiB sectors (kernel >= 2.6.33)
>
> - mkfs.{ext,xfs,gfs2,ocfs2} have been updated to work properly with
>   topology information; mkfs.{ext,xfs} are linked against libblkid
>   for compatibility with old kernels (for stripe chunk size / width)
>
> - Fedora-13/RHEL6 installer uses libparted with 4KiB support
>
> - alignment_offset & 4KiB support is planned for LUKS (cryptsetup)
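
The sysfs source mentioned in the summary can be inspected by hand. A minimal sketch, assuming the attribute names exported under /sys/block/<dev>/ by recent kernels; the function name and its optional sysfs-root argument are made up for illustration:

```shell
# Print the I/O topology the kernel exports for a block device.
# $1: device name (e.g. sdb), $2: sysfs root (defaults to /sys)
show_topology() {
    dev=$1
    root=${2:-/sys}
    for attr in queue/logical_block_size queue/physical_block_size \
                queue/minimum_io_size queue/optimal_io_size alignment_offset; do
        file="$root/block/$dev/$attr"
        # Older kernels may lack some attributes; skip what is missing.
        [ -r "$file" ] && printf '%s=%s\n' "${attr##*/}" "$(cat "$file")"
    done
    return 0
}
```

Running `show_topology sdb` on a kernel with topology support should print one name=value line per attribute. Tools normally get the same data through libblkid rather than by reading these files directly.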

Thanks for the summary, Karel.
In case anyone wants more high-level detail on the parted front,
here's its NEWS file:

http://git.debian.org/?p=parted/parted.git;a=blob;f=NEWS

Currently, I'm not planning much for Parted, other than clean-up.
For example, I want to remove all of the FS-related code (it's
horribly bit-rotted) from the package, with the exception of
HFS/HFS+ and FAT resizing capabilities, since AFAIK, Parted
has the only free implementations. If any of you know of other
implementations or work in progress, please let me know.


Related information, prompted by my recent encounter with a
tool that refused to let me use a GPT partition table.

Partition table formats: prefer GUID/GPT:

Having spent more than my share of time looking at partition table
formats recently, I am now strongly biased against DOS partition
tables, and for GUID/GPT ones. In addition to allowing for >2TiB
partition offsets and lengths, GPT tables provide for better
protection in case of corruption (checksums, backup table at end
of disk) and don't have the anachronistic distinction of primary
and extended/logical partitions (all partitions are "primary").
You can even give each partition a name. The only reason to use a
DOS partition table on a new installation is if you're stuck with
a requirement of using an OS like XP on bare metal.

Please consider encouraging the use of GPT partition tables...
or at least do not *dis*courage their use.
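
To see which kind of table a disk already carries, one option is to look for the GPT header signature. A minimal sketch, assuming only that the GPT header starts at LBA 1 with the eight ASCII bytes "EFI PART"; the function name is hypothetical, and the check ignores the checksums and the backup header:

```shell
# Print "gpt" if the GPT header signature "EFI PART" is found at the start
# of LBA 1, otherwise "unknown".
# $1: disk image or block device, $2: logical sector size in bytes (default 512)
label_type() {
    # Read 8 bytes at the byte offset of LBA 1; drop NULs so the shell
    # does not complain about binary data.
    sig=$(dd if="$1" bs=1 skip="${2:-512}" count=8 2>/dev/null | tr -d '\0')
    if [ "$sig" = "EFI PART" ]; then echo gpt; else echo unknown; fi
}
```

On a drive with 4096-byte logical sectors, LBA 1 starts at byte 4096, so the call would be `label_type /dev/sdX 4096` (device name is a placeholder; reading a block device usually needs root).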
From: Karel Zak on
On Tue, Mar 09, 2010 at 09:53:37AM +0300, Michael Tokarev wrote:
> Mike Snitzer wrote:
> []
> > I've been keeping track of all the pieces in play, have coordinated
> > with kzak and jim, and have a summary that offers some amount of macro
> > detail (at the end I touch on parted and fdisk):
> >
> > http://people.redhat.com/msnitzer/docs/io-limits.txt
>
> What I don't see in this thread or in this document is any mention
> of the Linux md layer. I think it is the first candidate for testing the
> whole thing, the easiest and most important one. I mean the alignment,
> "recommended I/O size" and all the similar stuff.
>
> Think of a raid5 array: with all the mentioned good stuff in place,
> fdisk should figure out how to align partitions on the array stripe
> boundary, and should do that automatically. And this should be

Yes. For userspace there is no difference between RAID and non-RAID
devices -- the topology support in the kernel provides a unified API to
all devices. It means we don't need any extra support for RAIDs in
fdisk/parted. The userspace tools follow topology data from the kernel.

The good thing with 1MiB default alignment is that it is usable for
usual stripe sizes (for sizes greater than 1MiB we use optimal I/O
size).
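
The rule can be sketched as follows. This is a simplified illustration, not the actual fdisk code: it ignores alignment_offset, and both function names are made up:

```shell
# Pick the alignment grain in bytes: 1 MiB by default, or the optimal I/O
# size when that is larger. $1: optimal I/O size in bytes (0 if unknown)
alignment_grain() {
    default=$(( 1024 * 1024 ))
    if [ "$1" -gt "$default" ]; then echo "$1"; else echo "$default"; fi
}

# Round a byte offset up to the next multiple of the grain.
# $1: offset in bytes, $2: grain in bytes
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

grain=$(alignment_grain 65536)    # 64 KiB stripe: the grain stays at 1 MiB
echo "first start sector: $(( $(align_up 512 "$grain") / 512 ))"
```

With a 64KiB optimal I/O size the grain stays at 1MiB, which is itself a multiple of 64KiB, so the first partition starts at sector 2048.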

> easiest to debug/test, since the whole thing is controllable
> by the kernel.

I did almost all my tests with scsi_debug or MD RAID0 on scsi_debug.
It works as expected. (Note that kernel 2.6.31 has a problem with
alignment_offset calculation on stacked devices, so use the latest
kernel where the bug is already fixed.)

But I haven't tried using unpartitioned (whole) 4K disks for RAIDs,
because scsi_debug does not allow creating more devices (and I don't
have real hardware).

Some tests are available in util-linux-ng sources:
http://git.kernel.org/?p=utils/util-linux-ng/util-linux-ng.git;a=tree;f=tests/ts/fdisk

Karel


# modprobe scsi_debug dev_size_mb=2500 sector_size=512 physblk_exp=3

[...create partitions...]

# fdisk -lcu /dev/sdb

Disk /dev/sdb: 2621 MB, 2621440000 bytes
255 heads, 63 sectors/track, 318 cylinders, total 5120000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 32768 bytes
Disk identifier: 0xb585b0be

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     1026047      512000   83  Linux
/dev/sdb2         1026048     2050047      512000   83  Linux
/dev/sdb3         2050048     3074047      512000   83  Linux
/dev/sdb4         3074048     4098047      512000   83  Linux


# mdadm --create /dev/md8 --level=5 --raid-devices=4 /dev/sdb{1,2,3,4}

[...create partitions on the raid...]

# fdisk -lcu /dev/md8

Disk /dev/md8: 1572 MB, 1572667392 bytes
2 heads, 4 sectors/track, 383952 cylinders, total 3071616 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk identifier: 0x1bb6fd8d

     Device Boot      Start         End      Blocks   Id  System
/dev/md8p1             2048     1435647      716800   83  Linux
/dev/md8p2          1435648     2869247      716800   83  Linux


Check offsets (alignment):

# cat /sys/block/sdb/sdb{1,2,3,4}/alignment_offset
0
0
0
0

# cat /sys/block/md8/md8p{1,2}/alignment_offset
0
0

--
Karel Zak <kzak(a)redhat.com>