From: H. Peter Anvin on
On 03/14/2010 07:26 PM, Denys Vlasenko wrote:
>>
>> Yes, but it does squat for a flash disk that wants, say, 256K alignment.
>
> 4K makes sense. 256K not so much.
>
> 256K alignment is hard to swallow for a lot of reasons anyway.
> Unless the filesystem packs small files into blocks a la reiserfs,
> 256K-block filesystems will be very inefficient for typical
> storage scenarios.
>

No one has talked about using 256K filesystem blocks. The fact of the
matter, though, is that both flash and RAID have much larger alignment
requirements than a mere 4K for optimal performance.

You might not like it, but that's the way it is.
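A minimal sketch of why RAID cares about more than 4K. For a parity RAID, the natural write unit is usually taken to be the full stripe: one chunk per data disk. The array layouts below are hypothetical examples chosen for illustration, not configurations named in this thread, and `full_stripe_bytes` is a made-up helper.

```python
KIB = 1024

def full_stripe_bytes(chunk_bytes, total_disks, parity_disks):
    """Data bytes in one full stripe of a striped/parity RAID array.

    A full stripe holds one chunk per data disk. Writing a whole, aligned
    stripe lets parity be computed from the new data alone; a partial-stripe
    write typically requires reading old data or parity first.
    """
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return chunk_bytes * data_disks

# Hypothetical array layouts, for illustration only.
layouts = [
    ("RAID5, 5 disks, 64K chunk", 64 * KIB, 5, 1),
    ("RAID6, 6 disks, 128K chunk", 128 * KIB, 6, 2),
]
for name, chunk, disks, parity in layouts:
    stripe = full_stripe_bytes(chunk, disks, parity)
    print(f"{name}: full stripe {stripe // KIB}K = {stripe // (4 * KIB)} x 4K")
```

Both example layouts give full stripes (256K and 512K) that are dozens of 4K blocks wide, which is the scale of alignment being argued about here.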

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: david on
On Mon, 15 Mar 2010, Denys Vlasenko wrote:

> On Monday 15 March 2010 02:21, H. Peter Anvin wrote:
>> On 03/10/2010 01:14 AM, Denys Vlasenko wrote:
>>>
>>> 63s/255h is more or less "standard" now.
>>>
>>> Alignment issues can be solved by picking a good multiple of
>>> _heads_ or _cylinders_:
>>>
>>> For first partition, pick the start at 8th head:
>>>
>>> cyl 0 head 1 sector 1: LBA sector 63 - bad
>>> cyl 0 head 8 sector 1: LBA sector 8*63 - good (4k aligned)
>>>
>>> For any other partition, pick start cylinder which is a multiple of 8:
>>>
>>> cyl 8*x head 0 sector 1: LBA sector 8*x*255*63 - good (4k aligned)
>>>
>>> This will actually work well for *any* geometry, not only for 63s/255h.
>>
>> Yes, but it does squat for a flash disk that wants, say, 256K alignment.
>
> 4K makes sense. 256K not so much.
>
> 256K alignment is hard to swallow for a lot of reasons anyway.
> Unless the filesystem packs small files into blocks a la reiserfs,
> 256K-block filesystems will be very inefficient for typical
> storage scenarios.

The thing is, if the OS can learn that it's more efficient to write in
256K-aligned chunks, then it can batch things up so that the drive doesn't
have to do a read-modify-write cycle and can instead just replace the
entire chunk.

RAID arrays can benefit from this as well as SSDs.

The OS can do this when writing to swap, flushing dirty buffers, writing
back mmapped files, etc. (In fact, if the OS knows the full contents of the
chunk, it may be more efficient for the OS to write the entire thing than
to write part of it and have the drive/array do the read-modify-write
cycle.)
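A rough sketch of the batching idea, assuming a 256K device write unit (the size is an assumption taken from the example in this thread, and `split_write` is a hypothetical helper). It splits one write at chunk boundaries and marks which pieces cover a whole aligned chunk, which the device could replace outright, versus pieces that only cover part of a chunk and may trigger read-modify-write.

```python
CHUNK = 256 * 1024  # assumed preferred write unit (flash erase block / RAID stripe)

def split_write(offset, length, chunk=CHUNK):
    """Split the byte range [offset, offset + length) at chunk boundaries.

    Returns (start, end, full) tuples. full=True means the piece is one whole,
    aligned chunk, which the device can replace outright; full=False pieces
    only cover part of a chunk and may force a read-modify-write.
    """
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        next_boundary = (pos // chunk + 1) * chunk
        piece_end = min(end, next_boundary)
        full = pos % chunk == 0 and piece_end - pos == chunk
        pieces.append((pos, piece_end, full))
        pos = piece_end
    return pieces

# The same 768K write, aligned vs. shifted by 4K.
for offset in (0, 4 * 1024):
    pieces = split_write(offset, 3 * CHUNK)
    partial = sum(1 for _, _, full in pieces if not full)
    print(f"offset {offset}: {len(pieces)} pieces, {partial} partial")
```

Aligned, the 768K write maps to three whole chunks; shifted by 4K, it becomes four pieces, two of which are partial chunks.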

David Lang

> It looks like flash storage manufacturers just have to bite
> the bullet and develop smarter algorithms that combine wear
> leveling, block remapping and such, and make their internal
> preference for huge contiguous aligned writes nearly invisible
> from the outside - just like hard disks, which do not expose
> their zoned recording, variable sector counts, etc.
>
> Such algorithms aren't trivial, but they are possible.
> Whoever incorporates them in their products
> will deliver a significantly better user experience.
>
> I just played with an Ubuntu installation on a USB stick.
> Yes, it works. Sort of. Write performance is abysmal.
> I would pay 2x or 3x for a stick of the same size if it
> performed better.
From: Denys Vlasenko on
On Monday 15 March 2010 06:20, david(a)lang.hm wrote:
> >>> For any other partition, pick start cylinder which is a multiple of 8:
> >>>
> >>> cyl 8*x head 0 sector 1: LBA sector 8*x*255*63 - good (4k aligned)
> >>>
> >>> This will actually work well for *any* geometry, not only for 63s/255h.
> >>
> >> Yes, but it does squat for a flash disk that wants, say, 256K alignment.
> >
> > 4K makes sense. 256K not so much.
> >
> > 256K alignment is hard to swallow for a lot of reasons anyway.
> > Unless the filesystem packs small files into blocks a la reiserfs,
> > 256K-block filesystems will be very inefficient for typical
> > storage scenarios.
>
> The thing is, if the OS can learn that it's more efficient to write in
> 256K-aligned chunks, then it can batch things up so that the drive doesn't
> have to do a read-modify-write cycle and can instead just replace the
> entire chunk.

I think Linux is already doing this. The problem is that in many cases
the OS can't possibly do it, short of using a specially designed
filesystem.

If you untar a Linux kernel source tarball on a seriously
fragmented ext2 filesystem, there will be a lot of discontiguous
and/or misaligned writes smaller than 256K.
Only smart firmware can help in this case.
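To illustrate the fragmented-filesystem case, here is a small sketch that takes a batch of (offset, length) writes, merges overlapping or adjacent ones (roughly what an OS could coalesce), and reports which 256K chunks end up only partly rewritten. The chunk size, the helper names, and the example write pattern are all hypothetical.

```python
CHUNK = 256 * 1024  # assumed device write unit

def chunk_coverage(writes, chunk=CHUNK):
    """Return {chunk_index: bytes_covered} for a batch of (offset, length) writes.

    Overlapping or adjacent writes are merged first, as an OS might coalesce
    them, so no byte is counted twice.
    """
    merged = []
    for off, length in sorted(writes):
        end = off + length
        if merged and off <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous extent
        else:
            merged.append([off, end])
    covered = {}
    for start, end in merged:
        pos = start
        while pos < end:
            idx = pos // chunk
            piece_end = min(end, (idx + 1) * chunk)
            covered[idx] = covered.get(idx, 0) + piece_end - pos
            pos = piece_end
    return covered

def partial_chunks(writes, chunk=CHUNK):
    """Chunks touched but not completely rewritten: read-modify-write candidates."""
    return sorted(i for i, n in chunk_coverage(writes, chunk).items() if n < chunk)

# Hypothetical small writes scattered across a fragmented filesystem,
# plus one large aligned write for contrast.
writes = [(0, 4096), (300 * 1024, 12 * 1024), (600 * 1024, 8 * 1024),
          (1024 * 1024, CHUNK)]
print(partial_chunks(writes))  # -> [0, 1, 2]
```

Merging cannot help when the small writes land in different chunks: each of chunks 0, 1 and 2 is touched by only a few kilobytes, so only the device (or its firmware) can avoid the read-modify-write there.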
--
vda
From: Arnd Bergmann on
On Monday 15 March 2010, H. Peter Anvin wrote:
> > 256K alignment is hard to swallow for a lot of reasons anyway.
> > Unless the filesystem packs small files into blocks a la reiserfs,
> > 256K-block filesystems will be very inefficient for typical
> > storage scenarios.
>
> No one has talked about using 256K filesystem blocks.

Well, logfs has just been merged and works with block sizes in that
range, but obviously only if the partition is correctly aligned.

Arnd
From: H. Peter Anvin on
On 03/15/2010 02:56 AM, Denys Vlasenko wrote:
> I think Linux is already doing this. The problem is that in many cases
> the OS can't possibly do it, short of using a specially designed
> filesystem.
>
> If you untar a Linux kernel source tarball on a seriously
> fragmented ext2 filesystem, there will be a lot of discontiguous
> and/or misaligned writes smaller than 256K.
> Only smart firmware can help in this case.

Yes, but guess what... there is a lot of stupid firmware out there, and
there are lots of RAID arrays, and so on.

"Seriously fragmented" means you have already lost in the first place.

This doesn't change the fact that this is a real issue and that that is
the major reason why aligning to 63*4K (8*63 sectors) is a bad idea.
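The arithmetic behind the 63*4K point can be checked directly. This sketch assumes 512-byte sectors and the 63 sectors/track, 255 heads geometry discussed in the thread, and uses the standard CHS-to-LBA translation. The head-8 start is LBA 8*63 = 504, i.e. 504*512 = 258048 bytes = 63*4K: 4K-aligned, but not 256K-aligned. The cylinder-512 entry is added for comparison and is not from the thread.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

def chs_to_lba(cyl, head, sector, heads=255, sectors_per_track=63):
    """Standard CHS -> LBA translation; CHS sector numbers start at 1."""
    return (cyl * heads + head) * sectors_per_track + (sector - 1)

def is_aligned(lba, alignment_bytes, sector_bytes=SECTOR_BYTES):
    """True if the byte offset of `lba` is a multiple of `alignment_bytes`."""
    return (lba * sector_bytes) % alignment_bytes == 0

# Partition starts mentioned in the thread, plus cylinder 512 for comparison.
starts = {
    "cyl 0 head 1 sector 1": (0, 1, 1),
    "cyl 0 head 8 sector 1": (0, 8, 1),
    "cyl 8 head 0 sector 1": (8, 0, 1),
    "cyl 512 head 0 sector 1": (512, 0, 1),
}

for name, chs in starts.items():
    lba = chs_to_lba(*chs)
    print(f"{name}: LBA {lba}, "
          f"4K aligned: {is_aligned(lba, 4 * 1024)}, "
          f"256K aligned: {is_aligned(lba, 256 * 1024)}")
```

Because 255*63 = 16065 is odd, a start on a cylinder boundary is 256K-aligned only when the cylinder number is a multiple of 512. Any multiple of 8 gives 4K alignment but, short of that, not 256K alignment.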

-hpa
