From: Ed Tomlinson on
Nitin,

Would you have all this in a git tree somewhere?

Considering that getting this working requires 24 patches, it would really help with testing.

TIA
Ed Tomlinson

On Friday 16 July 2010 08:37:42 you wrote:
> Frequently accessed filesystem data is stored in memory to reduce access to
> (much) slower backing disks. Under memory pressure, these pages are freed and,
> when needed again, they have to be read from disk again. When the combined
> working set of all running applications exceeds the amount of physical RAM, we
> get extreme slowdown, as reading a page from disk can take time on the order
> of milliseconds.
>
> Memory compression increases the effective memory size and allows more pages
> to stay in RAM. Since de/compressing memory pages is several orders of
> magnitude faster than disk I/O, this can provide significant performance gains
> for many workloads. Also, with multi-core systems becoming common, the benefit
> of reduced disk I/O should easily outweigh the cost of increased CPU usage.
>
> It is implemented as a "backend" for cleancache_ops [1], which provides
> callbacks for events such as when a page is to be removed from the page cache
> and when it is required again. We use these callbacks to implement a 'second
> chance' cache for evicted page cache pages, compressing and storing them in
> memory itself.
>
> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
> stored using the xvmalloc memory allocator, which is already used by the zram
> driver for the same purpose. Zero-filled pages are detected, and no memory is
> allocated for them.
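
(A minimal sketch of this store-side policy; store_zero_marker() and
store_compressed_chunk() are hypothetical stand-ins for the real zero-page
bookkeeping and xvmalloc-backed storage code:)

#include <linux/highmem.h>
#include <linux/lzo.h>
#include <linux/mm.h>

/* Returns true if the page contains only zero bytes. */
static bool page_is_zero_filled(const void *addr)
{
        const unsigned long *p = addr;
        unsigned int i;

        for (i = 0; i < PAGE_SIZE / sizeof(*p); i++)
                if (p[i])
                        return false;
        return true;
}

/*
 * dst_buf must hold at least lzo1x_worst_compress(PAGE_SIZE) bytes and
 * workmem LZO1X_1_MEM_COMPRESS bytes (see the per-cpu buffers mentioned
 * later in this mail).
 */
static int zcache_store_page(struct page *page, void *dst_buf, void *workmem)
{
        size_t clen;
        void *src = kmap(page);
        int ret;

        if (page_is_zero_filled(src)) {
                kunmap(page);
                return store_zero_marker(page);         /* hypothetical: no memory allocated */
        }

        ret = lzo1x_1_compress(src, PAGE_SIZE, dst_buf, &clen, workmem);
        kunmap(page);

        /* Keep only pages that compress to at most half a page. */
        if (ret != LZO_E_OK || clen > PAGE_SIZE / 2)
                return -EINVAL;

        return store_compressed_chunk(dst_buf, clen);   /* hypothetical: xv_malloc() + memcpy() */
}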
>
> A separate "pool" is created for each mount instance for a cleancache-aware
> filesystem. Each incoming page is identified with <pool_id, inode_no, index>
> where inode_no identifies file within the filesystem corresponding to pool_id
> and index is offset of the page within this inode. Within a pool, inodes are
> maintained in an rb-tree and each of its nodes points to a separate radix-tree
> which maintains list of pages within that inode.
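
(A rough sketch of that per-pool indexing; the struct and helper names are
illustrative, not copied from zcache_drv.h:)

#include <linux/radix-tree.h>
#include <linux/rbtree.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct zcache_inode_node {
        struct rb_node rb_node;                 /* linked into zcache_pool->inode_tree */
        unsigned long ino;                      /* inode number within this filesystem */
        struct radix_tree_root page_tree;       /* page index -> compressed chunk handle */
};

struct zcache_pool {
        struct rb_root inode_tree;              /* all inodes seen for this mount */
        spinlock_t lock;                        /* protects inode_tree */
        u64 memlimit;                           /* sysfs-tunable cap, in bytes */
        /* per-pool xvmalloc pool, stats, ... */
};

/* Lookup walks the rb-tree by inode number, then the inode's radix tree. */
static void *zcache_find_chunk(struct zcache_pool *pool, unsigned long ino,
                               pgoff_t index)
{
        struct zcache_inode_node *znode = zcache_find_inode(pool, ino); /* hypothetical rb-tree walk */

        return znode ? radix_tree_lookup(&znode->page_tree, index) : NULL;
}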
>
> While compression reduces disk I/O, it also reduces the space available for
> the normal (uncompressed) page cache. This can result in more frequent page
> cache reclaim and thus higher CPU overhead. It is therefore important to
> maintain a good hit rate for the compressed cache, or the increased CPU
> overhead can nullify any other benefits. This requires adaptive (compressed)
> cache resizing and page replacement policies that can maintain an optimal
> cache size and quickly reclaim unused compressed chunks. This work is yet to
> be done. However, in the current state, the cache size can be adjusted
> manually using the (per-pool) sysfs node 'memlimit', which in turn frees any
> excess pages *sigh* randomly.
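
(A hypothetical sketch of what that memlimit-driven shrink amounts to; all
names here are invented for illustration, and the "pick a victim" step really
is arbitrary until proper replacement policies exist:)

static ssize_t memlimit_store(struct kobject *kobj, struct kobj_attribute *attr,
                              const char *buf, size_t count)
{
        struct zcache_pool *pool = to_zcache_pool(kobj);        /* hypothetical */
        unsigned long limit;

        if (strict_strtoul(buf, 10, &limit))
                return -EINVAL;

        pool->memlimit = limit;
        /* Free arbitrary entries until usage fits under the new limit. */
        while (zcache_pool_mem_used(pool) > pool->memlimit)     /* hypothetical */
                zcache_free_some_chunk(pool);                   /* hypothetical, no LRU */

        return count;
}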
>
> Finally, it uses per-cpu stats and compression buffers to allow better
> performance on multi-core systems. Still, there are known bottlenecks, such as
> the single xvmalloc mempool per zcache pool, and a few others. I will work on
> these when I start profiling.
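
(A sketch of what the per-cpu compression buffers look like; variable names are
illustrative, not taken from zcache_drv.c:)

#include <linux/lzo.h>
#include <linux/percpu.h>
#include <linux/slab.h>

/* One destination buffer and one LZO scratch area per CPU, so concurrent
 * stores on different CPUs do not contend on a shared buffer. */
static DEFINE_PER_CPU(unsigned char *, zcache_dst_buf);
static DEFINE_PER_CPU(unsigned char *, zcache_workmem);

static int __init zcache_cpu_buffers_init(void)
{
        int cpu;

        for_each_possible_cpu(cpu) {
                per_cpu(zcache_dst_buf, cpu) =
                        kmalloc(lzo1x_worst_compress(PAGE_SIZE), GFP_KERNEL);
                per_cpu(zcache_workmem, cpu) =
                        kzalloc(LZO1X_1_MEM_COMPRESS, GFP_KERNEL);
                if (!per_cpu(zcache_dst_buf, cpu) || !per_cpu(zcache_workmem, cpu))
                        return -ENOMEM;         /* real code would unwind earlier allocations */
        }
        return 0;
}
/* The store path then brackets the compress call with get_cpu_var() /
 * put_cpu_var() so each CPU works only in its own buffers. */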
>
> * Performance numbers:
> - Tested using iozone filesystem benchmark
> - 4 CPUs, 1G RAM
> - Read performance gain: ~2.5X
> - Random read performance gain: ~3X
> - In general, performance gains for every kind of I/O
>
> Test details with graphs can be found here:
> http://code.google.com/p/compcache/wiki/zcacheIOzone
>
> If I can get some help with testing, it would be interesting to find its
> effect on more real-life workloads. In particular, I'm interested in finding
> out its effect in the KVM virtualization case, where it can potentially allow
> running more VMs per host for a given amount of RAM. With zcache enabled, VMs
> can be assigned a much smaller amount of memory, since the host can now hold
> the bulk of the page-cache pages, allowing a greater number of VMs to be
> hosted while each maintains a similar level of performance.
>
> * How to test:
> All patches are against 2.6.35-rc5:
>
> - First, apply all prerequisite patches here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_base_patches
>
> - Then apply this patch series; also uploaded here:
> http://compcache.googlecode.com/hg/sub-projects/zcache_patches
>
>
> Nitin Gupta (8):
> Allow sharing xvmalloc for zram and zcache
> Basic zcache functionality
> Create sysfs nodes and export basic statistics
> Shrink zcache based on memlimit
> Eliminate zero-filled pages
> Compress pages using LZO
> Use xvmalloc to store compressed chunks
> Document sysfs entries
>
> Documentation/ABI/testing/sysfs-kernel-mm-zcache | 53 +
> drivers/staging/Makefile | 2 +
> drivers/staging/zram/Kconfig | 22 +
> drivers/staging/zram/Makefile | 5 +-
> drivers/staging/zram/xvmalloc.c | 8 +
> drivers/staging/zram/zcache_drv.c | 1312 ++++++++++++++++++++++
> drivers/staging/zram/zcache_drv.h | 90 ++
> 7 files changed, 1491 insertions(+), 1 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-zcache
> create mode 100644 drivers/staging/zram/zcache_drv.c
> create mode 100644 drivers/staging/zram/zcache_drv.h
From: Nitin Gupta on
Hi Ed,

On 07/18/2010 02:43 AM, Ed Tomlinson wrote:
>
> Would you have all this in a git tree somewhere?
>
> Considering that getting this working requires 24 patches, it would really help with testing.
>

Unfortunately, the git tree for this is not hosted anywhere.

Anyway, I've just uploaded a monolithic zcache patch containing all its dependencies:
http://compcache.googlecode.com/hg/sub-projects/mainline/zcache_v1_2.6.35-rc5.patch

It applies on top of 2.6.35-rc5.

Thanks for trying it out.
Nitin


> On Friday 16 July 2010 08:37:42 you wrote:
<snip>

From: Pekka Enberg on
Nitin Gupta wrote:
<snip>

So why would someone want to use zram if they have transparent page
cache compression with zcache? That is, why is this not a replacement
for zram?

Pekka
From: Nitin Gupta on
On 07/18/2010 01:20 PM, Pekka Enberg wrote:
> Nitin Gupta wrote:
<snip>

>
> So why would someone want to use zram if they have transparent page cache compression with zcache? That is, why is this not a replacement for zram?
>

zcache complements zram; it's not a replacement:

- zram compresses anonymous pages, while zcache is for page cache compression.
So, workloads which depend heavily on "heap memory" usage will tend to prefer
zram, and those which are I/O-intensive will prefer zcache. Though I have not
yet experimented much, most workloads may want a mix of the two.

- zram is not just for swap. /dev/zram<id> devices are generic in-memory
compressed block devices which can be used for temporary storage such as /tmp,
/var/..., etc.

- /dev/zram<id>, being generic block devices, can also be used as raw disks by
other OSes (under virtualization). For example:
http://www.vflare.org/2010/05/compressed-ram-disk-for-windows-virtual.html

Thanks,
Nitin
From: Dan Magenheimer on
> We only keep pages that compress to PAGE_SIZE/2 or less. Compressed chunks are
> stored using the xvmalloc memory allocator, which is already used by the zram
> driver for the same purpose. Zero-filled pages are detected, and no memory is
> allocated for them.

I'm curious about this policy choice. I can see why one
would want to ensure that the average page is compressed
to less than PAGE_SIZE/2, and preferably PAGE_SIZE/2
minus the overhead of the data structures necessary to
track the page. And I see that this makes no difference
when the reclamation algorithm is random (as it is for
now). But once there is some better reclamation logic,
I'd hope that this compression factor restriction would
be lifted and replaced with something much higher. IIRC,
compression is much more expensive than decompression
so there's no CPU-overhead argument here either,
correct?

Thanks,
Dan
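
(To put numbers on the trade-off Dan describes: under the current policy a
4 KB page that compresses to, say, 2.5 KB is rejected even though storing it
would still save 1.5 KB, and since the compressed size is only known after
lzo1x_1_compress() has already run, accepting such pages would add no extra
compression cost.)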