From: Jeff Garzik on
Dan Williams wrote:
> Neil,
>
> The following patches implement hardware accelerated raid5 for the Intel
> Xscale® series of I/O Processors. The MD changes allow stripe
> operations to run outside the spin lock in a work queue. Hardware
> acceleration is achieved by using a dma-engine-aware work queue routine
> instead of the default software-only routine.
>
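As a rough illustration of that split and nothing more: the toy C sketch below is not code from the patches, and every name in it is made up. It shows a work-queue handler choosing between the software-only parity computation and a hardware path, with the XOR running outside any raid5 lock.

/* Sketch only: a toy model of dispatching one stripe's block operations
 * from a work-queue context, outside the raid5 locks.  All names and
 * structures are hypothetical; the real patches operate on struct
 * stripe_head and the dmaengine API instead of these stand-ins. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdio.h>

#define NDISKS 4
#define CHUNK  4096

struct stripe_req {
	uint8_t *blocks[NDISKS];  /* data blocks in the stripe cache   */
	uint8_t *parity;          /* parity block to (re)compute       */
	int      have_dma;        /* a capable dma channel was found   */
};

/* Default software-only routine: CPU XOR over the data blocks. */
static void sw_compute_parity(struct stripe_req *r)
{
	memset(r->parity, 0, CHUNK);
	for (int d = 0; d < NDISKS; d++)
		for (size_t i = 0; i < CHUNK; i++)
			r->parity[i] ^= r->blocks[d][i];
}

/* Dma-engine-aware routine: the real code would queue an xor descriptor
 * and complete the stripe from the engine's callback; stubbed here so
 * the sketch stays self-contained. */
static void hw_compute_parity(struct stripe_req *r)
{
	sw_compute_parity(r);  /* placeholder for the offloaded path */
}

/* Work-queue handler: runs with no spin lock held, so several stripes
 * can have operations in flight at once. */
static void stripe_work_fn(struct stripe_req *r)
{
	if (r->have_dma)
		hw_compute_parity(r);
	else
		sw_compute_parity(r);
}

int main(void)
{
	static uint8_t data[NDISKS][CHUNK], parity[CHUNK];
	struct stripe_req r = { .parity = parity, .have_dma = 0 };

	for (int d = 0; d < NDISKS; d++) {
		memset(data[d], d + 1, CHUNK);
		r.blocks[d] = data[d];
	}
	stripe_work_fn(&r);
	printf("parity[0] = 0x%02x\n", (unsigned)parity[0]);  /* 1^2^3^4 = 0x04 */
	return 0;
}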
> Since the last release of the raid5 changes, many bug fixes and other
> improvements have been made as a result of stress testing. See the
> per-patch change logs for details about what was fixed. This is the
> first release of the full dma implementation.
>
> The patches touch three areas: the md-raid5 driver, the generic dmaengine
> interface, and a platform device driver for IOPs. The raid5 changes
> follow your comments about making the acceleration implementation
> similar to how the stripe cache handles I/O requests. The dmaengine
> changes are the second release of this code; they expand the interface
> to handle more than memcpy operations and add a generic raid5-dma
> client. The iop-adma driver supports dma memcpy, xor, xor zero sum, and
> memset across all IOP architectures (32x, 33x, and 13xx).
>
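For orientation only, here is a sketch of the kind of client-facing surface being described; the declarations below are illustrative, and the actual names and signatures in the series may differ.

/* Illustrative declarations only -- not the patchset's actual interface.
 * The point is the shape: the original dmaengine exposed only memcpy to
 * clients, while a raid5 client additionally needs xor, xor zero-sum
 * (parity check), and memset style operations. */
#include <stddef.h>

struct dma_chan;                 /* an allocated channel on some engine */
typedef int dma_cookie_t;        /* token used to poll for completion   */

dma_cookie_t dma_async_memcpy(struct dma_chan *chan,
			      void *dest, void *src, size_t len);
dma_cookie_t dma_async_xor(struct dma_chan *chan,
			   void *dest, void **src_list, int src_cnt, size_t len);
dma_cookie_t dma_async_xor_zero_sum(struct dma_chan *chan,
				    void **src_list, int src_cnt, size_t len,
				    int *result);  /* 0 if parity checks out */
dma_cookie_t dma_async_memset(struct dma_chan *chan,
			      void *dest, int value, size_t len);

The iop-adma driver would then be one back end behind such an interface; a driver for another vendor's XOR engine could be another.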
> Concerning the context-switching performance concerns raised at the
> previous release, I have observed the following. In the
> hardware-accelerated case, performance appears to be consistently
> better with the work queue than without, since it allows multiple
> stripes to be operated on simultaneously. I expect the same on an SMP
> platform, but so far my testing has been limited to IOPs. For a
> single-processor, non-accelerated configuration I have not observed
> performance degradation with work queue support enabled, but in the
> Kconfig option help text I recommend disabling it
> (CONFIG_MD_RAID456_WORKQUEUE).
>
> Please consider the patches for -mm.
>
> -Dan
>
> [PATCH 01/19] raid5: raid5_do_soft_block_ops
> [PATCH 02/19] raid5: move write operations to a workqueue
> [PATCH 03/19] raid5: move check parity operations to a workqueue
> [PATCH 04/19] raid5: move compute block operations to a workqueue
> [PATCH 05/19] raid5: move read completion copies to a workqueue
> [PATCH 06/19] raid5: move the reconstruct write expansion operation to a workqueue
> [PATCH 07/19] raid5: remove compute_block and compute_parity5
> [PATCH 08/19] dmaengine: enable multiple clients and operations
> [PATCH 09/19] dmaengine: reduce backend address permutations
> [PATCH 10/19] dmaengine: expose per channel dma mapping characteristics to clients
> [PATCH 11/19] dmaengine: add memset as an asynchronous dma operation
> [PATCH 12/19] dmaengine: dma_async_memcpy_err for DMA engines that do not support memcpy
> [PATCH 13/19] dmaengine: add support for dma xor zero sum operations
> [PATCH 14/19] dmaengine: add dma_sync_wait
> [PATCH 15/19] dmaengine: raid5 dma client
> [PATCH 16/19] dmaengine: Driver for the Intel IOP 32x, 33x, and 13xx RAID engines
> [PATCH 17/19] iop3xx: define IOP3XX_REG_ADDR[32|16|8] and clean up DMA/AAU defs
> [PATCH 18/19] iop3xx: Give Linux control over PCI (ATU) initialization
> [PATCH 19/19] iop3xx: IOP 32x and 33x support for the iop-adma driver

Can devices like drivers/scsi/sata_sx4.c or drivers/scsi/sata_promise.c
take advantage of this? Promise silicon supports RAID5 XOR offload.

If so, how? If not, why not? :)

Jeff



From: Dan Williams on
On 9/11/06, Jeff Garzik <jeff(a)garzik.org> wrote:
> [...]
> Can devices like drivers/scsi/sata_sx4.c or drivers/scsi/sata_promise.c
> take advantage of this? Promise silicon supports RAID5 XOR offload.
>
> If so, how? If not, why not? :)
This is a frequently asked question; Alan Cox had the same one at OLS.
The answer is "probably." The only complication I currently see is
where/how the stripe cache is maintained. With the IOPs it's easy
because the DMA engines operate directly on kernel memory. With the
Promise card I believe the memory is on the card, and it's not clear
to me whether the XOR engines on the card can deal with host memory.
Also, MD would need to be modified to handle a stripe cache located on
a device, or to somehow synchronize its local cache with the card in a
manner that still beats software-only MD.
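A sketch of why the host-memory case is the easy one, assuming a kernel-style helper whose name is invented here (descriptor handling is engine specific and omitted): stripe cache pages are ordinary kernel pages, so they only need to be DMA-mapped before the engine can XOR them in place.

/* Sketch, not from the patches: stripe cache pages already live in host
 * memory, so offloading xor is just a matter of dma-mapping them and
 * writing the bus addresses into an engine descriptor (engine specific,
 * omitted).  The helper name is hypothetical. */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static void map_stripe_for_xor(struct device *dma_dev,
			       struct page *dest, struct page **srcs, int src_cnt,
			       size_t len, dma_addr_t *dest_hw, dma_addr_t *src_hw)
{
	int i;

	/* destination is written by the engine */
	*dest_hw = dma_map_page(dma_dev, dest, 0, len, DMA_FROM_DEVICE);
	/* sources are only read by the engine */
	for (i = 0; i < src_cnt; i++)
		src_hw[i] = dma_map_page(dma_dev, srcs[i], 0, len, DMA_TO_DEVICE);
}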

> Jeff

Dan
From: Jeff Garzik on
Dan Williams wrote:
> [...]

sata_sx4 operates through [standard PC] memory on the card, and you use
a DMA engine to copy memory to/from the card.

[select chipsets supported by] sata_promise operates directly on host
memory.

So, while sata_sx4 is farther away from your direct-host-memory model,
it also has much more potential for RAID acceleration: ideally, RAID1
just copies data to the card once, then copies the data to multiple
drives from there. Similarly with RAID5, you can eliminate copies and
offload XOR, presuming the drives are all connected to the same card.
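The contrast, reduced to a sketch: everything below is a made-up placeholder rather than a real driver interface, but it shows why the card-memory model can cross the host bus only once.

/* Illustration only -- every name below is a hypothetical stand-in. */
#include <stddef.h>

struct card { int id; };  /* stand-in for an sx4-like controller with local RAM */

/* Stubs so the sketch is complete; real code would talk to hardware. */
static void host_xor(void *parity, void **data, int ndisks, size_t len) {}
static void submit_bios(void **data, int ndisks, void *parity, size_t len) {}
static void copy_to_card(struct card *c, void **data, int ndisks, size_t len) {}
static void card_xor_and_write(struct card *c, int ndisks, size_t len) {}

/* Host-memory model (promise-style XOR, or IOP ADMA): parity is computed
 * in place on stripe cache pages, then each drive gets its own transfer
 * through the normal block layer. */
static void raid5_write_host_memory(void **data, int ndisks, void *parity, size_t len)
{
	host_xor(parity, data, ndisks, len);
	submit_bios(data, ndisks, parity, len);
}

/* Card-memory model (sx4-style): the host copies the stripe to card RAM
 * once; the card computes parity and writes to its attached drives
 * itself, so the data crosses the host bus only once. */
static void raid5_write_card_memory(struct card *c, void **data, int ndisks, size_t len)
{
	copy_to_card(c, data, ndisks, len);
	card_xor_and_write(c, ndisks, len);
}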

Jeff


From: Dan Williams on
On 9/11/06, Jeff Garzik <jeff(a)garzik.org> wrote:
> [...]
In the sata_promise case it's straightforward: all that is needed is
dmaengine drivers for the xor and memcpy engines. This would be
similar to the current I/OAT model, where dma resources are provided
by a PCI function. The sata_sx4 case would need a different flavor of
the dma_do_raid5_block_ops routine, one that understands where the
cache is located. MD would also need the capability to bypass the
block layer, since the data will already have been transferred to the
card by a stripe cache operation.
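A sketch of the "different flavor" idea: dma_do_raid5_block_ops is the routine named above, while the sx4 variant and the ops table are hypothetical, shown only to illustrate how back ends with different cache locations might plug in.

/* Sketch: how different back ends might plug their own raid5 block-ops
 * routine into MD.  Only dma_do_raid5_block_ops is a name taken from
 * this discussion; the rest is invented for illustration. */
struct stripe_head;                       /* md/raid5's stripe cache entry */

struct raid5_offload {
	const char *name;
	void (*do_block_ops)(struct stripe_head *sh, unsigned long pending);
};

/* Host-memory engines (IOP ADMA, or a sata_promise xor/memcpy driver
 * exposed through dmaengine): operate on the stripe cache in place. */
static void dma_do_raid5_block_ops(struct stripe_head *sh, unsigned long pending) {}

/* A card-local-memory design (sata_sx4) would need a flavor that knows
 * the working copy of the stripe lives in card RAM, and that the usual
 * block-layer submission can be skipped for blocks already on the card. */
static void sx4_do_raid5_block_ops(struct stripe_head *sh, unsigned long pending) {}

static const struct raid5_offload offload_flavors[] = {
	{ "iop-adma / promise (host memory)", dma_do_raid5_block_ops },
	{ "sata_sx4 (card memory)",           sx4_do_raid5_block_ops },
};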

The RAID1 case gives me pause because it seems any work along these
lines requires that the implementation work for both MD and DM, which
then eventually leads to being tasked with merging the two.

> Jeff

Dan
From: Jeff Garzik on
Dan Williams wrote:
> [...]
> The RAID1 case gives me pause because it seems any work along these
> lines requires that the implementation work for both MD and DM, which
> then eventually leads to being tasked with merging the two.

RAID5 has similar properties. If all devices in a RAID5 array are
attached to a single SX4 card, then a high-level write to the RAID5
array is passed directly to the card, which performs the XOR, striping,
etc.

Jeff


