From: Herbert Xu on
On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> Questions:
>
> If you are optimizing it,
>
> 1) why don't you optimize it in such a way that if one CPU submits
> requests, the crypto work is spread among all the CPUs? Currently it
> spreads the work only if different CPUs submit it.

Because the crypto layer already provides that functionality,
through pcrypt. By instantiating pcrypt for a given algorithm,
you can parallelise that algorithm across CPUs.

This would be inappropriate for upper layer code as they do not
know whether the underlying algorithm should be parallelised,
e.g., a PCI offload board certainly should not be parallelised.

> 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> of dm-crypt, so that all kernel subsystems can actually take advantage of
> those multi-CPU optimizations, not just dm-crypt?

Because you cannot do what Andi is doing here in the crypto layer.
What dm-crypt does today (which hasn't always been the case BTW)
hides information away (the original submitting CPU) that we cannot
recreate.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on
, Mikulas Patocka wrote:
> Questions:
>
> If you are optimizing it,
>
> 1) why don't you optimize it in such a way that if one CPU submits
> requests, the crypto work is spread among all the CPUs? Currently it
> spreads the work only if different CPUs submit it.

This case is only useful with very slow CPUs and is handled by pcrypt,
in theory (but I haven't tested it).

>
> 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> of dm-crypt, so that all kernel subsystems can actually take advantage of
> those multi-CPU optimizations, not just dm-crypt?

Normally most subsystems are multi-CPU already, unless they limit
themselves artificially like dm-crypt.

For dm-crypt it would be wasteful to funnel everything through two
single-CPU threads just to spread it out again. That is why I also used
per-CPU IO threads.

-Andi


From: Mikulas Patocka on


On Tue, 1 Jun 2010, Herbert Xu wrote:

> On Mon, May 31, 2010 at 10:44:30PM -0400, Mikulas Patocka wrote:
> > Questions:
> >
> > If you are optimizing it,
> >
> > 1) why don't you optimize it in such a way that if one CPU submits
> > requests, the crypto work is spread among all the CPUs? Currently it
> > spreads the work only if different CPUs submit it.
>
> Because the crypto layer already provides that functionality,
> through pcrypt. By instantiating pcrypt for a given algorithm,
> you can parallelise that algorithm across CPUs.

And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
sources, it seems to be dependent on aead and not useable for general
encryption algorithms at all.

I tried cryptd --- in theory it should work by requesting the algorithm
like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
in dm-crypt sources it locks up and doesn't work.

> This would be inappropriate for upper layer code as they do not
> know whether the underlying algorithm should be parallelised,
> e.g., a PCI offload board certainly should not be parallelised.

The upper layer should ideally request "cbc(aes)" and the crypto routine
should select the most efficient implementation --- sync on a single-core
system, async with cryptd on a multi-core system, and async with a hardware
implementation if you have a HIFN crypto card.

> > 2) why not optimize software async crypto daemon (crypto/cryptd.c) instead
> > of dm-crypt, so that all kernel subsystems can actually take advantage of
> > those multi-CPU optimizations, not just dm-crypt?
>
> Because you cannot do what Andi is doing here in the crypto layer.
> What dm-crypt does today (which hasn't always been the case BTW)
> hides information away (the original submitting CPU) that we cannot
> recreate.

It is pointless to track the submitting CPU.

The majority of the time is consumed by raw encryption/decryption, and you
must optimize that --- i.e. on an SMP system make sure that cryptd
distributes the work across all available cores.

When you get this right --- i.e. when reading an encrypted disk you get
either read speed equivalent to a non-encrypted disk or all the cores
saturated --- then you can start thinking about other optimizations.

Mikulas
From: Herbert Xu on
On Wed, Jun 02, 2010 at 01:10:00AM -0400, Mikulas Patocka wrote:
>
> And how can I use pcrypt for dm-crypt? After a quick look at pcrypt
> sources, it seems to be dependent on aead and not useable for general
> encryption algorithms at all.

You instantiate a pcrypt variant of whatever algorithm you're using.
For example, if you're using XTS then you should instantiate
pcrypt(xts(aes)). Currently you must use tcrypt to instantiate it.

> I tried cryptd --- in theory it should work by requesting the algorithm
> like cryptd(cbc(aes)) --- but if I replace "%s(%s)" with "cryptd(%s(%s))"
> in dm-crypt sources it locks up and doesn't work.

cryptd is something else altogether. However, it certainly should
not lock up. What kernel version is this?

> > This would be inappropriate for upper layer code as they do not
> > know whether the underlying algorithm should be parallelised,
> > e.g., a PCI offload board certainly should not be parallelised.
>
> The upper layer should ideally request "cbc(aes)" and the crypto routine
> should select the most efficient implementation --- sync on a single-core
> system, async with cryptd on a multi-core system, and async with a hardware
> implementation if you have a HIFN crypto card.

That's exactly what will happen when the admin instantiates pcrypt.
dm-crypt simply needs to specify cbc(aes) and it will get pcrypt
automatically.

The point is that on a modern processor like Nehalem you don't need
pcrypt.

> It is pointless to track the submitting CPU.

No, you are wrong.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
From: Herbert Xu on
On Wed, Jun 02, 2010 at 01:15:43AM -0400, Mikulas Patocka wrote:
>
> Almost every CPU is "very slow" so that it lags behind disk when
> encrypting. CPUs with hardware AES may be the exception.

I would not call a platform like Nehalem the exception.

> If one CPU submits I/O for 10MB of data, your patch does no
> parallelization at all, because all those 10MB will be encrypted by the
> same CPU that submitted them.

He doesn't need to. This is already solved by pcrypt.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert(a)gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt