From: Michał Nazarewicz on
Hello Jonathan,

Thank you for the review and comments!

> On Tue, 20 Jul 2010 17:51:25 +0200 Michal Nazarewicz wrote:
>> The Contiguous Memory Allocator framework is a set of APIs for
>> allocating physically contiguous chunks of memory.
>>
>> Various chips require contiguous blocks of memory to operate. Those
>> chips include devices such as cameras, hardware video decoders and
>> encoders, etc.

On Tue, 20 Jul 2010 22:52:38 +0200, Jonathan Corbet <corbet(a)lwn.net> wrote:
> Certainly this is something that many of us have run into; a general
> solution would make life easier. I do wonder if this implementation
> isn't a bit more complex than is really needed, though.
>
>> diff --git a/Documentation/cma.txt b/Documentation/cma.txt
>
> "cma.txt" is not a name that will say much to people browsing the
> directory, especially since you didn't add a 00-INDEX entry for it. Maybe
> something like contiguous-memory.txt would be better?

Will fix.

>> + For instance, let say that there are two memory banks and for
>> + performance reasons a device uses buffers in both of them. In
>> + such case, the device driver would define two kinds and use it for
>> + different buffers. Command line arguments could look as follows:
>> +
>> + cma=a=32M@0,b=32M(a)512M cma_map=foo/a=a;foo/b=b

> About the time I get here I really have to wonder if we *really* need all
> of this. A rather large portion of the entire patch is parsing code. Are
> there real-world use cases for this kind of functionality?

As for the "cma" parameter: we encountered a system where all of the
information that can be specified (except for the allocator) is needed.

1. The size is needed for obvious reasons.
2. Our platform has two banks of memory, and one of the devices needs
   some buffers to be allocated in one bank and other buffers in the
   other. Being able to specify the start address lets us define
   regions in both banks.
3. At least one of our drivers needs a buffer for firmware that is aligned
   to 128K (if I recall correctly). For other, unrelated reasons it needs
   to be in a region of its own, so we need to reserve a region aligned
   to at least 128K.
4. As for the allocator, we use only best-fit, but I believe letting the
   user specify the desired allocator is worthwhile.

As for the "cma_map" parameter: it is needed because different devices are
assigned to different regions. Also, at least one of our drivers uses
three kinds of memory (bank 1, bank 2 and firmware) and hence we also
need the optional kind.

The remaining question is whether we need the pattern matching (like '?'
and '*'). I agree that the matching code may not be the most beautiful
piece of software, but I believe it is useful. In particular, letting the
user write something like the following may be nice:

cma_map=foo-dev=foo-region;*/*=bar-region

This lets one say that foo-dev should use its own region and all other
devices should share the other region. Similarly, if at some point the
driver I described above (the one that uses three kinds) were to receive
a firmware upgrade or be installed on a different platform, so that
regions in two banks were no longer necessary, the command line could be
set to:

cma_map=baz-dev/firmware=baz-firmware;baz-dev/*=baz-region

and everything would start working fine without the need to change the
driver itself -- it would be completely unaware of the change.

Generally I see that the asterisk may be quite useful.
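For what it's worth, the '?'/'*' semantics described here ('?' matches any
single character, '*' matches the whole remaining suffix) can be sketched
in plain user-space C roughly as follows; the function name is invented
for the illustration and is not the patch's actual code:

```c
#include <stdbool.h>

/* Hypothetical sketch of the cma_map pattern semantics: '?' matches
 * any one character, '*' matches the entire remaining suffix of the
 * name.  Illustration only, not the patch's matcher. */
bool cma_pattern_match(const char *pattern, const char *name)
{
	for (; *pattern != '*' && *name; ++pattern, ++name)
		if (*pattern != '?' && *pattern != *name)
			return false;
	if (*pattern == '*')
		return true;		/* '*' swallows the rest */
	return !*pattern && !*name;	/* otherwise: exact match only */
}
```

With such semantics, "foo-dev" matches only itself while "*" matches any
device name, which is what makes catch-all rules like */*=bar-region work.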

>> + And whenever the driver allocated the memory it would specify the
>> + kind of memory:
>> +
>> + buffer1 = cma_alloc(dev, 1 << 20, 0, "a");
>> + buffer2 = cma_alloc(dev, 1 << 20, 0, "b");

> This example, above, is not consistent with:

>> +
>> + There are four calls provided by the CMA framework to devices. To
>> + allocate a chunk of memory cma_alloc() function needs to be used:
>> +
>> + unsigned long cma_alloc(const struct device *dev,
>> + const char *kind,
>> + unsigned long size,
>> + unsigned long alignment);

> It looks like the API changed and the example didn't get updated?

Yep, will fix.

>> +
>> + If required, device may specify alignment that the chunk need to
>> + satisfy. It have to be a power of two or zero. The chunks are
>> + always aligned at least to a page.
>
> So is the alignment specified in bytes or pages?

In bytes, will fix.

>> + Allocated chunk is freed via a cma_put() function:
>> +
>> + int cma_put(unsigned long addr);
>> +
>> + It takes physical address of the chunk as an argument and
>> + decreases it's reference counter. If the counter reaches zero the
>> + chunk is freed. Most of the time users do not need to think about
>> + reference counter and simply use the cma_put() as a free call.
>
> A return value from a put() function is mildly different;

Not sure what you mean. cma_put() returns either -ENOENT, if there is no
chunk with the given address, or whatever kref_put() returned.

> when would that value be useful?

I dunno. I'm just returning what kref_put() returns.
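(For reference, kref_put() reports 1 when the counter dropped to zero and
the release callback ran, and 0 otherwise. A user-space sketch of the
resulting put() semantics, with names invented for the illustration:)

```c
/* Invented names; mimics the behaviour discussed above: put() returns
 * 1 when the reference count hits zero and the chunk is released,
 * 0 otherwise -- the same convention as kref_put(). */
struct chunk { int refcount; int released; };

void chunk_get(struct chunk *c)
{
	++c->refcount;
}

int chunk_put(struct chunk *c)
{
	if (--c->refcount == 0) {
		c->released = 1;	/* the real code would free the chunk */
		return 1;
	}
	return 0;
}
```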

>
>> + If one, however, were to share a chunk with others built in
>> + reference counter may turn out to be handy. To increment it, one
>> + needs to use cma_get() function:
>> +
>> + int cma_put(unsigned long addr);
>
> Somebody's been cut-n-pasting a little too quickly...:)

Will fix.

>> + Creating an allocator for CMA needs four functions to be
>> + implemented.
>> +
>> +
>> + The first two are used to initialise an allocator far given driver
>> + and clean up afterwards:
>> +
>> + int cma_foo_init(struct cma_region *reg);
>> + void cma_foo_done(struct cma_region *reg);
>> +
>> + The first is called during platform initialisation. The
>> + cma_region structure has saved starting address of the region as
>> + well as its size. It has also alloc_params field with optional
>> + parameters passed via command line (allocator is free to interpret
>> + those in any way it pleases). Any data that allocate associated
>> + with the region can be saved in private_data field.
>> +
>> + The second call cleans up and frees all resources the allocator
>> + has allocated for the region. The function can assume that all
>> + chunks allocated form this region have been freed thus the whole
>> + region is free.
>> +
>> +
>> + The two other calls are used for allocating and freeing chunks.
>> + They are:
>> +
>> + struct cma_chunk *cma_foo_alloc(struct cma_region *reg,
>> + unsigned long size,
>> + unsigned long alignment);
>> + void cma_foo_free(struct cma_chunk *chunk);
>> +
>> + As names imply the first allocates a chunk and the other frees
>> + a chunk of memory. It also manages a cma_chunk object
>> + representing the chunk in physical memory.
>> +
>> + Either of those function can assume that they are the only thread
>> + accessing the region. Therefore, allocator does not need to worry
>> + about concurrency.
>> +
>> +
>> + When allocator is ready, all that is left is register it by adding
>> + a line to "mm/cma-allocators.h" file:
>> +
>> + CMA_ALLOCATOR("foo", foo)
>> +
>> + The first "foo" is a named that will be available to use with
>> + command line argument. The second is the part used in function
>> + names.

> This is a bit of an awkward way to register new allocators. Why not just
> have new allocators fill in an operations structure can call something like
> cma_allocator_register() at initialization time? That would let people
> write allocators as modules and would eliminate the need to add allocators
> to a central include file. It would also get rid of some ugly and (IMHO)
> unnecessary preprocessor hackery.

At the moment the list of allocators has to be available early during boot.
This is because regions are initialised (that is, allocators are attached to
regions) from an initcall (subsys_initcall to be precise). This means
allocators cannot be modules (i.e. they cannot be dynamically loaded) but
must be compiled in. Because of all this, I decided that creating an array
of all the allocators would be, maybe not very beautiful, but a good solution.

Even if I were to provide a “cma_register_allocator()” call, it would have to
be called before subsys initcalls, and as such it would be of little use, I
believe.

I agree that it would be nice to be able to have allocators loaded dynamically
but it is not possible as of yet.

>> +** Future work
>> +
>> + In the future, implementation of mechanisms that would allow the
>> + free space inside the regions to be used as page cache, filesystem
>> + buffers or swap devices is planned. With such mechanisms, the
>> + memory would not be wasted when not used.
>
> Ouch. You'd need to be able to evacuate that space again when it's needed,
> or the whole point of CMA has been lost. Once again, is it worth the
> complexity?

I believe it is. All of the regions together could easily take 64M or so.
If the device drivers are unused most of the time, that space is wasted.
If, instead, it could hold read-only data or other data that is easy to
evict from memory, the whole system would benefit.

Still, this is future work, so for now it's in the dominion of dreams and
good-night stories. ;) Bottom line is, we will think about it when the time
comes.

>> diff --git a/include/linux/cma-int.h b/include/linux/cma-int.h
>> new file mode 100644
>> index 0000000..b588e9b
>> --- /dev/null
>> +++ b/include/linux/cma-int.h
>
>> +struct cma_region {
>> + const char *name;
>> + unsigned long start;
>> + unsigned long size, free_space;
>> + unsigned long alignment;
>> +
>> + struct cma_allocator *alloc;
>> + const char *alloc_name;
>> + const char *alloc_params;
>> + void *private_data;
>> +
>> + unsigned users;
>> + /*
>> + * Protects the "users" and "free_space" fields and any calls
>> + * to allocator on this region thus guarantees only one call
>> + * to allocator will operate on this region..
>> + */
>> + struct mutex mutex;
>> +};

> The use of mutexes means that allocation/free functions cannot be called
> from atomic context. Perhaps that will never be a problem, but it might
> also be possible to use spinlocks instead?

Mutexes should not be a problem. In all the use cases we came up with,
allocation and freeing are done from process context when some operation
is initiated. The user launches an application to record video with a
camera, or launches a video player, and at that moment buffers are
initialised. We don't see any use case where CMA would be used from
an interrupt or some such.

At the same time, the use of spinlocks would limit allocators (which is
probably a minor issue), but more importantly it would limit our ability to
use the unused space of the regions for page cache/buffers/swap/you-name-it.

In general, I believe that requiring that cma_alloc()/cma_put() not be
called from atomic context has more benefits than drawbacks. (Should there
be cases where calling cma_put() from atomic context would be useful, it
could check whether it is in atomic context and, if so, let a worker do
the actual freeing.)

>> diff --git a/mm/cma-allocators.h b/mm/cma-allocators.h
>> new file mode 100644
>> index 0000000..564f705
>> --- /dev/null
>> +++ b/mm/cma-allocators.h
>> @@ -0,0 +1,42 @@
>> +#ifdef __CMA_ALLOCATORS_H
>> +
>> +/* List all existing allocators here using CMA_ALLOCATOR macro. */
>> +
>> +#ifdef CONFIG_CMA_BEST_FIT
>> +CMA_ALLOCATOR("bf", bf)
>> +#endif
>
> This is the kind of thing I think it would be nice to avoid; is there any
> real reason why allocators need to be put into this central file?
>
> This is some weird ifdef stuff as well; it processes the CMA_ALLOCATOR()
> invocations if it's included twice?

I wanted to make registering entries in the array as easy as possible.
The idea is that allocator authors just add a single line to the file and
do not have to worry about the rest. To put it another way: add a line
and don't worry about how it works. ;)
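In case the double-include trick is unfamiliar, it is the classic "X macro"
pattern: one list macro expands first into prototypes and then into table
entries. A self-contained user-space analogue (all names invented for the
demo, not taken from the patch) looks like this:

```c
#include <string.h>

/* The single list an author would edit: one line per allocator. */
#define ALLOCATOR_LIST \
	ALLOCATOR("bf", bf) \
	ALLOCATOR("fifo", fifo)

/* First expansion: prototypes for every listed allocator. */
#define ALLOCATOR(name, infix) const char *infix##_describe(void);
ALLOCATOR_LIST
#undef ALLOCATOR

const char *bf_describe(void)   { return "best fit"; }
const char *fifo_describe(void) { return "first in, first out"; }

/* Second expansion: a table built from the very same list. */
struct entry { const char *name; const char *(*describe)(void); };
const struct entry allocators[] = {
#define ALLOCATOR(name, infix) { name, infix##_describe },
	ALLOCATOR_LIST
#undef ALLOCATOR
};

/* Look an allocator up by its command-line name. */
const char *describe_allocator(const char *name)
{
	unsigned i;
	for (i = 0; i < sizeof allocators / sizeof *allocators; ++i)
		if (!strcmp(allocators[i].name, name))
			return allocators[i].describe();
	return 0;
}
```

Adding a new allocator then really is a one-line change to ALLOCATOR_LIST;
both the prototypes and the table pick it up automatically.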

>> +
>> +# undef CMA_ALLOCATOR
>> +#else
>> +# define __CMA_ALLOCATORS_H
>> +
>> +/* Function prototypes */
>> +# ifndef __LINUX_CMA_ALLOCATORS_H
>> +# define __LINUX_CMA_ALLOCATORS_H
>> +# define CMA_ALLOCATOR(name, infix) \
>> + extern int cma_ ## infix ## _init(struct cma_region *); \
>> + extern void cma_ ## infix ## _cleanup(struct cma_region *); \
>> + extern struct cma_chunk * \
>> + cma_ ## infix ## _alloc(struct cma_region *, \
>> + unsigned long, unsigned long); \
>> + extern void cma_ ## infix ## _free(struct cma_chunk *);
>> +# include "cma-allocators.h"
>> +# endif
>> +
>> +/* The cma_allocators array */
>> +# ifdef CMA_ALLOCATORS_LIST
>> +# define CMA_ALLOCATOR(_name, infix) { \
>> + .name = _name, \
>> + .init = cma_ ## infix ## _init, \
>> + .cleanup = cma_ ## infix ## _cleanup, \
>> + .alloc = cma_ ## infix ## _alloc, \
>> + .free = cma_ ## infix ## _free, \
>> + },
>
> Different implementations of the macro in different places in the same
> kernel can cause confusion. To what end? As I said before, a simple
> registration function called by the allocators would eliminate the need for
> this kind of stuff.

Yes, it would, except that at the moment a registration function may not be
the best option... I'm still trying to work out how it could be done
(dynamic allocators, that is).

>> diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
>
> [...]
>
>> +int cma_bf_init(struct cma_region *reg)
>> +{
>> + struct cma_bf_private *prv;
>> + struct cma_bf_item *item;
>> +
>> + prv = kzalloc(sizeof *prv, GFP_NOWAIT);
>> + if (unlikely(!prv))
>> + return -ENOMEM;
>
> I'll say this once, but the comment applies all over this code: I hate it
> when people go nuts with likely/unlikely(). This is an initialization
> function, we don't actually care if the branch prediction gets it wrong.
> Classic premature optimization. The truth of the matter is that
> *programmers* often get this wrong. Have you profiled all these
> ocurrences? Maybe it would be better to take them out?

My rule of thumb is that errors are unlikely, and I use it consistently
throughout my code. The other rationale is that we want the error-free
path to run as fast as possible, letting the error-recovery path be slower.

>> +struct cma_chunk *cma_bf_alloc(struct cma_region *reg,
>> + unsigned long size, unsigned long alignment)
>> +{
>> + struct cma_bf_private *prv = reg->private_data;
>> + struct rb_node *node = prv->by_size_root.rb_node;
>> + struct cma_bf_item *item = NULL;
>> + unsigned long start, end;
>> +
>> + /* First first item that is large enough */
>> + while (node) {
>> + struct cma_bf_item *i =
>> + rb_entry(node, struct cma_bf_item, by_size);
>
> This is about where I start to wonder about locking. I take it that the
> allocator code is relying upon locking at the CMA level to prevent
> concurrent calls? If so, it would be good to document what guarantees the
> CMA level provides.

The cma-int.h says:

> /**
> * struct cma_allocator - a CMA allocator.
[...]
> * @alloc: Allocates a chunk of memory of given size in bytes and
> * with given alignment. Alignment is a power of
> * two (thus non-zero) and callback does not need to check it.
> * May also assume that it is the only call that uses given
> * region (ie. access to the region is synchronised with
> * a mutex). This has to allocate the chunk object (it may be
> * contained in a bigger structure with allocator-specific data.
> * May sleep.
> * @free: Frees allocated chunk. May also assume that it is the only
> * call that uses given region. This has to kfree() the chunk
> * object as well. May sleep.
> */

Do you think it needs more clarification? In more places? If so, where?

>> +/************************* Basic Tree Manipulation *************************/
>> +
>> +#define __CMA_BF_HOLE_INSERT(root, node, field) ({ \
>> + bool equal = false; \
>> + struct rb_node **link = &(root).rb_node, *parent = NULL; \
>> + const unsigned long value = item->field; \
>> + while (*link) { \
>> + struct cma_bf_item *i; \
>> + parent = *link; \
>> + i = rb_entry(parent, struct cma_bf_item, node); \
>> + link = value <= i->field \
>> + ? &parent->rb_left \
>> + : &parent->rb_right; \
>> + equal = equal || value == i->field; \
>> + } \
>> + rb_link_node(&item->node, parent, link); \
>> + rb_insert_color(&item->node, &root); \
>> + equal; \
>> +})
>
> Is there a reason why this is a macro? The code might be more readable if
> you just wrote out the two versions that you need.

I didn't want to duplicate the code, but will fix as requested.

>> diff --git a/mm/cma.c b/mm/cma.c
>> new file mode 100644
>> index 0000000..6a0942f
>> --- /dev/null
>> +++ b/mm/cma.c
>
> [...]
>
>> +static const char *__must_check
>> +__cma_where_from(const struct device *dev, const char *kind)
>> +{
>> + /*
>> + * This function matches the pattern given at command line
>> + * parameter agains given device name and kind. Kind may be
>> + * of course NULL or an emtpy string.
>> + */
>> +
>> + const char **spec, *name;
>> + int name_matched = 0;
>> +
>> + /* Make sure dev was given and has name */
>> + if (unlikely(!dev))
>> + return ERR_PTR(-EINVAL);
>> +
>> + name = dev_name(dev);
>> + if (WARN_ON(!name || !*name))
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* kind == NULL is just like an empty kind */
>> + if (!kind)
>> + kind = "";
>> +
>> + /*
>> + * Now we go throught the cma_map array. It is an array of
>> + * pointers to chars (ie. array of strings) so in each
>> + * iteration we take each of the string. The strings is
>> + * basically what user provided at the command line separated
>> + * by semicolons.
>> + */
>> + for (spec = cma_map; *spec; ++spec) {
>> + /*
>> + * This macro tries to match pattern pointed by s to
>> + * @what. If, while reading the spec, we ecnounter
>> + * comma it means that the pattern does not match and
>> + * we need to start over with another spec. If there
>> + * is a character that does not match, we neet to try
>> + * again looking if there is another spec.
>> + */
>> +#define TRY_MATCH(what) do { \
>> + const char *c = what; \
>> + for (; *s != '*' && *c; ++c, ++s) \
>> + if (*s == ',') \
>> + goto again; \
>> + else if (*s != '?' && *c != *s) \
>> + goto again_maybe; \
>> + if (*s == '*') \
>> + ++s; \
>> + } while (0)

> This kind of thing rarely contributes to the readability or maintainability
> of the code. Is it really necessary? Or (as asked before) is all this
> functionality really necessary?

Removed the macro. As for whether the functionality is needed: as explained
above, I believe it is.

> One other comment: it might be nice if drivers could provide allocation
> regions of their own. The viafb driver, for example, really needs an
> allocator to hand out chunks of framebuffer memory - including large chunks
> for things like video frames. If that driver could hand responsibility
> for that over to CMA, it could eliminate the need for yet another
> driver-quality memory manager.

Are we talking about viafb having access to some “magic” memory known only
to itself? If so, it should be fairly easy to let drivers add their own,
already reserved regions. As a matter of fact, the region could be
completely unnamed and private to the device, i.e.:

	... device reserves some memory ...
	struct cma_region *reg = kmalloc(sizeof *reg, GFP_KERNEL);

	reg->size = ... size ...;
	reg->start = ... start ...;
	reg->alloc_name = ... allocator name or NULL ...;

	ret = cma_region_init(reg);
	if (ret)
		/* Failed to initialise */
		return ret;

and then later, when allocating:

addr = cma_alloc_from_region(reg, ... size ..., ... alignment ...);

and when unloading:

cma_region_cleanup(reg);

what do you think?

On Wed, 21 Jul 2010 02:12:39 +0200, Jonathan Corbet <corbet(a)lwn.net> wrote:
> One other thing occurred to me as I was thinking about this...

>> + There are four calls provided by the CMA framework to devices. To
>> + allocate a chunk of memory cma_alloc() function needs to be used:
>> +
>> + unsigned long cma_alloc(const struct device *dev,
>> + const char *kind,
>> + unsigned long size,
>> + unsigned long alignment);
>
> The purpose behind this interface, I believe, is pretty much always
> going to be to allocate memory for DMA buffers. Given that, might it
> make more sense to integrate the API with the current DMA mapping API?
> Then the allocation function could stop messing around with long values
> and, instead, just hand back a void * kernel-space pointer and a
> dma_addr_t to hand to the device. That would make life a little easier
> in driverland...

In our use cases, mapping the region was never needed. It is mostly used
with V4L, which handles mapping, cache coherency, etc. That is also outside
the scope of the CMA framework.

As for changing the type to dma_addr_t, that may be a good idea; I'm going
to change it.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Michał Nazarewicz on
On Tue, 20 Jul 2010 21:38:18 +0200, Daniel Walker <dwalker(a)codeaurora.org> wrote:

> On Tue, 2010-07-20 at 21:14 +0200, Michał Nazarewicz wrote:
>> On Tue, 20 Jul 2010 20:15:24 +0200, Daniel Walker <dwalker(a)codeaurora.org> wrote:
>>
>> > On Tue, 2010-07-20 at 17:51 +0200, Michal Nazarewicz wrote:
>> >> +** Use cases
>> >> +
>> >> + Lets analyse some imaginary system that uses the CMA to see how
>> >> + the framework can be used and configured.
>> >> +
>> >> +
>> >> + We have a platform with a hardware video decoder and a camera
>> >> each
>> >> + needing 20 MiB of memory in worst case. Our system is written in
>> >> + such a way though that the two devices are never used at the same
>> >> + time and memory for them may be shared. In such a system the
>> >> + following two command line arguments would be used:
>> >> +
>> >> + cma=r=20M cma_map=video,camera=r
>> >
>> > This seems inelegant to me.. It seems like these should be connected
>> > with the drivers themselves vs. doing it on the command like for
>> > everything. You could have the video driver declare it needs 20megs, and
>> > the the camera does the same but both indicate it's shared ..
>> >
>> > If you have this disconnected from the drivers it will just cause
>> > confusion, since few will know what these parameters should be for a
>> > given driver set. It needs to be embedded in the kernel.
>>
>> I see your point but the problem is that devices drivers don't know the
>> rest of the system neither they know what kind of use cases the system
>> should support.
>>
>>
>> Lets say, we have a camera, a JPEG encoder, a video decoder and
>> scaler (ie. devices that scales raw image). We want to support the
>> following 3 use cases:
>>
>> 1. Camera's output is scaled and displayed in real-time.
>> 2. Single frame is taken from camera and saved as JPEG image.
>> 3. A video file is decoded, scaled and displayed.
>>
>> What is apparent is that camera and video decoder are never running
>> at the same time. The same situation is with JPEG encoder and scaler.
>> From this knowledge we can construct the following:
>>
>> cma=a=10M;b=10M cma_map=camera,video=a;jpeg,scaler=b
>
> It should be implicit tho. If the video driver isn't using the memory
> then it should tell your framework that the memory is not used. That way
> something else can use it.

What you are asking for is:

cma=a=100M cma_map=*/*=a

All devices will share the same region so that "if the video driver isn't
using the memory" then "something else can use it". (please excuse me quoting
you, it was stronger than me ;) ).

A driver has too little information to say whether it has really stopped
using memory. Maybe the next call will be to allocate buffers for frames
and initialise the chip? Sure, some “good enough” defaults can be provided
(and the framework allows that), but the platform architect may still need
more power.

> (btw, these strings your creating yikes, talk about confusing ..)

They are not that scary really. Let's look at cma:

a=10M;b=10M

Split it on semicolon:

a=10M
b=10M

and you see that it defines two regions (a and b) 10M each.

As for cma_map:

camera,video=a;jpeg,scaler=b

Again split it on semicolon:

camera,video=a
jpeg,scaler=b

Now, substitute the equals sign with "use(s) region(s)":

camera,video use(s) region(s): a
jpeg,scaler use(s) region(s): b

No black magic here. ;)

>> One of the purposes of the CMA framework is to make it let device
>> drivers completely forget about the memory management and enjoy
>> a simple API.
>
> The driver, and it's maintainer, are really the best people to know how
> much memory they need and when it's used/unused. You don't really want
> to architect them out.

This might be true if there is only one device, but even then it's not
always the case. If many devices need physically contiguous memory, there
is no way for them to communicate and share it. For best performance,
someone must look at them all and decide who gets what.

Still, with the updated version it will be possible for drivers to use
private regions.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
From: Mark Brown on
On Tue, Jul 20, 2010 at 09:14:58PM +0200, Michał Nazarewicz wrote:
> On Tue, 20 Jul 2010 20:15:24 +0200, Daniel Walker <dwalker(a)codeaurora.org> wrote:

> > If you have this disconnected from the drivers it will just cause
> > confusion, since few will know what these parameters should be for a
> > given driver set. It needs to be embedded in the kernel.

> I see your point but the problem is that devices drivers don't know the
> rest of the system neither they know what kind of use cases the system
> should support.

If this does need to be configured per system would having platform data
of some kind in the kernel not be a sensible a place to do it, or even
having a way of configuring this at runtime (after all, the set of
currently active users may vary depending on the current configuration
and keeping everything allocated all the time may be wasteful)?
From: Michał Nazarewicz on
On Wed, 21 Jul 2010 15:52:30 +0200, Mark Brown <broonie(a)opensource.wolfsonmicro.com> wrote:

> On Tue, Jul 20, 2010 at 09:14:58PM +0200, Michał Nazarewicz wrote:
>> On Tue, 20 Jul 2010 20:15:24 +0200, Daniel Walker <dwalker(a)codeaurora.org> wrote:
>
>> > If you have this disconnected from the drivers it will just cause
>> > confusion, since few will know what these parameters should be for a
>> > given driver set. It needs to be embedded in the kernel.
>
>> I see your point but the problem is that devices drivers don't know the
>> rest of the system neither they know what kind of use cases the system
>> should support.
>
> If this does need to be configured per system would having platform data
> of some kind in the kernel not be a sensible a place to do it,

The current version (and the next version I'm working on) of the code
has a cma_defaults() call. It is intended to be called from platform
initialisation code to provide defaults.

> or even
> having a way of configuring this at runtime (after all, the set of
> currently active users may vary depending on the current configuration
> and keeping everything allocated all the time may be wasteful)?

I am currently working on making the whole thing more dynamic. I imagine
the list of regions will stay pretty much the same after the kernel has
started (because one cannot reliably allocate big new contiguous memory
regions), but it will be possible to change the set of rules, etc.


--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
From: Daniel Walker on
On Wed, 2010-07-21 at 14:01 +0200, Michał Nazarewicz wrote:

> What you are asking for is:
>
> cma=a=100M cma_map=*/*=a
>
> All devices will share the same region so that "if the video driver isn't
> using the memory" then "something else can use it". (please excuse me quoting
> you, it was stronger then me ;) ).

Ok ..

> Driver has to little information to say whether it really stopped using
> memory. Maybe the next call will be to allocate buffers for frames and
> initialise the chip? Sure, some “good enough” defaults can be provided
> (and the framework allows that) but still platform architect might need
> more power.

I think you're talking more about optimization .. You can take that into
account ..

> > (btw, these strings your creating yikes, talk about confusing ..)
>
> They are not that scary really. Let's look at cma:
>
> a=10M;b=10M
>
> Split it on semicolon:
>
> a=10M
> b=10M
>
> and you see that it defines two regions (a and b) 10M each.

I think you're assuming a lot .. I've never seen the notation before; I
wouldn't assume there are regions or whatever ..

> As of cma_map:
>
> camera,video=a;jpeg,scaler=b
>
> Again split it on semicolon:
>
> camera,video=a
> jpeg,scaler=b
>
> Now, substitute equal sign by "use(s) region(s)":
>
> camera,video use(s) region(s): a
> jpeg,scaler use(s) region(s): b
>
> No black magic here. ;)

It's way too complicated .. Users (i.e. not programmers) have to use
this ..

> >> One of the purposes of the CMA framework is to make it let device
> >> drivers completely forget about the memory management and enjoy
> >> a simple API.
> >
> > The driver, and it's maintainer, are really the best people to know how
> > much memory they need and when it's used/unused. You don't really want
> > to architect them out.
>
> This might be true if there is only one device but even then it's not
> always the case. If many devices need physically-contiguous memory
> there is no way for them to communicate and share memory. For best
> performance someone must look at them and say who gets what.

How do you think regular memory allocation works? I mean, there are many
devices that all need different amounts of memory and they get along.
Yet you're saying it's not possible.

Daniel


--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
