From: Steven D'Aprano on
I would like to better understand some of the design choices made in
collections.defaultdict.

Firstly, to initialise a defaultdict, you do this:

from collections import defaultdict
d = defaultdict(callable, *args)

which sets an attribute of d "default_factory" which is called on key
lookups when the key is missing. If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported.
That is:

d = dict(*args)
d.default_factory = callable

If you failed to explicitly set the dict's default_factory, it would
behave precisely as dicts do now. So why create a new class that needs to
be imported, rather than just add the functionality to dict?

Is it just an aesthetic choice to support passing the factory function as
the first argument? I would think that the advantage of having it built-
in would far outweigh the cost of an explicit attribute assignment.



Second, why is the factory function not called with key? There are three
obvious kinds of "default values" a dict might want, in order of more-to-
less general:

(1) The default value depends on the key in some way: return factory(key)
(2) The default value doesn't depend on the key: return factory()
(3) The default value is a constant: return C

defaultdict supports (2) and (3):

defaultdict(factory, *args)
defaultdict(lambda: C, *args)

but it doesn't support (1). If key were passed to the factory function,
it would be easy to support all three use-cases, at the cost of a
slightly more complex factory function. E.g. the current idiom:

defaultdict(factory, *args)

would become:

defaultdict(lambda key: factory(), *args)


(There is a zeroth case as well, where the default value depends on the
key and what else is in the dict: factory(d, key). But I suspect that's
well and truly YAGNI territory.)
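
For reference, cases (2) and (3) with the existing API might look like the sketch below. The sample data and the constant C are placeholders chosen only for illustration.

```python
from collections import defaultdict

# Case (2): the factory is called with no arguments and returns a fresh object.
groups = defaultdict(list)
for word in ["spam", "eggs", "sausage"]:
    groups[word[0]].append(word)

# Case (3): a constant default, wrapped in a zero-argument lambda.
C = "n/a"                      # placeholder constant
const = defaultdict(lambda: C)
value = const["anything"]
```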

Thanks in advance,



--
Steven
From: Chris Rebert on
On Thu, Jul 1, 2010 at 9:11 PM, Steven D'Aprano
<steve(a)remove-this-cybersource.com.au> wrote:
> I would like to better understand some of the design choices made in
> collections.defaultdict.

Perhaps python-dev should've been CC-ed...

> Firstly, to initialise a defaultdict, you do this:
>
> from collections import defaultdict
> d = defaultdict(callable, *args)
>
> which sets an attribute of d "default_factory" which is called on key
> lookups when the key is missing. If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
> That is:
>
> d = dict(*args)
> d.default_factory = callable
>
> If you failed to explicitly set the dict's default_factory, it would
> behave precisely as dicts do now. So why create a new class that needs to
> be imported, rather than just add the functionality to dict?

Don't know personally, but here's one thought: If it was done that
way, passing around a dict could result in it getting a
default_factory set where there wasn't one before, which could lead to
strange results if you weren't anticipating that. The defaultdict
solution avoids this.

<snip>
> Second, why is the factory function not called with key?

Agree, I've never understood this. Ruby's Hash::new does it better
(http://ruby-doc.org/core/classes/Hash.html), and even supports your
case 0; it calls the equivalent of default_factory(d, key) when
generating a default value.
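
A Python approximation of that Ruby behaviour can be sketched with a dict subclass. HashLike and fib_default are hypothetical names invented for this example, not part of any library, and, as with a Ruby block, it is left to the factory to decide whether to store the value it computes.

```python
class HashLike(dict):
    """Hypothetical dict subclass whose factory receives (mapping, key)."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Pass both the dict itself and the missing key to the factory.
        return self.factory(self, key)


def fib_default(d, key):
    # The default depends on the key and on other entries in the dict.
    value = key if key < 2 else d[key - 1] + d[key - 2]
    d[key] = value   # store it so each entry is computed only once
    return value


fib = HashLike(fib_default)
tenth = fib[10]
```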

> There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)
> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C
>
> defaultdict supports (2) and (3):
>
> defaultdict(factory, *args)
> defaultdict(lambda: C, *args)
>
> but it doesn't support (1). If key were passed to the factory function,
> it would be easy to support all three use-cases, at the cost of a
> slightly more complex factory function.
<snip>
> (There is a zeroth case as well, where the default value depends on the
> key and what else is in the dict: factory(d, key). But I suspect that's
> well and truly YAGNI territory.)

Cheers,
Chris
--
http://blog.rebertia.com
From: Raymond Hettinger on
On Jul 1, 9:11 pm, Steven D'Aprano <st...(a)REMOVE-THIS-
cybersource.com.au> wrote:
> I would like to better understand some of the design choices made in
> collections.defaultdict.
. . .
> If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
. . .
> Second, why is the factory function not called with key? There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)
> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C

The __missing__() magic method lets you provide a factory with a key.
That method is supported by dict subclasses, making it easy to
create almost any desired behavior. A defaultdict is an example.
It is a dict subclass that calls a zero argument factory function.
But with __missing__() you can roll your own dict subclass to meet your
other needs. A defaultdict was provided to meet one commonly
requested set of use cases (mostly ones using int() and list()
as factory functions).
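
A minimal sketch of such a subclass, covering case (1) from the original post, could look like this. KeyedDefaultDict is a made-up name, and storing the computed value mirrors what defaultdict does rather than anything __missing__() requires.

```python
class KeyedDefaultDict(dict):
    """Hypothetical dict subclass that calls factory(key) for missing keys."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Compute the default from the key, store it, and return it.
        value = self[key] = self.factory(key)
        return value


lengths = KeyedDefaultDict(len)
n = lengths["spam"]   # the factory is called as len("spam")
```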

From the docs at http://docs.python.org/library/stdtypes.html#mapping-types-dict:

'''New in version 2.5: If a subclass of dict defines a method
__missing__(), if the key key is not present, the d[key] operation
calls that method with the key key as argument. The d[key] operation
then returns or raises whatever is returned or raised by the
__missing__(key) call if the key is not present. No other operations
or methods invoke __missing__(). If __missing__() is not defined,
KeyError is raised. __missing__() must be a method; it cannot be an
instance variable. For an example, see collections.defaultdict.'''
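
The "no other operations" part is easy to overlook, so here is a small sketch using defaultdict itself (the key "x" is arbitrary): get() and membership tests do not trigger __missing__(), only d[key] does.

```python
from collections import defaultdict

d = defaultdict(list)

got = d.get("x")             # get() does not call __missing__()
present_before = "x" in d    # neither does a membership test

item = d["x"]                # d[key] does; defaultdict also stores the result
present_after = "x" in d
```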

Raymond

From: Thomas Jollans on
On 07/02/2010 06:11 AM, Steven D'Aprano wrote:
> I would like to better understand some of the design choices made in
> collections.defaultdict.
>
> Firstly, to initialise a defaultdict, you do this:
>
> from collections import defaultdict
> d = defaultdict(callable, *args)
>
> which sets an attribute of d "default_factory" which is called on key
> lookups when the key is missing. If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
> That is:
>
> d = dict(*args)
> d.default_factory = callable

That's just not what dicts, a very simple and elementary data type, do.
I know this isn't really a good reason. In addition to what Chris said,
I expect this would complicate the dict code a great deal.

>
> If you failed to explicitly set the dict's default_factory, it would
> behave precisely as dicts do now. So why create a new class that needs to
> be imported, rather than just add the functionality to dict?
>
> Is it just an aesthetic choice to support passing the factory function as
> the first argument? I would think that the advantage of having it built-
> in would far outweigh the cost of an explicit attribute assignment.
>

The cost of this feature would be over-complication of the built-in dict
type when a subclass would do just as well.

>
>
> Second, why is the factory function not called with key? There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)

I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:

class mydict(defaultdict):
    def __missing__(self, key):
        # ...

The __missing__ method is really the more useful bit the defaultdict
class adds, by the looks of it.

-- Thomas

> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C
>
> defaultdict supports (2) and (3):
>
> defaultdict(factory, *args)
> defaultdict(lambda: C, *args)
>
> but it doesn't support (1). If key were passed to the factory function,
> it would be easy to support all three use-cases, at the cost of a
> slightly more complex factory function. E.g. the current idiom:
>
> defaultdict(factory, *args)
>
> would become:
>
> defaultdict(lambda key: factory(), *args)
>
>
> (There is a zeroth case as well, where the default value depends on the
> key and what else is in the dict: factory(d, key). But I suspect that's
> well and truly YAGNI territory.)
From: Chris Rebert on
On Fri, Jul 2, 2010 at 2:20 AM, Thomas Jollans <thomas(a)jollans.com> wrote:
> On 07/02/2010 06:11 AM, Steven D'Aprano wrote:
>> I would like to better understand some of the design choices made in
>> collections.defaultdict.
<snip>
>> Second, why is the factory function not called with key? There are three
>> obvious kinds of "default values" a dict might want, in order of more-to-
>> less general:
>>
>> (1) The default value depends on the key in some way: return factory(key)
>
> I agree, this is a strange choice. However, nothing's stopping you from
> being a bit verbose about what you want and just doing it:
>
> class mydict(defaultdict):
>    def __missing__(self, key):
>        # ...
>
> the __missing__ method is really the more useful bit the defaultdict
> class adds, by the looks of it.

Nitpick: You only need to subclass dict, not defaultdict, to use
__missing__(). See the part of the docs Raymond Hettinger quoted.

Cheers,
Chris