From: Vlastimil Brom on
Hi all,
I'd like to ask about the possibility of negative "counts" in
collections.Counter (using Python 3.1);
I believe, my usecase is rather trivial, basically I have the word
frequencies of two texts and I want do compare them (e.g. to see what
was added and removed between different versions of a text).

This is simple enough to do with own code, but I thought, this would
be exactly the case for Counter...
However, as the Counter only returns positive counts, one has to get
the difference in both directions and combine them afterwards, maybe
something like:

>>> c1=collections.Counter("aabcddd")
>>> c2=collections.Counter("abbbd")
>>> added_c2 = c2-c1
>>> removed_c2 = c1-c2
>>> negative_added_c2 = dict((k, v*-1) for (k, v) in removed_c2.items())
>>> changed_c2 = dict(added_c2)
>>> changed_c2.update(negative_added_c2)
>>> changed_c2
{'a': -1, 'c': -1, 'b': 2, 'd': -2}
>>>

It seems to me, that with negative counts allowed in Counter, this
would simply be the matter of a single difference:
changed_c2 = c2 - c1

Is there a possibility to make the Counter work this way (other than
replacing its methods in a subclass, which might be comparable to
writing the naive counting class itself)?
Are there maybe some reasons I missed to disable negative counts here?
(As I could roughly understand, the Counter isn't quite limited to the
mathematical notion of multiset; it seems to accept negative counts,
but its methods only output the positive part).
Is this kind of task - a comparison in both directions - an unusual
one, or is it simply not the case for Counter?

Thanks in advance,
vbr
From: Arnaud Delobelle on
Vlastimil Brom <vlastimil.brom(a)gmail.com> writes:

> Hi all,
> I'd like to ask about the possibility of negative "counts" in
> collections.Counter (using Python 3.1);
> I believe, my usecase is rather trivial, basically I have the word
> frequencies of two texts and I want do compare them (e.g. to see what
> was added and removed between different versions of a text).
>
> This is simple enough to do with own code, but I thought, this would
> be exactly the case for Counter...
> However, as the Counter only returns positive counts, one has to get
> the difference in both directions and combine them afterwards, maybe
> something like:
>
>>>> c1=collections.Counter("aabcddd")
>>>> c2=collections.Counter("abbbd")
>>>> added_c2 = c2-c1
>>>> removed_c2 = c1-c2
>>>> negative_added_c2 = dict((k, v*-1) for (k, v) in removed_c2.items())
>>>> changed_c2 = dict(added_c2)
>>>> changed_c2.update(negative_added_c2)
>>>> changed_c2
> {'a': -1, 'c': -1, 'b': 2, 'd': -2}
>>>>
>
> It seems to me, that with negative counts allowed in Counter, this
> would simply be the matter of a single difference:
> changed_c2 = c2 - c1
>
> Is there a possibility to make the Counter work this way (other than
> replacing its methods in a subclass, which might be comparable to
> writing the naive counting class itself)?
> Are there maybe some reasons I missed to disable negative counts here?
> (As I could roughly understand, the Counter isn't quite limited to the
> mathematical notion of multiset; it seems to accept negative counts,
> but its methods only output the positive part).
> Is this kind of task - a comparison in both directions - an unusual
> one, or is it simply not the case for Counter?

Every time I have needed something like collections.Counter, I have
wanted the behaviour you require too. As a result, I have never used
collections.Counter. Instead I have used plain dictionaries or my own
class.

I don't understand why the Counter's + and - operators behave as they
do. Here is an example from the docs:

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})

If + and - just added or subtracted the multiplicities of elements,
keeping negative multiplicites, we would get:

>>> c - d
Counter({'a':2, 'b':-1})

Which I think is useful in many cases. But we could still get the
result of current c - d very simply:

>>> (c - d) | Counter() # | Counter() removes negative multiplicities
Counter({'a':2})

Altogether more versatile and coherent IMHO.

--
Arnaud
From: Steven D'Aprano on
On Sun, 07 Mar 2010 22:21:27 +0000, Arnaud Delobelle wrote:

> Vlastimil Brom <vlastimil.brom(a)gmail.com> writes:
>
>> Hi all,
>> I'd like to ask about the possibility of negative "counts" in
>> collections.Counter (using Python 3.1); I believe, my usecase is rather
>> trivial, basically I have the word frequencies of two texts and I want
>> do compare them (e.g. to see what was added and removed between
>> different versions of a text).

[...]

> Every time I have needed something like collections.Counter, I have
> wanted the behaviour you require too. As a result, I have never used
> collections.Counter. Instead I have used plain dictionaries or my own
> class.

Vlastimil and Arnaud,

It looks like Counter already has negative counts in Python 3.1:

>>> import collections
>>> c = collections.Counter({'red': 2, 'green': 0, 'blue': -5})
>>> c['blue'] -= 1
>>> c
Counter({'red': 2, 'green': 0, 'blue': -6})
>>> c['blue'] += 1
>>> c
Counter({'red': 2, 'green': 0, 'blue': -5})

But the + and - operators destroy negative and zero counts:

>>> c + collections.Counter({'blue': 1})
Counter({'red': 2})

I can't imagine what the use-case for that behaviour is.

Given that Counter supports negative counts, it looks to me that the
behaviour of __add__ and __sub__ is fundamentally flawed. You should
raise a bug report (feature enhancement) on the bug tracker.

http://bugs.python.org/

Given that this will change the documented behaviour, it will help if you
give a short, simple idiom for removing zero and negative elements,
Arnaud's trick with | Counter().

When you raise the report, please post an update here.



--
Steven
From: Raymond Hettinger on
On Mar 7, 5:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-
cybersource.com.au> wrote:
> Given that Counter supports negative counts, it looks to me that the
> behaviour of __add__ and __sub__ is fundamentally flawed. You should
> raise a bug report (feature enhancement) on the bug tracker.

It isn't a bug. I designed it that way.
There were several possible design choices,
each benefitting different use cases.

FWIW, here is the reasoning behind the design.

The basic approach to Counter() is to be a dict subclass
that supplies zero for missing values. This approach
places almost no restrictions on what can be stored in it.
You can store floats, decimals, fractions, etc.
Numbers can be positive, negative, or zero.

This design leaves users with a good deal of flexibility,
but more importantly it keeps the API very close to
regular dictionaries (thus lowering the learning curve).
It also makes basic counter operations very fast.

The update() method differs from regular dictionaries
because dict.update() isn't very helpful in a counter context.
Instead, it allows one counter to update another. Like
the other basic counter operations, it places no restrictions
on type (i.e. it works on with ints, floats, decimals, fractions,
etc).
or on sign (positive, negative, or zero). The only requirement
is that the values support addition.

The basic API also adds some convenience methods.
The most_common() method tries to not be restrictive.
It only requires the count values be orderable.

For obvious reasons, the elements() method *does* have
restrictions and won't work with float values and won't
emit entries with non-positive counts.

Beyond the basic API listed above which is straight-forward,
the area where there were more interesting design choices
were the Counter-to-Counter operations (__add__, __sub__,
__or__, and __and__).

One possible choice (the one preferred by the OP) was to
has addition and subtraction be straight adds and subtracts
without respect to sign and to not support __and__ and __or__.
Straight addition was already supported via the update() method.
But no direct support was provided for straight subtractions
that leave negative values. Sorry about that.

Instead the choice was to implement the four methods as
multiset operations. As such, they need to correspond
to regular set operations. For example, with sets:

set('abc') - set('cde') # gives set('ab')

Notice how subtracting 'e' did not produce a negative result?
Now with multisets, we want the same result:

Counter({'a':1, 'b':1, 'c':1'}) - Counter({'c':1, 'd':1, 'e':1})

So the design decision was to support multiset operations
that produces only results with positive counts.

These multiset-style mathematical operations are discussed in:
Knuth's TAOCP Volume II section 4.6.3 exercise 19
and at http://en.wikipedia.org/wiki/Multiset .
Also C++ has multisets that support only positive counts.

So, there you have it. There design tries to be as
unrestrictive as possible. When a choice had to be
made, it favored multisets over behaviors supporting
negative values.

Fortunately, it is trivially easy to extend the class
to add any behavior you want.

class MyCounter(Counter):
def __sub__(self, other):
result = self.copy()
for elem, cnt in other.items():
result[elem] -= cnt

Hopes this gives you some insight into the design choices.


Raymond Hettinger

From: Vlastimil Brom on
2010/3/8 Raymond Hettinger <python(a)rcn.com>:
> On Mar 7, 5:46 pm, Steven D'Aprano <st...(a)REMOVE-THIS-
> cybersource.com.au> wrote:
>> Given that Counter supports negative counts, it looks to me that the
>> behaviour of __add__ and __sub__ is fundamentally flawed. You should
>> raise a bug report (feature enhancement) on the bug tracker.
>
> It isn't a bug.  I designed it that way.
> There were several possible design choices,
> each benefitting different use cases.
>...
> One possible choice (the one preferred by the OP) was to
> has addition and subtraction be straight adds and subtracts
> without respect to sign and to not support __and__ and __or__.
> Straight addition was already supported via the update() method.
> But no direct support was provided for straight subtractions
> that leave negative values.  Sorry about that.
>
> Instead the choice was to implement the four methods as
> multiset operations.  As such, they need to correspond
> to regular set operations.  For example, with sets:
>
>...
> Hopes this gives you some insight into the design choices.
>
>
> Raymond Hettinger
>
Thank you very much for the exhaustive explanation Raymond!
I very much suspected, there would be some exact reasoning behind it,
as these negative counts are explicitly handled in the code of these
methods.
I just happened to expect the straightforward addition and subtraction
and possibly the zero truncation or an exception in incompatible cases
(like elements() with floats, currently). This way also the negative
part would be available and the truncation possible.
I am by far not able to follow all of the mathematical background, but
even for zero-truncating multiset, I would expect the truncation on
input rather than on output of some operations. E.g. the following
seems surprising to me in context of the discarded negatives by
addition or subtraction:

>>> c1=collections.Counter({'a': -11, 'b': -8})
>>> c2=collections.Counter({'a': 3})
>>> c2-c1
Counter({'a': 14, 'b': 8})
>>>

Probably a kind of negative_update() or some better named method will
be handy, like the one you supplied or simply the current module code
without the newcount > 0: ... condition. Or would it be an option to
have a keyword argument like zero_truncate=False which would influence
this behaviour? Or is some other method than elements() incompatible
with negative counts?
(I actually can imagine comparing negative counts on both sides. e.g.
in a secondary comparison of two wordlists with specific removed items
- comparing to the "master" list.)
Additionally, were issubset and issuperset considered for this
interface (not sure whether symmetric_difference would be applicable)?

Anyway, I recognise, that I can easily use a custom class for these
tasks, if these usecases are rare or non-standard for this general
collection object.
Thanks again,
vbr
 |  Next  |  Last
Pages: 1 2 3
Prev: click me
Next: killing own process in windows