tallying occurrences in list [Python]


From: MRAB on 4 Jun 2010 14:50

kj wrote:
>
> Task: given a list, produce a tally of all the distinct items in
> the list (for some suitable notion of "distinct").
>
> Example: if the list is ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b',
> 'c', 'a'], then the desired tally would look something like this:
>
> [('a', 4), ('b', 3), ('c', 3)]
>
> I find myself needing this simple operation so often that I wonder:
>
> 1. is there a standard name for it?
> 2. is there already a function to do it somewhere in the Python
> standard library?
>
> Granted, as long as the list consists only of items that can be
> used as dictionary keys (and Python's equality test for hashkeys
> agrees with the desired notion of "distinctness" for the tallying),
> then the following does the job passably well:
>
> def tally(c):
>     t = dict()
>     for x in c:
>         t[x] = t.get(x, 0) + 1
>     return sorted(t.items(), key=lambda x: (-x[1], x[0]))
>
> But, of course, if a standard library solution exists it would be
> preferable. Otherwise I either cut-and-paste the above every time
> I need it, or I create a module just for it. (I don't like either
> of these, though I suppose that the latter is much better than the
> former.)
>
> So anyway, I thought I'd ask. :)
>
In Python 3.1 and later there's the 'Counter' class in the 'collections'
module. It'll also be in Python 2.7.

For earlier versions there's this:

http://code.activestate.com/recipes/576611/

From: Lie Ryan on 4 Jun 2010 15:56

On 06/05/10 04:38, Magdoll wrote:
> On Jun 4, 11:33 am, Peter Otten <__pete...(a)web.de> wrote:
>> Python 3.1 has, and 2.7 will have collections.Counter:
>>
>> >>> from collections import Counter
>> >>> c = Counter("abcabcabca")
>> >>> c.most_common()
>>
>> [('a', 4), ('c', 3), ('b', 3)]
>>
>> Peter
>
>
> Thanks Peter, I think you just answered my post :)

If you're using an earlier version (2.4 onwards), then:

import itertools
[(o, len(list(g))) for o, g in itertools.groupby(sorted(myList))]
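
A minimal runnable sketch of the groupby approach, wrapped in a function (the name `tally_groupby` is made up for illustration). It assumes the items can be sorted, since groupby only merges equal items that sit next to each other.

```python
import itertools

def tally_groupby(items):
    """Tally by sorting, then counting each run of equal adjacent items."""
    # sorted() brings equal items together; groupby() yields one group per run.
    return [(key, sum(1 for _ in group))
            for key, group in itertools.groupby(sorted(items))]

my_list = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
print(tally_groupby(my_list))  # [('a', 4), ('b', 3), ('c', 3)]
```

The result is ordered by item rather than by count, and the sort makes it O(n log n), but the items do not need to be hashable.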

From: kj on 4 Jun 2010 16:52

Thank you all!

~K

From: Sreenivas Reddy Thatiparthy on 5 Jun 2010 13:55

How about this one-liner, if you prefer them:
set([(k, yourList.count(k)) for k in yourList])

From: Paul Rubin on 5 Jun 2010 14:00

Sreenivas Reddy Thatiparthy <thatiparthysreenivas(a)gmail.com> writes:
> How about this one-liner, if you prefer them:
> set([(k, yourList.count(k)) for k in yourList])

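That has a rather bad efficiency problem if the list is large: count()
rescans the whole list for every element, so the running time grows
quadratically with the length of the list.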
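
To make the cost concrete, here is a small sketch comparing the one-liner with a single-pass count. Both function names are hypothetical. The first calls count() once per element, so it walks the list n times; the second walks it once.

```python
from collections import Counter

def tally_quadratic(items):
    # items.count(k) is a full scan, repeated for every element: O(n**2).
    return set((k, items.count(k)) for k in items)

def tally_linear(items):
    # Counter makes one pass over the list: O(n) for hashable items.
    return set(Counter(items).items())

your_list = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
print(tally_quadratic(your_list) == tally_linear(your_list))  # True
```

Both return the same set of (item, count) pairs; only the amount of work differs.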
