From: Georgios Petasis on
Hi all,

I am using a dict to keep the occurrence frequencies of some n-grams.
But how can I get the n most frequent n-grams?

How can I sort a dict based on values? (and not keys?)

George
From: Alexandre Ferrieux on
On Sep 21, 10:28 pm, Georgios Petasis <peta...(a)iit.demokritos.gr>
wrote:
>
> I am using a dict to keep the occurrence frequencies of some n-grams.
> But how can I get the n most frequent n-grams?

The simplest would be to sort the list of frequencies, which is O
(MlogM) where M is the number of different frequencies. If you want
the N largest and N<<M, it will be more efficient to keep and update a
sorted list of the N best (using the new lsearch -bisect), because
you'll get O(MlogN).

And of course, once you hve the N best frequencies, you crawl back to
the n-grams by the reverse dict/array, built with [lappend] (example
provided with a reverse array):

dict for {k v} $d {lappend rev($v) $k}

> How can I sort a dict based on values? (and not keys?)

Just project it onto a list. You'll get it for free, with duplicates
removed, by using the key list of the above reverse array:

set sorted_freqs [lsort -integer -decreasing [array get rev]]

-Alex
From: Georgios Petasis on
O/H Georgios Petasis ������:
> Hi all,
>
> I am using a dict to keep the occurrence frequencies of some n-grams.
> But how can I get the n most frequent n-grams?
>
> How can I sort a dict based on values? (and not keys?)
>
> George

lsort -stride will be useful, but not available in 8.5.
Any alternatives into using an intermediate array (with frequencies as
keys, and words as values)?

(I have implemented an alternative, I am looking for something faster :D )

George
From: Georgios Petasis on
O/H Alexandre Ferrieux έγραψε:
> On Sep 21, 10:28 pm, Georgios Petasis <peta...(a)iit.demokritos.gr>
> wrote:
>> I am using a dict to keep the occurrence frequencies of some n-grams.
>> But how can I get the n most frequent n-grams?
>
> The simplest would be to sort the list of frequencies, which is O
> (MlogM) where M is the number of different frequencies. If you want
> the N largest and N<<M, it will be more efficient to keep and update a
> sorted list of the N best (using the new lsearch -bisect), because
> you'll get O(MlogN).
>
> And of course, once you hve the N best frequencies, you crawl back to
> the n-grams by the reverse dict/array, built with [lappend] (example
> provided with a reverse array):
>
> dict for {k v} $d {lappend rev($v) $k}
>
>> How can I sort a dict based on values? (and not keys?)
>
> Just project it onto a list. You'll get it for free, with duplicates
> removed, by using the key list of the above reverse array:
>
> set sorted_freqs [lsort -integer -decreasing [array get rev]]
>
> -Alex

Dear Alex,

Thank you very much. This is what I also did.
But it would be much more simple if dict had an lsort command of its
own, with the ability to sort with keys or values :-)

George

## Sort the n-grams...
dict for {key val} $ngram {
lappend NGRAM($val) $key
}
set ngram [dict create]
set count 0
foreach freq [lsort -integer -decreasing [array names NGRAM]] {
foreach one [lsort -dictionary $NGRAM($freq)] {
dict set ngram $one $freq
if {[incr count] >= $topmost_ngrams} {return $ngram}
}
}
return $ngram
From: Alexandre Ferrieux on
On Sep 22, 12:28 am, Georgios Petasis <peta...(a)iit.demokritos.gr>
wrote:
>
> But it would be much more simple if dict had an lsort command of its
> own, with the ability to sort with keys or values :-)

Sorting on keys is what you get with [-stride 2], though in the
process you lose dict-ness (you shimmer to a list -- which may be OK
in most circumstances).

Sorting on values implies reverting the dict; so maybe a better
primitive would be [dict revert], but I'm not sure how much a C
implementation would gain over the Tcl version...

-Alex
 |  Next  |  Last
Pages: 1 2
Prev: Using a TEA package
Next: Strange embedded proc error