From: kee chen on 28 Jun 2010 23:03
I have two lists stored in two text files, and they may contain duplicate
records. Basically, what I want is:
1. all duplicate records removed, and
2. each unique item bound to a unique integer ID, something like a primary
key (PK) in a database; no sorting is needed.
Before you answer, please also read the questions below.
The raw data looks like this:
lfruit            lcountry
1 orange          1 japan
2 pear            2 china
3 apple           3 american
4 cherry          4 india
5 lemon           5 taiwan
6 strawberry      6 korea
7 banana          7 thailand
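The two requirements above (drop duplicates, bind each unique item to an
integer ID without sorting) could be sketched roughly as follows. The function
name assign_ids and the sample input are only illustrative assumptions, and
the sketch relies on dicts keeping insertion order (Python 3.7+).

```python
def assign_ids(records, start=1):
    """Drop duplicates (keeping the first occurrence) and number the rest.

    IDs are handed out in order of first appearance, beginning at `start`.
    """
    ids = {}
    for record in records:
        name = record.strip()
        # Skip blank lines and anything already seen.
        if name and name not in ids:
            ids[name] = start + len(ids)
    return ids


# Hypothetical raw input containing duplicates.
raw_fruit = ["orange", "pear", "apple", "pear", "cherry", "orange"]
fruit_ids = assign_ids(raw_fruit)
print(fruit_ids)  # {'orange': 1, 'pear': 2, 'apple': 3, 'cherry': 4}
```

In practice the records would come from iterating over the open text file
instead of a literal list.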
Q1: Items may be added to or deleted from the lists above later. How can I
make the lists easy to extend, and how can I make sure each item stays bound
to a sequential, unique, fixed ID of INTEGER type?
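One way to read Q1 is as an append-only registry: IDs come from a counter
that only ever grows, so deleting an item retires its ID instead of freeing
it. The class below is a minimal sketch of that idea. IdRegistry and its
method names are hypothetical, and it assumes that a deleted item which is
added again later may receive a new ID.

```python
class IdRegistry:
    """Hands out sequential integer IDs that are never reused."""

    def __init__(self):
        self._id_by_name = {}
        self._next_id = 1  # only ever increases

    def add(self, name):
        """Return the existing ID for `name`, or assign the next free one."""
        if name in self._id_by_name:
            return self._id_by_name[name]
        new_id = self._next_id
        self._next_id += 1
        self._id_by_name[name] = new_id
        return new_id

    def remove(self, name):
        """Delete `name`; its ID is retired and not handed out again."""
        return self._id_by_name.pop(name)

    def get(self, name):
        return self._id_by_name[name]


fruits = IdRegistry()
for name in ["orange", "pear", "apple"]:
    fruits.add(name)

fruits.remove("pear")            # ID 2 is now retired
new_id = fruits.add("cherry")    # gets 4, not the freed 2
print(new_id, fruits.get("apple"))
```

The counter (and not just the surviving items) would have to be saved along
with the data, otherwise a restart could hand out a retired ID again.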
Here is why I want an INTEGER ID rather than a hash or UUID: "uuid4" does not
work in my case, because later I want the ID to carry information at low cost
in an MCU-style protocol. That is, the INTEGER ID is also used as the bit
position in my protocol's binary stream. Taking the lfruit data as an example,
the bit stream 0111100 can indicate which lfruit items are present and which
are not.
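To make the bit-stream idea concrete, here is a small sketch that packs a set
of integer IDs into one integer mask and back. It assumes ID n maps to bit
n - 1 of the mask and that the stream is written with ID 1 leftmost; both are
assumptions of this sketch, since the real protocol layout is not specified
here. The helper names are hypothetical.

```python
def encode(present_ids):
    """Set bit (id - 1) for every ID that is present."""
    mask = 0
    for item_id in present_ids:
        mask |= 1 << (item_id - 1)
    return mask


def decode(mask):
    """Return the sorted list of IDs whose bit is set in `mask`."""
    ids = []
    position = 1
    while mask:
        if mask & 1:
            ids.append(position)
        mask >>= 1
        position += 1
    return ids


def to_bit_string(mask, width):
    """Render the mask with ID 1 as the leftmost character."""
    return format(mask, "0{}b".format(width))[::-1]


# pear, apple, cherry and lemon (IDs 2..5) present out of 7 fruits
mask = encode([2, 3, 4, 5])
print(to_bit_string(mask, 7))  # 0111100
print(decode(mask))            # [2, 3, 4, 5]
```

This only works if an ID never changes meaning, which is why fixed IDs matter
more here than with a hash or UUID.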
Also, the two lists may later need to be combined to generate a new list, or
matrix, and as above each entry needs a unique ID:
lcombination = [lfruit] * [lcountry]
1 japan orange #(1,1)
2 japan pear #(1,2)
3 japan apple #(1,3)
4 japan cherry #(1,4)
5 japan lemon ...
6 japan strawberry ...
7 japan banana ...
8 china orange #(2,1)
9 china pear #(2,2)
Q2: Because lcombination is derived from the extendable items in the two
lists, how can I make sure its unique IDs also stay fixed and unique?
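The difficulty behind Q2 can be shown in a few lines: numbering the
combinations row by row, as in the table above, depends on how many fruits
exist, so every ID after the first row shifts when a fruit is added. One
possible alternative, sketched below, packs the two item IDs into fixed-width
bit fields so the combined ID depends only on the pair itself. FRUIT_BITS and
the function names are assumptions of this sketch, and the resulting IDs are
sparse, not 1, 2, 3, ...

```python
FRUIT_BITS = 16  # assumption: fruit IDs always stay below 2**16


def row_major_id(country_id, fruit_id, fruit_count):
    """Sequential numbering as in the table; depends on fruit_count."""
    return (country_id - 1) * fruit_count + fruit_id


def combo_id(country_id, fruit_id):
    """Pack both IDs into one integer that never depends on list sizes."""
    if country_id < 1:
        raise ValueError("country_id must be positive")
    if not 0 < fruit_id < (1 << FRUIT_BITS):
        raise ValueError("fruit_id out of range")
    return (country_id << FRUIT_BITS) | fruit_id


def split_combo_id(combined):
    """Recover (country_id, fruit_id) from a packed ID."""
    return combined >> FRUIT_BITS, combined & ((1 << FRUIT_BITS) - 1)


# "china orange" is 8 with 7 fruits, but becomes 9 once an 8th fruit exists.
print(row_major_id(2, 1, 7), row_major_id(2, 1, 8))  # 8 9

# The packed ID is the same no matter how many fruits are added later.
packed = combo_id(2, 1)
print(packed, split_combo_id(packed))  # 131073 (2, 1)
```

If dense sequential IDs are required (for example as bit positions), another
option would be to treat each (country, fruit) pair as an item of its own and
give it an ID from a never-reused counter when it is first created.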
BTW: my original plan was to use a dict or list as the runtime data container
and SQLite as the storage and as the assigner of the unique IDs. However,
based on an earlier answer, relying on SQLite alone may not be enough to
guarantee that the unique-ID assignment works, so I am asking for help here.
Any answer or comment will be highly appreciated.
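For reference, the SQLite plan could look roughly like the sketch below. It
uses the standard sqlite3 module with an in-memory database. The table name,
column names and the get_or_create_id helper are hypothetical. The key detail
is AUTOINCREMENT: with a plain INTEGER PRIMARY KEY, SQLite may hand the ID of
a deleted last row to the next insert, while AUTOINCREMENT prevents reuse of
IDs over the lifetime of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    "CREATE TABLE fruit ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # IDs are never reused
    " name TEXT NOT NULL UNIQUE)"             # duplicates are rejected
)


def get_or_create_id(conn, name):
    """Return the ID bound to `name`, inserting the item if it is new."""
    row = conn.execute(
        "SELECT id FROM fruit WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO fruit (name) VALUES (?)", (name,))
    return cursor.lastrowid


for name in ["orange", "pear", "pear", "apple"]:
    get_or_create_id(conn, name)

# Deleting the most recent item must not free its ID for reuse.
conn.execute("DELETE FROM fruit WHERE name = ?", ("apple",))
cherry_id = get_or_create_id(conn, "cherry")
print(cherry_id)  # 4

# Load the table into a dict to use as the runtime container.
fruit_ids = dict(conn.execute("SELECT name, id FROM fruit ORDER BY id"))
print(fruit_ids)  # {'orange': 1, 'pear': 2, 'cherry': 4}
```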