From: Tim Roberts on
kee chen <keekychen.shared(a)gmail.com> wrote:
>
>I have 2 lists stored in 2 text files may have duplicated records, the raw
>data looks like this:
>lfruit lcountry
>====== =========
>orange japan
>pear china
>orange china
>apple american
>cherry india
>lemon china
>lemon japan
>strawberry korea
>banana thailand
> australia
>basically, what I want is:
> 1. all of the duplicated records need to be removed and
> 2. the unique items need bind with an unique integer ID, something like a
>PK in database, no sort needed.
>but before you give answer here, pls also read below.

You need a database. What you're talking about here is exactly the kind of
thing that an SQL database can provide. Sqlite is simple and lightweight,
and can do your unique checks and your join without even breaking a sweat.
If you don't like that, there are pure Python SQL engines available that
are even simpler.

Why reinvent the whell? What you want already exists.
--
Tim Roberts, timr(a)probo.com
Providenza & Boekelheide, Inc.