From: Grant on
On Sun, 25 Oct 2009 09:56:52 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Wed, 21 Oct 2009 02:47:54 -0500, Ed Morton <mortonspam(a)gmail.com>
>wrote:
>
>>$ cat tst.awk
>>BEGIN{ FS="\t"; OFS="#"; scale=(scale ? scale : 256) }
>>function ip2nr(ip, nr,ipA) {
>> # aaa.bbb.ccc.ddd
>> split(ip,ipA,".")
>> nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + ipA[4]
>> return nr
>>}
>>NR==FNR { addrs[$0] = ip2nr($0); next }
>>FNR>1 {
>> start = ip2nr($1)
>> end = ip2nr($2)
>> for (ip in addrs) {
>> if ((addrs[ip] >= start) && (addrs[ip] <= end)) {
>> print ip,$3" "$4
>> delete addrs[ip]
>> }
>> }
>>}
>
>Another issue is that if file1 is a huge one, say including several
>thousand entries, the above process will be time consuming. So is it
>possible to revise the above awk script with multithreading to
>improve the efficiency?

I already posted a more efficient method, read database file into
memory, then binary search each IP to find matching block and
retrieve the database name string.
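The load-once-then-binary-search idea could be sketched in awk roughly as below. The file names, field layout, and sample data are illustrative assumptions (not the poster's actual 300k-record database), and the sketch assumes the ranges are sorted by start address and non-overlapping:

```shell
# Hypothetical sample data: tab-separated ranges (start, end, name) and a
# list of IPs to look up. Real data would come from the actual database file.
printf '1.0.0.0\t1.0.0.255\tAPNIC\n8.8.8.0\t8.8.8.255\tGoogle\n' > /tmp/ranges.txt
printf '8.8.8.8\n1.0.0.1\n' > /tmp/ips.txt

awk -F'\t' '
function ip2nr(ip,    a) {        # dotted quad -> 32-bit integer
    split(ip, a, ".")
    return ((a[1] * 256 + a[2]) * 256 + a[3]) * 256 + a[4]
}
NR == FNR {                       # first file: load all ranges into memory once
    n++
    lo[n] = ip2nr($1); hi[n] = ip2nr($2); name[n] = $3
    next
}
{                                 # second file: binary search for each IP
    ip = ip2nr($1); l = 1; r = n
    while (l <= r) {
        m = int((l + r) / 2)
        if (ip < lo[m])      r = m - 1
        else if (ip > hi[m]) l = m + 1
        else { print $1 "#" name[m]; break }
    }
}' /tmp/ranges.txt /tmp/ips.txt > /tmp/out.txt
cat /tmp/out.txt
```

Each lookup then costs O(log n) comparisons instead of a scan over the whole database, so the database file is read exactly once regardless of how many IPs are queried.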

Grant.
--
http://bugsplatter.id.au
From: Kaz Kylheku on
On 2009-10-25, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
> Another issue is that if file1 is a huge one, say including several
> thousand entries, the above process will be time consuming. So is it
> possible to revise the above awk script with multithreading to
> improve the efficiency?

What kind of machine are you using that a file of a mere several thousand
entries causes a performance problem, yet threads somehow help?
From: Grant on
On Sun, 25 Oct 2009 05:04:43 +0000 (UTC), Kaz Kylheku <kkylheku(a)gmail.com> wrote:

>On 2009-10-25, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
>> Another issue is that if file1 is a huge one, say including several
>> thousand entries, the above process will be time consuming. So is it
>> possible to revise the above awk script with multithreading to
>> improve the efficiency?
>
>What kind of machine are you using that a file of a mere several thousand
>entries causes a performance problem, yet threads somehow help?

Dunno about threads being applicable, but his lookup file has
300k records -- no point re-reading that for each of the thousand
input lines, which is what Ed's solution does.

Grant.
--
http://bugsplatter.id.au
From: Hongyi Zhao on
On Sun, 25 Oct 2009 15:57:47 +1100, Grant
<g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>I already posted a more efficient method, read database file into
>memory, then binary search each IP to find matching block and
>retrieve the database name string.

Where did you post it? On your personal webpage or blog? Any hints
on the corresponding URL?

Best regards.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Hongyi Zhao on
On Sun, 25 Oct 2009 05:04:43 +0000 (UTC), Kaz Kylheku
<kkylheku(a)gmail.com> wrote:

>What kind of machine are you using that a file of a mere several thousand
>entries causes a performance problem, yet threads somehow help?

The lookup IP database I use is also a huge one (373374 lines in
it). Furthermore, I run this script on a Cygwin box.

Best regards.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.