From: Grant on
On Sun, 25 Oct 2009 22:45:29 -0500, Ed Morton <mortonspam(a)gmail.com> wrote:

....
>> # load lookup table
>> NR==FNR {
>> start[++range] = ip2nr($1) # using one-based array
>
>Yes, but it'll store the titles in start[1] which is undesirable. That's
>why I incremented range after the assignments.

No, it's a one-based array so we can detect IP addr lower than the
first database IP block start.

See the working code (I think, hard to tell 'cos of the data text
locale).
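
To illustrate the one-based idea, here is a minimal sketch, not the script from this thread: with start[1..n] in use, index 0 is free to mean "below the first block". The three-block table and the names lookup and find are made up for the example, and the search starts lo at 0 rather than 1.

```shell
#!/bin/sh
# Sketch only: a hypothetical three-block table stands in for the real data file.
lookup() {
    awk -v a="$1" '
    # index of the block with the greatest start <= a, or 0 if there is none
    function find(a,    lo, hi, mid) {
        lo = 0; hi = n + 1              # 0 and n+1 act as virtual sentinels
        while (hi - lo > 1) {
            mid = int((lo + hi) / 2)
            if (start[mid] + 0 <= a) lo = mid; else hi = mid
        }
        return lo
    }
    BEGIN {
        a += 0                                  # force numeric comparison
        n = split("100 200 300", start, " ")    # one-based: start[1..3]
        split("199 299 350", end, " ")
        i = find(a)
        if (i > 0 && a > end[i] + 0) i = 0      # address falls after the block end
        print i
    }'
}

lookup 50     # prints 0: lower than the first block start
lookup 250    # prints 2
lookup 375    # prints 0: past end[3]
```

A result of 0 can then be tested before touching name[], so an address below the first block never picks up a name by accident.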

Grant.
--
http://bugsplatter.id.au
From: Grant on
On Mon, 26 Oct 2009 13:35:08 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Mon, 26 Oct 2009 15:08:37 +1100, Grant
....
>Another issue: it takes 6 or 7 seconds for you, while 37 seconds for
>me, to read 373375 or 373374 records, why?

Because I tested it on the fastest Slackware box on localnet here,
it has a Core2Duo CPU :)

Okay, here we preprocess the data to numeric addresses. On a slow
machine here, the data table load went from 39 seconds down to 16 seconds:

grant(a)deltree:/home/common$ cat yyy
#!/usr/bin/gawk -f
#
# script to massage database file to speed loading
#
# run as ./yyy input_file > output_file
#
function ip2nr(ip, k)
{
# aaa.bbb.ccc.ddd
split(ip, k, ".")
return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4]
}
/^#/ { print; next } # print comment lines (header info)

/^$/ { next } # skip blanks
{
# convert dotquad IP to numeric IP
printf "%s\t%s\t%s\n", ip2nr($1), ip2nr($2), $3" "$4
}
# end

grant(a)deltree:/home/common$ head -5 datafile-nr
# StartIP EndIP Country Local
0 16777215 IANA CZ88.NET
16777216 20185087 IANA CZ88.NET
20185088 20250623 美国 CZ88.NET
20250624 26869759 IANA CZ88.NET
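
The numeric columns can be checked by hand against ip2nr: 1.0.0.0 is 1 * 256^3 = 16777216, and 1.52.0.0 is (1 * 256 + 52) * 65536 = 20185088, which match the block starts above. A small sketch for doing the same check from the shell follows. The sh wrapper is an assumption for illustration, and "%.0f" is used in case an awk limits "%d" to 32 bits.

```shell
#!/bin/sh
# Sketch: same arithmetic as ip2nr, wrapped in a shell function for spot checks.
ip2nr() {
    echo "$1" | awk '{
        split($0, k, ".")
        printf "%.0f\n", ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4]
    }'
}

ip2nr 1.0.0.0     # prints 16777216
ip2nr 1.52.0.0    # prints 20185088
```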

grant(a)deltree:/home/common$ ./xxx2 datafile-nr file1
Read 373375 records in 16 seconds.
28.232.110.16 英国 剑桥大学
28.31.0.34 美国 马萨诸塞州米德尔塞克斯县剑桥市麻省理工学院
28.6.224.103 美国 罗格斯大学
28.83.194.97 美国 德克萨斯大学澳斯汀分校
28.83.194.98 美国 德克萨斯大学澳斯汀分校
29.133.8.31 瑞士 CZ88.NET
29.21.126.99 美国 Rochester科技学院
129.25.11.27 美国 Drexel
....
8.251.7.53 美国 马萨诸塞州米德尔塞克斯县剑桥市麻省理工学院
95.251.249.86 瑞典 CZ88.NET
5.110.229.102 加拿大 西蒙弗雷泽大学

grant(a)deltree:/home/common$ cat xxx2
#!/usr/bin/gawk -f
#
# script to process IP addr
#
# run as $0 ip-block-2-name-table IP-addr-file
#
BEGIN {
FS = "\t"
format = "%-20s %s\n"
started = systime()
}
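function ip2nr(ip, k)
{
# aaa.bbb.ccc.ddd -> number (same as in yyy; used by the lookup below)
split(ip, k, ".")
return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4]
}
function nr2ip(nr, j, k)
{
# peel off one octet at a time; int() keeps nr integral for and()
for (j = 4; j > 0; j--) { k[j] = and(nr, 255); nr = int(nr / 256) }
return sprintf("%d.%d.%d.%d", k[1], k[2], k[3], k[4])
}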
# show data read progress
NR==FNR && NR % 357 == 0 { printf "\rReading %d", NR }

# skip comment or blank lines
/^#|^$/ { next }

# load lookup table
NR==FNR {
# reading datafile with numeric IP block start + end addr
start[++range] = $1
end[range] = $2
name[range] = $3" "$4
next
}

# show data records read
NR!=FNR && FNR == 1 {
printf "\rRead %d records in %d seconds.\n", NR - 1, \
systime() - started
}
# process IPs from second file
{
a = ip2nr($0); lo = 1; hi = range

# binary search
while (hi - lo > 1) {
mid = int((lo + hi) / 2)
if (start[mid] < a) {
lo = mid
}
else {
hi = mid
}
}

# adjust to closest less than when no exact match (likely)
if (a < start[hi]) { --hi }

# skip if IP undefined: below the first block, or past end of block
if (hi < 1 || a < start[hi] || a > end[hi]) { next }

printf format, nr2ip(a), name[hi]
}
# end
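
nr2ip relies on gawk's and(), which other awks may not provide. As a rough sketch under that assumption, the same conversion can be done with % and int(). The sh wrapper is only there to make it easy to try.

```shell
#!/bin/sh
# Sketch: numeric IP back to dotted quad without the gawk-only and().
nr2ip() {
    echo "$1" | awk '{
        nr = $1 + 0
        # take the low octet with %, then shift right by dividing
        for (j = 4; j > 0; j--) { k[j] = nr % 256; nr = int(nr / 256) }
        printf "%d.%d.%d.%d\n", k[1], k[2], k[3], k[4]
    }'
}

nr2ip 16777215    # prints 0.255.255.255
nr2ip 20185088    # prints 1.52.0.0
```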

Now add datafile header lines starting with '#' so they're ignored,
then you have self-documenting data files.
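
A quick way to see the effect, as a sketch: the same /^#|^$/ rule that xxx2 uses drops header and blank lines, so only data lines are counted. The function name count_records and the second header line are invented for the example.

```shell
#!/bin/sh
# Sketch: count data lines, ignoring '#' header lines and blank lines.
count_records() {
    awk '/^#|^$/ { next } { n++ } END { print n + 0 }'
}

printf '# StartIP EndIP Country Local\n# hypothetical note line\n\n0\t16777215\tIANA CZ88.NET\n' | count_records    # prints 1
```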

Grant.
--
http://bugsplatter.id.au
From: Hongyi Zhao on
On Mon, 26 Oct 2009 17:22:58 +1100, Grant
<g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>Because I tested it on the fastest Slackware box on localnet here,
>it has a Core2Duo CPU :)

But I'm using a Core i7 920 CPU with 6 GB of memory to run the above
test under cygwin :)

>
>Okay, here we preprocess the data to numeric addresses, on a slow
>machine here data table load went from 39 seconds down to 16 seconds

[snipped]

Wonderful work. Thanks a lot. This is a key step towards practical
use with huge IP lookup files and long lists of IP addresses.

Best regards.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Grant on
On Mon, 26 Oct 2009 15:41:38 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Mon, 26 Oct 2009 17:22:58 +1100, Grant
><g_r_a_n_t_(a)bugsplatter.id.au> wrote:
>
>>Because I tested it on the fastest Slackware box on localnet here,
>>it has a Core2Duo CPU :)
>
>But I use a Core i7 920 CPU with 6Gb memory to do the above test under
>cygwin :)

You need a real OS: Linux, Unix, *BSD, lots to choose from. I'm on a
Win7 desktop, using PuTTY terminals to the Linux boxes here.
>
>>
>>Okay, here we preprocess the data to numeric addresses, on a slow
>>machine here data table load went from 39 seconds down to 16 seconds
>
>[snipped]
>
>Wonderful work. Thanks a lot. This is a key step towards the
>practical application for huge IP lookup files and IP addresses.

Not under cygwin :)

Grant.
--
http://bugsplatter.id.au
From: Hongyi Zhao on
On Mon, 26 Oct 2009 18:58:11 +1100, Grant
<g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>Not under cygwin :)

Thanks for your suggestion.

Another issue: in the private mail I sent you, I mentioned that the
IP lookup table used here is extracted from the following binary
file:

http://update.cz88.net/soft/qqwry.rar

At the bottom of the following page there is demo code that reads IP
information from the above binary file (the page is in Chinese, but
the demo code itself is in English):

http://lumaqq.linuxsir.org/article/qqwry_format_detail.html

So I want to know whether it is possible to read the above binary
file directly to look up specific IP addresses.

Thanks in advance.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.