From: Ed Morton on
Grant wrote:
<snip>
> # check got query
> [ -z "$1" ] && echo "
> ccfind -- lookup country code and name for IP address
> usage $0 aa.bb.cc.dd
> " && exit
>
> # get server listen port
> port=$(gawk '/^inetport/ {print $2}' /etc/ip2cn-server.conf)
>
> # make query, may be dotquad or numeric (decimal) IP address
> echo "$@" | gawk -v port=$port '
> BEGIN { service = "/inet/tcp/0/localhost/" port }
> $1 == "0" { $1 = "0." }
> { print |& service; service |& getline; print }' 2>/dev/null
>
> # end

The above could just be a single awk script:

gawk -v cmd="$0" -v ip="$1" '
/^inetport/ { port=$2 }
END {
    if (ip == "") {
        print cmd,"-- lookup country code and name for IP address\n\tusage",cmd,"aa.bb.cc.dd" | "cat>&2"
    } else {
        ip = (ip == "0" ? "0." : ip)
        service = "/inet/tcp/0/localhost/" port
        print ip |& service
        close(service,"to")
        if ( (service |& getline cc) > 0) {
            print cc
        } else {
            print cmd,"failed to access",service | "cat>&2"
        }
        close(service)
    }
}
' /etc/ip2cn-server.conf

I made the co-process and getline parts more robust too.
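
The `> 0` test matters because getline returns 1 when it reads a record, 0 at end of input and -1 on error. A bare `if (service |& getline cc)` therefore treats an error as success, and `while (getline ...)` can loop forever on -1. Here is a minimal sketch of those return values using plain file input (POSIX awk, no server needed). `getline_statuses` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print each getline return value for file $1.
getline_statuses() {
    awk -v f="$1" 'BEGIN {
        # 1 = a line was read; the loop stops on 0 or -1
        while ((r = (getline line < f)) > 0)
            printf "%d ", r
        # 0 means end of file, -1 means f could not be opened or read
        print r
    }'
}

demo=$(mktemp)
printf 'x\ny\n' > "$demo"
getline_statuses "$demo"              # prints: 1 1 0
getline_statuses /nonexistent/file    # prints: -1
rm -f "$demo"
```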

Ed.
From: Grant on
On Wed, 21 Oct 2009 02:54:23 -0500, Ed Morton <mortonspam(a)gmail.com> wrote:

>Grant wrote:
>> On Wed, 21 Oct 2009 13:18:47 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
>>
>>> On Wed, 21 Oct 2009 07:48:18 +1100, Grant
>>> <g_r_a_n_t_(a)bugsplatter.id.au> wrote:
>>>
>>>>>>>     nr = ipA[1] * 1000000000 + ipA[2] * 1000000 + ipA[3] * 1000 + ipA[4]
>>>> The weighting for converting dotquad IP to a number is 256, not
>>>> 1000 -- using 1000 will skip IP addresses in your range matching.
>>>>
>>>> Try
>>>> nr = ipA[1] * 2^24 + ipA[2] * 2^16 + ipA[3] * 2^8 + ipA[4]
>>>>
>>>> or
>>>> nr = ((ipA[1] * 256 + ipA[2]) * 256 + ipA[3]) * 256 + ipA[4]
>>>>
>>>> instead -- the second version is speed optimised for gawk.
>>> I've tried all of the above three expressions for _nr_, and I _always_
>>> get the same results. Could you please give some example to support
>>> your point of view?
>>
>> grant(a)deltree:~$ echo 123.123.123.123 > dotquad
>>
>> grant(a)deltree:~$ awk '{split($1,a,".");ip=((a[1]*256+a[2])*256+a[3])*256+a[4];\
>> xx=((a[1]*1000+a[2])*1000+a[3])*1000+a[4];print $1, ip, xx}' dotquad
>> 123.123.123.123 2071690107 123123123123
>>
>> grant(a)deltree:~$ ccfind 123.123.123.123
>> 123.123.123.123 CN:China
>>
>> grant(a)deltree:~$ ccfind 2071690107
>> 123.123.123.123 CN:China
>>
>> grant(a)deltree:~$ ccfind 123123123123
>> (bad query)
>
>I expect you're right and that multiplying by 256 does produce a
>"better" representation of the IP address as a decimal number, but can
>you think of an example where the range check Hongyi cares about would
>fail if we used 1000 instead of 256 as the multiplier?

I've done a fair bit of work with IPv4 address space, and without
thinking too much I bring my own bias to how the numbers should be
treated -- it's a bit like packed BCD vs binary. Whichever
interpretation you use, it matters little as long as you are consistent.

Many IPv4 utilities work with the IP address as a single 32-bit
unsigned integer rather than as a dotted quad. With the 1000 multiplier
you get gaps in the resulting numbers; with 256 you don't, which you can
test by counting through a 'dot' boundary:

0.0.0.254 254
0.0.0.255 255
0.0.1.0 256
0.0.1.1 257
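
The same count with the 1000 weighting shows the gap. Here is a small sketch that reproduces both. `ip2nr_list` is a hypothetical helper. The `%.0f` format is used because some older awk builds may print integers above 2^31 in exponent form:

```shell
#!/bin/sh
# Hypothetical helper: read dotted quads on stdin, print each with its
# number using weight $1 per octet (256 or 1000).
ip2nr_list() {
    awk -v scale="$1" '{
        split($1, a, ".")
        nr = ((a[1] * scale + a[2]) * scale + a[3]) * scale + a[4]
        # %.0f prints large values as plain integers
        printf "%s %.0f\n", $1, nr
    }'
}

printf '%s\n' 0.0.0.254 0.0.0.255 0.0.1.0 0.0.1.1 | ip2nr_list 256
# 254 255 256 257: no gap at the boundary
printf '%s\n' 0.0.0.254 0.0.0.255 0.0.1.0 0.0.1.1 | ip2nr_list 1000
# 254 255 1000 1001: no address maps to 256..999
```

Because every octet is below 1000, both weightings keep addresses in the same order. That is why a plain "between low and high" check can give identical results either way. The gaps only bite when you do arithmetic on the numbers, such as "address + 1".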

So the unsigned 32-bit word is a better representation of an IP
address once you start searching to find whether a particular IP lies
between arbitrary low and high addresses. Maybe it's just me, and either
method might work for the OP's requirement, but my preferred method also
gives correct answers when finding unallocated blocks, so I view it as
sturdier for all aspects of IPv4 work.
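
To illustrate the "unallocated blocks" point: with the 256 weighting, "one past the end of a block" is simply end + 1. Gaps between sorted, non-overlapping blocks therefore fall out of simple arithmetic. This is a sketch under those assumptions. `list_gaps` and its two-column `start end` input format are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "start end" dotted-quad pairs on stdin,
# sorted by start and non-overlapping, and print each unallocated gap.
list_gaps() {
    awk '
    function ip2nr(ip,    q) {
        split(ip, q, ".")
        return ((q[1] * 256 + q[2]) * 256 + q[3]) * 256 + q[4]
    }
    function nr2ip(nr,    a, b, c) {
        a = int(nr / 16777216); nr -= a * 16777216
        b = int(nr / 65536);    nr -= b * 65536
        c = int(nr / 256)
        return a "." b "." c "." (nr - c * 256)
    }
    {
        s = ip2nr($1); e = ip2nr($2)
        # a gap exists when this block starts after the address that
        # follows the previous block
        if (NR > 1 && s > prev + 1)
            print nr2ip(prev + 1), nr2ip(s - 1)
        prev = e
    }'
}

printf '%s\n' '1.0.0.0 1.0.0.255' '1.0.2.0 1.0.2.255' '1.0.3.0 1.0.3.10' | list_gaps
# prints: 1.0.1.0 1.0.1.255
```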

I had problems a few years back getting this stuff to work reliably,
so I guess I'm biased to the method that proves reliable for me.

But issues like correct binary searching[1] and the mention of gaps make
me think I've been through problems like the OP's before and nailed them.
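
Here is a minimal sketch of binary searching sorted ranges in awk, in the style of the high/low/probe loop from the article in [1]. It assumes (hypothetically) input lines of `start end label`, with start and end already converted to 256-weighted numbers, sorted by start and non-overlapping. `find_range` is an illustrative name:

```shell
#!/bin/sh
# Hypothetical helper: ranges ("start end label", numeric, sorted by
# start, non-overlapping) on stdin; prints the label covering $1.
find_range() {
    awk -v target="$1" '
    BEGIN { target += 0 }
    { first[NR] = $1 + 0; last[NR] = $2 + 0; label[NR] = $3 }
    END {
        # invariant: first[low] <= target < first[high], with index 0
        # and NR + 1 acting as -infinity and +infinity sentinels
        low = 0; high = NR + 1
        while (high - low > 1) {
            probe = int((low + high) / 2)
            if (first[probe] > target) high = probe
            else low = probe
        }
        # low is now the last range starting at or below target
        if (low >= 1 && target <= last[low]) print label[low]
        else print "(not found)"
    }'
}

printf '%s\n' '10 19 A' '20 29 B' '40 49 C' | find_range 25    # B
printf '%s\n' '10 19 A' '20 29 B' '40 49 C' | find_range 35    # (not found)
```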

[1] Binary search algorithm from Tim Bray's site, more information
here: http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary

Grant.
--
http://bugsplatter.id.au
From: Hongyi Zhao on
On Wed, 21 Oct 2009 02:47:54 -0500, Ed Morton <mortonspam(a)gmail.com>
wrote:

>to see if it produces any difference in the output from your real, large
>input files. If not, I'd go with 256 as the scale. If it does, think
>about it and decide which is correct.

I've tested it using a large file1 containing 2160 non-duplicate IP
addresses and a huge file2 containing a complete IP location database,
or more precisely, 373374 non-overlapping IP blocks. I obtained the same
results, i.e., o1 is exactly the same as o2.

So I should use 256 as the scale.

Thanks a lot for all the help here.

Best regards.
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Grant on
On Wed, 21 Oct 2009 03:36:37 -0500, Ed Morton <mortonspam(a)gmail.com> wrote:

>Grant wrote:
><snip>
>> # check got query
>> [ -z "$1" ] && echo "
>> ccfind -- lookup country code and name for IP address
>> usage $0 aa.bb.cc.dd
>> " && exit
>>
>> # get server listen port
>> port=$(gawk '/^inetport/ {print $2}' /etc/ip2cn-server.conf)
>>
>> # make query, may be dotquad or numeric (decimal) IP address
>> echo "$@" | gawk -v port=$port '
>> BEGIN { service = "/inet/tcp/0/localhost/" port }
>> $1 == "0" { $1 = "0." }
>> { print |& service; service |& getline; print }' 2>/dev/null
>>
>> # end
>
>The above could just be a single awk script:
>
>gawk -v cmd="$0" -v ip="$1" '
>/^inetport/ { port=$2 }
>END {
> if (ip == "") {
> print cmd,"-- lookup country code and name for IP address\n\tusage",cmd,"aa.bb.cc.dd" | "cat>&2"
> } else {
> ip = (ip == "0" ? "0." : ip)
> service = "/inet/tcp/0/localhost/" port
> print ip |& service
> close(service,"to")
> if ( (service |& getline cc) > 0) {
> print cc
> } else {
> print cmd,"failed to access",service | "cat>&2"
> }
> close(service)
> }
>}
>' /etc/ip2cn-server.conf
>
>I made the co-process and getline parts more robust too.
>
> Ed.

Well, thank you :) But there's no point trying to be nice about
failing to connect to the server, because gawk dies with a fatal error first:

grant(a)deltree:~$ cat ccf
#!/bin/bash
#
gawk -v cmd="$0" -v ip="$1" '
/^inetport/ { port=$2 }
END {
    if (ip == "") {
        print cmd, \
            "-- lookup country code and name for IP address\n\tusage:", \
            cmd, "aa.bb.cc.dd" | "cat>&2"
    }
    else {
        ip = (ip == "0" ? "0." : ip)
        service = "/inet/tcp/0/localhost/" port
        print ip |& service
        close(service, "to")
        if ( (service |& getline cc) > 0) {
            print cc
        }
        else {
            print cmd,"failed to access",service | "cat>&2"
        }
        close(service)
    }
}
' /etc/ip2cn-server.conf
#
grant(a)deltree:~$ ./ccf 123.123.123.123
123.123.123.123 CN:China
grant(a)deltree:~$ ./ccf 123.2.77.8
123.2.77.8 AU:Australia
grant(a)deltree:~$ ./ccf
./ccf -- lookup country code and name for IP address
usage: ./ccf aa.bb.cc.dd
grant(a)deltree:~$ sudo /etc/rc.d/rc.ip2cn-server stop
grant(a)deltree:~$ ./ccf 123.2.77.8
gawk: cmd. line:12: (FILENAME=/etc/ip2cn-server.conf FNR=20) fatal: can't open two way socket `/inet/tcp/0/localhost/4743' for input/output (No such file or directory)

That's why I dumped server error to /dev/null ;)
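
Sending stderr to /dev/null also throws away the only sign that the lookup failed. A gentler option is to keep stderr quiet but check the exit status, which is non-zero when gawk stops on a fatal error (the gawk manual gives 2 for that case). This is a sketch. `lookup_or_warn` is a hypothetical wrapper name, and its message text is illustrative:

```shell
#!/bin/sh
# Hypothetical wrapper: run a lookup command with its stderr discarded,
# but still tell the user (on stderr) and the caller (via $?) if it failed.
lookup_or_warn() {
    if out=$("$@" 2>/dev/null); then
        printf '%s\n' "$out"
    else
        status=$?
        printf '%s: lookup failed (exit status %s); is the server running?\n' \
            "$1" "$status" >&2
        return "$status"
    fi
}

# e.g. lookup_or_warn ./ccf 123.2.77.8
```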

For some reason I remembered that the 'classic' half-close after the
query, followed by a full close after the transaction ended, did
something wrong, but this worked as expected. Oh well, I can't remember
the details of why I did it like that last year; perhaps it's what
happened when porting from one method to another? Or, 40 ways to kill a cat :)
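
For a peer that waits for end of input before replying, the half-close is what unblocks it. This can be tried without the ip2cn server by using a local co-process. `sort` is a convenient stand-in because it cannot print anything until its input is closed. The sketch below is gawk only, since `|&` is a gawk extension. `sorted_via_coproc` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical demo: gawk two-way I/O with a local co-process instead of
# the /inet/tcp socket, so it runs without the server.
sorted_via_coproc() {
    gawk 'BEGIN {
        cmd = "sort"
        print "c" |& cmd
        print "a" |& cmd
        print "b" |& cmd
        close(cmd, "to")      # half-close: sort now sees end of input
        while ((cmd |& getline line) > 0)
            print line
        close(cmd)            # full close once the reply has been read
    }'
}
```

Running `sorted_via_coproc` should print a, b and c on separate lines. Without the `close(cmd, "to")` line, the getline would wait forever because sort is still waiting for more input. Whether the ip2cn server itself needs the half-close depends on how it reads the query, which the thread doesn't show.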

Grant.
--
http://bugsplatter.id.au
From: Hongyi Zhao on
On Wed, 21 Oct 2009 02:47:54 -0500, Ed Morton <mortonspam(a)gmail.com>
wrote:

>$ cat tst.awk
>BEGIN{ FS="\t"; OFS="#"; scale=(scale ? scale : 256) }
>function ip2nr(ip, nr,ipA) {
> # aaa.bbb.ccc.ddd
> split(ip,ipA,".")
> nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + ipA[4]
> return nr
>}
>NR==FNR { addrs[$0] = ip2nr($0); next }
>FNR>1 {
> start = ip2nr($1)
> end = ip2nr($2)
> for (ip in addrs) {
> if ((addrs[ip] >= start) && (addrs[ip] <= end)) {
> print ip,$3" "$4
> delete addrs[ip]
> }
> }
>}

Another issue: if file1 is huge, say several thousand entries, the
above process will be time-consuming. Is it possible to revise the
above awk script to use multithreading and improve its efficiency?
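
awk itself has no threading, but the cost here is mostly algorithmic. tst.awk compares every remaining address against every range, so the work grows with (addresses x ranges). Loading the ranges once and binary-searching for each address cuts that to roughly log2(373374), about 19 probes per address.

The sketch below relies on stated assumptions:

- The range file is passed first, not second as in tst.awk.
- It is tab-separated with one header line, as tst.awk expects.
- It is sorted by start address with non-overlapping blocks. The sorting in particular should be checked.
- The weighting is fixed at 256.

`ip_lookup` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical rewrite of tst.awk: ranges are loaded once, then each
# address is located by binary search instead of a scan of every range.
# Assumes: range file first, tab-separated, one header line, sorted by
# start address, non-overlapping blocks; weighting fixed at 256.
ip_lookup() {
    awk '
    BEGIN { FS = "\t"; OFS = "#" }
    function ip2nr(ip,    a) {
        split(ip, a, ".")
        return ((a[1] * 256 + a[2]) * 256 + a[3]) * 256 + a[4]
    }
    NR == FNR {                  # first file: the range database
        if (FNR > 1) {           # skip the header line, as tst.awk does
            n++
            first[n] = ip2nr($1); last[n] = ip2nr($2); where[n] = $3 " " $4
        }
        next
    }
    {                            # second file: one address per line
        nr = ip2nr($1)
        low = 0; high = n + 1    # 0 and n + 1 act as sentinels
        while (high - low > 1) {
            probe = int((low + high) / 2)
            if (first[probe] > nr) high = probe
            else low = probe
        }
        if (low >= 1 && nr <= last[low]) print $1, where[low]
    }' "$1" "$2"
}

# usage (file names as in the thread): ip_lookup file2 file1 > o3
```

Output lines follow the order of the address file, whereas tst.awk's `for (ip in addrs)` order is unspecified. Compare sorted copies of the two outputs rather than the raw files.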

Thanks in advance.
--
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.