From: Hongyi Zhao on
Hi all,

I have the following file, which contains three fields on each line:

"IP_ADDRESS" "ISP_NAME" "DOMAIN_NAME"
"109.86.226.38" "-" "-"
"117.18.75.235" "SUNNYVISION LIMITED" "SUNNYVISIONDATACENTRE.COM"
"119.11.13.169" "-" "-"
"119.11.42.164" "-" "-"
"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"
"122.155.3.145" "CAT TELECOM PUBLIC COMPANY LTD" "-"
"140.109.17.180" "T-SINICA.EDU.TW-NET" "-"
"145.100.100.190" "UVA-MASTER-SNE-NET" "-"
"149.9.0.57" "PSI" "BNA.COM"
"149.9.0.58" "PSI" "BNA.COM"
"149.9.0.59" "PSI" "BNA.COM"
"151.15.8.46" "ITALIA ONLINE S.P.A" "15-151.IOL.IT"
"151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
"151.21.86.208" "FREE INTERNET DIAL-UP SERVICES" "21-151.LIBERO.IT"
"151.23.7.196" "ITALIA ONLINE S.P.A" "PPP-POOL-23-0-10.IOL.IT"
"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
"151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"

Now I want to delete some records from this file based on "ISP_NAME"
or "DOMAIN_NAME". Here are my requirements in detail:

1- If both the "ISP_NAME" and "DOMAIN_NAME" fields of a record are "-",
delete it from the file.

2- Given an IP_ADDRESS, say 151.48.43.174, delete all records that have
the same "ISP_NAME" or "DOMAIN_NAME" as the record with that address. In
this case, the following records should be deleted from the file:

"151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
"151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"

Any hints on this issue?

BR.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Janis Papanagnou on
Hongyi Zhao wrote:
> Hi all,
>
> I have the following file, which contains three fields on each line:
>
> "IP_ADDRESS" "ISP_NAME" "DOMAIN_NAME"
> "109.86.226.38" "-" "-"
> "117.18.75.235" "SUNNYVISION LIMITED" "SUNNYVISIONDATACENTRE.COM"
> "119.11.13.169" "-" "-"
> "119.11.42.164" "-" "-"
> "121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"
> "122.155.3.145" "CAT TELECOM PUBLIC COMPANY LTD" "-"
> "140.109.17.180" "T-SINICA.EDU.TW-NET" "-"
> "145.100.100.190" "UVA-MASTER-SNE-NET" "-"
> "149.9.0.57" "PSI" "BNA.COM"
> "149.9.0.58" "PSI" "BNA.COM"
> "149.9.0.59" "PSI" "BNA.COM"
> "151.15.8.46" "ITALIA ONLINE S.P.A" "15-151.IOL.IT"
> "151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
> "151.21.86.208" "FREE INTERNET DIAL-UP SERVICES" "21-151.LIBERO.IT"
> "151.23.7.196" "ITALIA ONLINE S.P.A" "PPP-POOL-23-0-10.IOL.IT"
> "151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
> "151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
> "151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"
>
> Now I want to delete some records from this file based on "ISP_NAME"
> or "DOMAIN_NAME". Here are my requirements in detail:
>
> 1- If both the "ISP_NAME" and "DOMAIN_NAME" fields of a record are "-",
> delete it from the file.

awk '$NF !~ /"-"/ || $(NF-1) !~ /"-"/' your_file
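A variant of this one-liner that compares whole fields instead of matching substrings might look like the following. It is only a sketch, assuming every field is enclosed in double quotes and contains no quote itself, so that splitting on the quote character puts ISP_NAME in $4 and DOMAIN_NAME in $6; the function name drop_empty is hypothetical.

```shell
#!/bin/sh
# Rule 1 sketch: drop a record only when BOTH ISP_NAME and DOMAIN_NAME are "-".
# Assumption: each field is wrapped in double quotes, so with FS='"' a line
#   "IP" "ISP" "DOMAIN"
# splits into $2 = IP, $4 = ISP, $6 = DOMAIN (the odd fields are the gaps).
drop_empty() {    # hypothetical helper; reads the named file(s) or stdin
    awk -F'"' '!($4 == "-" && $6 == "-")' "$@"
}

# Usage: drop_empty your_file > your_file.new
```

The header line survives because its $4 is ISP_NAME, not "-".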

>
> 2- Given an IP_ADDRESS, say 151.48.43.174, delete all records that have
> the same "ISP_NAME" or "DOMAIN_NAME" as the record with that address. In
> this case, the following records should be deleted from the file:

This requires two passes over the data: first to find the ISP name for
the given address, then to remove all records with that name.

For the first task[*]...

awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file

For the second task...

awk -v ispname=... '$2 !~ ispname' your_file

You can combine those commands, e.g. using command substitution where
the variable ispname is set.
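Combined with command substitution, the two passes could be sketched as a small shell function. delete_same_isp is a hypothetical name, and the default whitespace FS of the commands above is kept, so the space caveat still applies: for an ISP_NAME with blanks, only its first word (with the leading quote) ends up in the pattern.

```shell
#!/bin/sh
# Sketch: delete every record whose field 2 matches the ISP of a given IP.
# delete_same_isp is a hypothetical helper; default whitespace FS is assumed.
delete_same_isp() {   # usage: delete_same_isp IP FILE
    # Pass 1: field 2 (quotes included) of the first record matching the IP.
    ispname=$(awk -v ipaddr="$1" '$1 ~ ipaddr {print $2; exit}' "$2")
    # Guard: an empty pattern matches, and so would delete, every record.
    if [ -z "$ispname" ]; then
        cat "$2"
        return
    fi
    # Pass 2: keep only records whose field 2 does not match that name.
    awk -v ispname="$ispname" '$2 !~ ispname' "$2"
}
```

The emptiness check is there because an address that is not in the file would otherwise leave ispname empty, and `$2 !~ ""` is false for every line.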

Janis

[*] This is, strictly speaking, not correct since the dots match any
character in the first field, but your data seems to allow for that
simplification.
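To sidestep the caveat in the footnote, the first field can be compared as a string rather than matched as a regex, re-adding the quotes around the address. A minimal sketch that keeps the default FS; isp_of is a hypothetical helper name.

```shell
#!/bin/sh
# Exact lookup: a string comparison makes the dots in the address literal,
# and 151.48.43.17 can no longer match inside "151.48.43.174".
isp_of() {   # hypothetical helper; usage: isp_of IP [FILE...]
    ip=$1
    shift
    # Compare $1 with the address wrapped in double quotes; print field 2.
    awk -v ipaddr="$ip" '$1 == ("\"" ipaddr "\"") {print $2}' "$@"
}
```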

>
> "151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
> "151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
> "151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
> "151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"
>
> Any hints on this issue?
>
> BR.
From: Hongyi Zhao on
On Sat, 27 Mar 2010 10:46:32 +0100, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>This requires two passes over the data: first to find the ISP name for
>the given address, then to remove all records with that name.
>
>For the first task[*]...
>
> awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file
>
>For the second task...
>
> awk -v ispname=... '$2 !~ ispname' your_file
>
>You can combine those commands, e.g. using command substitution where
>the variable ispname is set.
>
>Janis
>
>[*] This is, strictly speaking, not correct since the dots match any
>character in the first field, but your data seems to allow for that
>simplification.

Good, thanks a lot.

But we must consider the following case:

If, in the record for the given IP, one of the two fields "ISP_NAME" or
"DOMAIN_NAME" has the value "-", the second task becomes dangerous:
matching on the value "-" may remove records that should not be
deleted.

To solve this issue, I add the following requirement:

If the "ISP_NAME" of the record for the given IP is "-", use the
"DOMAIN_NAME" as the matching condition for the deletion, and vice
versa.

So how should the code be touched up?

BR.
From: Janis Papanagnou on
Hongyi Zhao wrote:
> On Sat, 27 Mar 2010 10:46:32 +0100, Janis Papanagnou
> <janis_papanagnou(a)hotmail.com> wrote:
>
>> This requires two passes over the data: first to find the ISP name for
>> the given address, then to remove all records with that name.
>>
>> For the first task[*]...
>>
>> awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file
>>
>> For the second task...
>>
>> awk -v ispname=... '$2 !~ ispname' your_file

I just noticed that the ISP_NAME can contain spaces, so the suggested
solution wouldn't work. Sorry. To fix that, you can redefine FS in awk
as " " (i.e. as the three characters quote, space, quote, as in
awk -F'" "'), and remove the quotes from the search pattern if you're
comparing field 2.
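The effect of that FS can be checked on one of the lines containing blanks. A minimal sketch; isp_field is a hypothetical helper. Only $2 comes out without quotes: $1 keeps its leading quote and $3 its trailing one, which is why the quotes have to be dropped from a pattern compared against field 2.

```shell
#!/bin/sh
# With FS set to the three characters quote, space, quote, each record
# splits into exactly three fields even when ISP_NAME contains blanks:
#   $1 keeps the leading quote of the IP address,
#   $2 is the bare ISP_NAME,
#   $3 keeps the trailing quote of the DOMAIN_NAME.
isp_field() {   # hypothetical helper: print the unquoted ISP_NAME per line
    awk -F'" "' '{print $2}' "$@"
}

printf '%s\n' '"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"' | isp_field
# prints: INTERNET SERVICE PROVIDER
```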

>>
>> You can combine those commands, e.g. using command substitution where
>> the variable ispname is set.
>>
>> Janis
>>
>> [*] This is, strictly speaking, not correct since the dots match any
>> character in the first field, but your data seems to allow for that
>> simplification.
>
> Good, thanks a lot.
>
> But we must consider the following case:
>
> If, in the record for the given IP, one of the two fields "ISP_NAME" or
> "DOMAIN_NAME" has the value "-", the second task becomes dangerous:
> matching on the value "-" may remove records that should not be
> deleted.

This just requires an additional condition: make sure $2 is not "-".

>
> To solve this issue, I add the following requirement:
>
> If the "ISP_NAME" of the record for the given IP is "-", use the
> "DOMAIN_NAME" as the matching condition for the deletion, and vice
> versa.

In the code

awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file

you can print conditionally

print (($2 !~ /"-"/) ? $2 : $3)

but you need both values (or some discriminator). So see below...

>
> So how should the code be touched up?

Return both values in the "first task"

awk -v ipaddr= ... '$1 ~ ipaddr {print $2 SEP $3}'

with an appropriately chosen value for SEP, and adjust the condition
for the "second task"

awk -v isp_and_dom=... '
BEGIN { split(isp_and_dom,iad,SEP) }   # iad[1]=ISP, iad[2]=domain of the given IP
(iad[1] !~ /"-"/ && $2 !~ iad[1]) ||
(iad[1] ~ /"-"/ && $3 !~ iad[2])
' your_file

This still has the space problem mentioned above and must be adjusted
accordingly. Note that the code could be further simplified, and it is
of course untested.
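Filling in the gaps under some assumptions of my own, the two passes might be sketched like this: a tab plays the role of SEP, the quote character is the field separator (so ISP_NAME is $4 and DOMAIN_NAME is $6, avoiding the space problem), and fields are compared as whole strings rather than regexes. The looked-up reference values, not the current record, are tested for "-", so a reference field of "-" is simply ignored and the other one decides. delete_related is a hypothetical name.

```shell
#!/bin/sh
# Sketch: delete every record sharing a non-"-" ISP_NAME or DOMAIN_NAME
# with the record of the given IP.  Assumptions: fields are double-quoted
# and contain no quotes or tabs; with FS='"' the ISP is $4, the domain $6.
delete_related() {   # hypothetical helper; usage: delete_related IP FILE
    # Pass 1: "ISP<TAB>DOMAIN" of the reference record (the tab acts as SEP).
    pair=$(awk -F'"' -v ip="$1" '$2 == ip {print $4 "\t" $6; exit}' "$2")
    if [ -z "$pair" ]; then
        cat "$2"    # unknown IP: nothing to delete
        return
    fi
    # Pass 2: a reference field of "-" never triggers a deletion, so the
    # other field alone decides ("and vice versa").
    awk -F'"' -v pair="$pair" '
        BEGIN { split(pair, ref, "\t") }
        !((ref[1] != "-" && $4 == ref[1]) ||
          (ref[2] != "-" && $6 == ref[2]))
    ' "$2"
}
```

With 122.155.3.145 as the reference (domain "-"), only records with ISP "CAT TELECOM PUBLIC COMPANY LTD" go, and "140.109.17.180" stays even though its domain is also "-".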

Janis

>
> BR.
From: Hongyi Zhao on
On Sat, 27 Mar 2010 12:49:03 +0100, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>I just noticed that the ISP_NAME can contain spaces, so the suggested
>solution wouldn't work. Sorry. To fix that, you can redefine FS in awk
>as " " (i.e. as the three characters quote, space, quote, as in
>awk -F'" "'), and remove the quotes from the search pattern if you're
>comparing field 2.

What about just using the quote as the field separator and using $2, $4,
$6 to get the corresponding three fields?

$ echo '"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"' | \
    awk -F'"' '{print $2, $4, $6}'
121.44.240.31 INTERNET SERVICE PROVIDER ON.NET
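Building on that idea, both requirements could be handled by one awk program that reads the file twice, a common awk idiom: while NR == FNR (the first copy) it only records the ISP_NAME and DOMAIN_NAME of the given address, and on the second copy it prints whatever survives rule 1 and rule 2. This is a sketch under the same assumption that every field is double-quoted and quote-free; filter_records is a hypothetical name, and a reference field of "-" is never used for matching.

```shell
#!/bin/sh
# Usage: filter_records IP FILE   (hypothetical helper name)
# Rule 1: drop records whose ISP_NAME ($4) and DOMAIN_NAME ($6) are both "-".
# Rule 2: drop records sharing a non-"-" ISP_NAME or DOMAIN_NAME with IP's record.
filter_records() {
    awk -F'"' -v ip="$1" '
        NR == FNR {                       # first read: find the reference record
            if ($2 == ip) { isp = $4; dom = $6 }
            next
        }
        $4 == "-" && $6 == "-"                { next }   # rule 1
        isp != "" && isp != "-" && $4 == isp  { next }   # rule 2, by ISP_NAME
        dom != "" && dom != "-" && $6 == dom  { next }   # rule 2, by DOMAIN_NAME
        { print }
    ' "$2" "$2"
}
```

If the address is not found, isp and dom stay empty and only rule 1 is applied. The output goes to stdout, e.g. filter_records 151.48.43.174 your_file > your_file.new.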