From: Randal L. Schwartz on
>>>>> "Hongyi" == Hongyi Zhao <hongyi.zhao(a)gmail.com> writes:

Hongyi> I want to obtain all of the IPv4 addresses from a file by using
Hongyi> (e)grep. What regex should I use to do this thing?

Since you're not trying to validate them (thank gawd, the wrong thing
for a regex to do), this should suffice:

/(\d+\.){3}\d+/
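
For anyone who wants to try that pattern outside Perl, here is a minimal sketch in Python. It assumes plain text input and makes two adjustments that are not part of the suggestion above: a non-capturing group, so that findall() returns whole matches, and [0-9] in place of \d, which POSIX ERE does not define. The name extract_candidates is made up for illustration.

```python
import re

# Same shape as /(\d+\.){3}\d+/: four runs of digits separated by dots.
# No range checking is attempted, so 999.1.1.1 is also reported.
DOTTED_QUAD = re.compile(r"(?:[0-9]+\.){3}[0-9]+")


def extract_candidates(text):
    """Return every dotted-quad-looking substring found in text."""
    return DOTTED_QUAD.findall(text)


if __name__ == "__main__":
    sample = "192.168.142.138\n5.4\n66.33.154.1\n127.0.0.1\n1.888.555.1212\n"
    for candidate in extract_candidates(sample):
        print(candidate)
```

Since nothing is validated, a string such as 1.888.555.1212 is reported too; filtering is left to a later step.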

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn(a)stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
From: Stephane CHAZELAS on
2009-12-8, 14:25(+00), Edgardo Portal:
> On 2009-12-08, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
>> Hi all,
>>
>> I want to obtain all of the IPv4 addresses from a file by using
>> (e)grep. What regex should I use to do this thing?
>>
>> Thanks in advance.
>
> Though not strictly correct (e.g., it'd find 260.1.1.1, 256.1.1.1, etc.),
> how about the following as a rough start?
>
> prompt$ cat /tmp/ips.tmp
> 192.168.142.138
> 5.4
> 66.33.154.1
> 127.0.0.1
> 1.888.555.1212
>
> prompt$ cat /tmp/ips.tmp \
> | egrep '[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]\.[12]?[0-9]?[0-9]'
> 192.168.142.138
> 66.33.154.1
> 127.0.0.1

5.4 is a valid IP address, typically for an interface on a class
A network; a.b.c.d is just the quad-decimal notation.

5.4 is the same as 5.0.0.4. 127.0.0.1 is more conveniently
written 127.1 (the first address on the 127.0/8 network).

If you want to accept all the forms supported by inet_addr(3),
gethostbyname(3) or getaddrinfo(3), there'll be more than that.

For example, 127.0.0.1, 0177.0.1 and 0x7f000001 are all the same IP
address, and 09.09.09.09 is invalid (09 is not a valid octal number).

inet_pton(3) is another matter: that one only supports
quad-decimal with 1 to 3 digits per number.
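
To make the short and mixed-base forms concrete, here is a rough pure-Python sketch of the classic inet_aton(3)-style rules as I understand them: one to four parts, each decimal, octal (leading 0) or hexadecimal (leading 0x), with the last part filling whatever bytes remain. It is not a binding to the C library, and the names parse_inet_aton_style and to_dotted_quad are hypothetical helpers.

```python
def _parse_number(part):
    """Parse one part as decimal, octal (leading 0) or hex (leading 0x)."""
    lowered = part.lower()
    if lowered.startswith("0x"):
        digits, base = lowered[2:], 16
    elif len(lowered) > 1 and lowered.startswith("0"):
        digits, base = lowered[1:], 8
    else:
        digits, base = lowered, 10
    allowed = "0123456789abcdef"[:base]
    if not digits or any(ch not in allowed for ch in digits):
        return None
    return int(digits, base)


def parse_inet_aton_style(text):
    """Return the 32-bit address value, or None if text is not acceptable."""
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = [_parse_number(part) for part in parts]
    if any(value is None for value in values):
        return None
    *leading, last = values
    # Every part but the last is one byte; the last fills the remaining bytes.
    if any(value > 0xFF for value in leading):
        return None
    remaining_bits = 8 * (4 - len(leading))
    if last >= 1 << remaining_bits:
        return None
    result = 0
    for value in leading:
        result = (result << 8) | value
    return (result << remaining_bits) | last


def to_dotted_quad(value):
    """Format a 32-bit value as a.b.c.d."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    for text in ("5.4", "127.1", "0177.0.1", "0x7f000001", "09.09.09.09"):
        value = parse_inet_aton_style(text)
        print(text, "->", "invalid" if value is None else to_dotted_quad(value))
```

The real library functions may differ in corner cases (trailing whitespace, a bare "0x"), so treat this only as an illustration of why a grep pattern for a.b.c.d will miss forms those functions accept.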

--
Stéphane
From: Stephane CHAZELAS on
2009-12-9, 17:12(+00), Edgardo Portal:
[...]
>> On my system the problem is the (?:) syntax. This is from Perl REs
>> and denotes a non-capturing group. I find that
>>
>> egrep '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' ips
>>
>> works fine.
>
> Note that it does match things like 001.002.003.004,
> 012.034.056.099, etc. -- are those "valid" IPs?

Depends what for. The second is not valid for inet_addr, gethostbyname
or getaddrinfo, as 099 is not a valid octal number.
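
A small sketch in Python of that distinction, assuming the egrep pattern quoted above translated to Python's re syntax (non-capturing groups added). The helper has_bad_octal_part is hypothetical; it only flags parts that those functions would read as octal but that contain an 8 or a 9.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
RANGE_CHECKED = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")


def has_bad_octal_part(address):
    """True if some part has a leading 0 and contains the digit 8 or 9."""
    return any(
        len(part) > 1 and part.startswith("0") and any(ch in "89" for ch in part)
        for part in address.split(".")
    )


if __name__ == "__main__":
    for candidate in ("001.002.003.004", "012.034.056.099", "256.1.1.1"):
        if RANGE_CHECKED.fullmatch(candidate) is None:
            print(candidate, "rejected by the pattern")
        elif has_bad_octal_part(candidate):
            print(candidate, "matches, but has an invalid octal part")
        else:
            print(candidate, "matches")
```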

--
Stéphane
From: Michael Paoli on
On Dec 7, 4:52 pm, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
> I want to obtain all of the IPv4 addresses from a file by using
> (e)grep. What regex should I use to do this thing?

I'll provide some hints. Here's a bit of regular expression (RE) code
from a program I wrote in perl that does something relatively similar.
In it I'm using some perl RE capabilities that aren't (or may not be)
present in Extended Regular Expressions (EREs). I was solving a
slightly different task (matching lines that contained only and
precisely an IPv4 address in dotted quad form, and subsequently
processing (sorting numerically) by each dotted quad), so you'll want
to match at least a bit differently (e.g. IPv4 address(es) within a
line).

Here are the significant bits of perl RE I used that aren't or may not
be in ERE, or may be slightly different in ERE:
x - Extend your pattern's legibility by permitting whitespace and
    comments, i.e. with that modifier, # through end of line, and
    whitespace, is ignored within the RE (unless preceded with
    backslash (\), in which case it's then taken as literal).
\d - equivalent to BRE/ERE [0-9]
? - equivalent to (depending on BRE/ERE flavor) {0,1} or \{0,1\}
{n} - equivalent to (depending on BRE/ERE flavor) {n} or \{n\}

$ expand -t 4 < ~/bin/ipv4sort | sed -ne '17,29p'
/^
    (
        (
            \d\d?|                      #a digit or two
            [01]\d\d|2[0-4]\d|25[0-5]   #or three (in range)
        )
        \.                              #dot
    ){3}                                #thrice that
    (
        \d\d?|                          #a digit or two
        [01]\d\d|2[0-4]\d|25[0-5]       #or three (in range)
    )
$/ox
$
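
For comparison, the same commented layout can be sketched in Python, whose re.VERBOSE flag plays the role of perl's x modifier. This is an illustration rather than the ipv4sort program itself: the function name sort_ipv4_lines and the sorting step are assumptions based only on the description above (match lines holding exactly one dotted quad, then sort numerically by each part).

```python
import re

IPV4_LINE = re.compile(
    r"""
    ^
    (?:
        (?:
            \d\d?                          # a digit or two
          | [01]\d\d | 2[0-4]\d | 25[0-5]  # or three (in range)
        )
        \.                                 # dot
    ){3}                                   # thrice that
    (?:
        \d\d?                              # a digit or two
      | [01]\d\d | 2[0-4]\d | 25[0-5]      # or three (in range)
    )
    $
    """,
    re.VERBOSE | re.ASCII,
)


def sort_ipv4_lines(lines):
    """Keep lines that are exactly one dotted quad and sort them numerically."""
    addresses = []
    for line in lines:
        text = line.rstrip("\n")
        if IPV4_LINE.match(text):
            addresses.append(text)
    return sorted(addresses, key=lambda a: tuple(int(p) for p in a.split(".")))


if __name__ == "__main__":
    data = ["10.0.0.2\n", "9.1.1.1\n", "256.1.1.1\n", "10.0.0.10\n", "a 1.2.3.4\n"]
    for address in sort_ipv4_lines(data):
        print(address)
```

To find addresses within a line rather than whole-line matches, the ^ and $ anchors would have to go, with something like word boundaries in their place.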