From: Ben Bullock on 15 Apr 2008 18:46

On Tue, 15 Apr 2008 13:35:27 -0700, Robbie Hatley wrote:

> "Ben Bullock" wrote:
>
>> Well OK but if I was going to do this for real, I would use something
>> like /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i or similar
>> (I haven't checked this regex with the machine yet but hopefully you
>> get the picture).
>
> The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there
> are hundreds or thousands of such valid domain suffixes.

I think there are only about 200 or so, most of which are rare.

> You're
> forgetting "es" (Spain), "ru" (Russia), "uk" (Ukraine), "us" (USA), not
> to mention "mil", "gov", "edu", "biz", "info", etc, etc, etc.

Um, I have both "us" and "uk" there. I didn't know that uk was Ukraine though.

> That's
> part of why my URL-matching regex was so vague.

>> I just wanted to make the point that the &$% stuff is not valid as part
>> of the web address.
>
> Those characters all appear in web addresses.

Did you really not understand my point?

> Hence I tend to go for a vague RE that I believe
> captures every valid document URL, at the cost of occasionally
> capturing a few invalid ones. Unless someone knows a better approach.

Well, even if they do know a better approach, they might not have the energy to discuss it with you.
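
For readers who want to try the regex quoted in the post above, here is a minimal sketch. The thread excerpt never shows how $validdns is defined, so the label pattern below is an assumption (letters, digits, and inner hyphens, up to 63 characters), and find_hostnames is a hypothetical helper name. The inner groups are made non-capturing so that $1 holds the whole host name; otherwise the pattern is the one from the post, suffix list included.

```perl
use strict;
use warnings;

# ASSUMPTION: $validdns is not defined anywhere in the thread excerpt.
# This is a hypothetical pattern for a single DNS label.
my $validdns = qr/[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?/i;

# The regex from the post, with the inner groups made non-capturing.
# The suffix list is the short one from the post, not a complete list.
my $hostname = qr/\b((?:$validdns\.){1,62}(?:com|net|org|us|uk|ca|jp))\b/i;

# Hypothetical helper: return every host name found in a string.
sub find_hostnames {
    my ($text) = @_;
    my @found;
    while ($text =~ /$hostname/g) {
        push @found, $1;
    }
    return @found;
}

# foo.example.es is skipped because "es" is not in the suffix list.
print "$_\n" for find_hostnames('See www.example.com and foo.example.es today');
```

With the assumed label pattern, characters such as & $ % can never be part of a match, which is the point being argued about the host part of an address. The cost is the one raised in the reply: any suffix missing from the alternation is silently skipped.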
From: Ben Bullock on 16 Apr 2008 20:11
On Wed, 16 Apr 2008 12:49:32 -0700, Robbie Hatley wrote:

> Ok, I just downloaded Regexp-Common-2.120. Now I have a folder with a
> bunch of stuff in it. This may sound like an incredibly stupid
> question, but what do I do with it? I've never actually used a CPAN
> module before. Any hints a CPAN newbie should be aware of?

If I want to install a CPAN module, I usually don't download the .tar.gz file directly. Instead I log in as root and type

    cpan Regexp::Common

You might need to prefix that with "sudo" if you are using Ubuntu/Debian Linux. If you are using ActiveState Perl on Windows, you are better off using "ppm", the Perl Package Manager, which has precompiled versions of the modules.
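
Once the install has finished, one quick way to confirm that Perl can actually find the module is to try loading it. The sketch below is only an illustration: module_installed is a hypothetical helper name, not something provided by CPAN or by Regexp::Common.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_installed {
    my ($module) = @_;
    # require wants a file path when given a string:
    # Regexp::Common -> Regexp/Common.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('Regexp::Common') {
    my $status = module_installed($module) ? 'installed' : 'not found';
    print "$module: $status\n";
}
```

If this reports "not found" after an apparently successful install, the module may have gone into a different Perl installation from the one running the script.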