From: Andreas Marschke on
> Probably not what you're looking for, but it seems this does the same
> thing (I see you're using GNU sed)

Interesting... How different is GNU sed from the BSD ones? Or are you
referring to other Unices like HP-UX or Solaris?

> wget http://www.jargon.net/ -O- 2>/dev/null |
>   sed -n '\:<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>:{s/<[^>]*>//g;s/  */ /gp;}'
>
> However, keep in mind that parsing html with sed/grep and other
> regex-based tools is difficult if you can't count on the input having a
> fixed, known format.
That's quite true. But basically it's only a bit more typing effort for a
good hacker.

From: pk on
Andreas Marschke wrote:

>> Probably not what you're looking for, but it seems this does the same
>> thing (I see you're using GNU sed)
>
> Interesting... How different is GNU sed from the BSD ones? Or are you
> referring to other Unices like HP-UX or Solaris?

GNU sed supports a number of extensions, like using \|, \+ or \? in regexps,
and it also supports extended regexps (plus a load of other features not
found in standard sed).
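
To illustrate, here is a minimal sketch; the sample input and the function
names are made up for this example. Each function sticks to POSIX basic
regular expressions, and the comment above it shows the shorter spelling
that GNU sed would also accept.

```shell
#!/bin/sh
# squeeze_spaces: collapse each run of spaces into a single space.
# Portable form uses the interval \{1,\}.
# GNU sed would also accept:  sed 's/ \+/ /g'
squeeze_spaces() {
    sed 's/ \{1,\}/ /g'
}

# strip_bold: delete <b> and </b> tags.
# Portable form uses one substitution per alternative.
# GNU sed would also accept:  sed 's/<b>\|<\/b>//g'
strip_bold() {
    sed -e 's/<b>//g' -e 's/<\/b>//g'
}

# Hypothetical sample line, just to exercise both functions.
printf '%s\n' 'a   <b>bold</b>    word' | strip_bold | squeeze_spaces
```

The portable forms are a little longer, but they should behave the same on
GNU, BSD and other standard-conforming seds.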

>> However, keep in mind that parsing html with sed/grep and other
>> regex-based tools is difficult if you can't count on the input having a
>> fixed, known format.
> That's quite true. But basically it's only a bit more typing effort for a
> good hacker.

I suppose the truly good hacker uses a parser to minimize both typing and
the likelihood of errors, but that's just my opinion.

From: Janis Papanagnou on
Andreas Marschke wrote:
> Hi!
>
> I was just wondering whether somebody wants to share his/her best shell
> script snippets here on the list. I'm interested in everything that can do
> something nifty to a system or a website. Pick your favourite shell, whether
> it's bash, sh, dash, ksh, csh, fish or whatever; just have fun hacking and
> share your jewels!
>
> To start it off, here is a simple bash script scraping the daily JARGON off
> the website for the New Hacker's Dictionary:
>
> |+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|
> #!/bin/bash
>
> wget http://www.jargon.net/ -O- 2>/dev/null |
>   grep '<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' |
>   sed 's:\(<[a-zA-Z0-9]*>\|</[a-zA-Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zA-Z0-9]*\.html">\|<[a-z]*>\|</[a-z]*>\)::g' |
>   sed s/\ \ */\ /g
> |+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|

Whenever I see a command pipe like this one, with the grep|sed|sed sequence,
I wonder why the programmer does not use just a single tool, preferably one
that results in clearer, more legible, and more easily maintainable code.
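
As a rough sketch of what I mean, a single awk call can do the selecting, the
tag stripping and the space squeezing in one place. The function name and the
sample input lines are invented for this example, and the real page markup
may differ from what the pattern assumes.

```shell
#!/bin/sh
# extract_entry: for lines that contain a /jargonfile/ link, remove all
# tags, collapse runs of spaces, and print the result. Other lines are
# skipped, which replaces the separate grep step.
extract_entry() {
    awk '/<A HREF="\/jargonfile\/[a-z]\/[a-zA-Z0-9]*\.html">/ {
        gsub(/<[^>]*>/, "")   # drop every tag on the line
        gsub(/ +/, " ")       # collapse runs of spaces into one
        print
    }'
}

# Real use would look like:  wget http://www.jargon.net/ -O- 2>/dev/null | extract_entry
# Hypothetical sample input instead of a network fetch:
printf '%s\n' \
    '<P>no link here</P>' \
    '<B><A HREF="/jargonfile/f/foo.html">foo</A></B>:  a sample   entry' |
    extract_entry
```

One pattern and two gsub calls are easier to read and to change than a chain
of three processes.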

Janis

>
> Cheers and happy hacking!
>
> Andreas Marschke.

From: Andreas Marschke on
> I suppose the truly good hacker uses a parser to minimize both typing and
> the likelihood of errors, but that's just my opinion.

What do you mean by "parser"?

From: Janis Papanagnou on
Andreas Marschke wrote:
>> I suppose the truly good hacker uses a parser to minimize both typing and
>> the likelihood of errors, but that's just my opinion.
>
> What do you mean by "parser"?
>

A tool aware of the specific syntax of the data.

Janis