Fun with the $SHELL [Shell]

From: Andreas Marschke on 22 Feb 2010 14:43

> Whenever I see a command pipe like this one with the grep|sed|sed sequence
> I wonder why the programmer does not use just a single tool, preferably
> one that results in clearer, more legible, and more maintainable code.
>
> Janis

My goal was to use tools that are available on every possible machine there
could be, not to use one true thing.
Besides, one part of the UNIX philosophy is to have "tools that do only one
thing and do it well", so one can use them in conjunction to achieve one's
goal.

I'm interested in what you would use for a task like this. Please
enlighten me.

From: Janis Papanagnou on 22 Feb 2010 15:19

Andreas Marschke wrote:
>> Whenever I see a command pipe like this one with the grep|sed|sed sequence
>> I wonder why the programmer does not use just a single tool, preferably
>> one that results in clearer, more legible, and more maintainable code.
>>
>> Janis
>
> My goal was to use tools that are available on every possible machine there
> could be, not to use one true thing.

There is no such thing as "the one true tool".

> Besides, one part of the UNIX philosophy is to have "tools that do only one
> thing and do it well", so one can use them in conjunction to achieve one's
> goal.

And once you combine an unnecessarily large number of Unix tools that each do
their specific task well, you're prone to get an unmaintainable, inefficient
mess instead.

You don't seriously mean your code, spanning multiple lines, to be commendable?

> I'm interested in what you would use for a task like this. Please
> enlighten me.

Wherever you build pipelines of cut, head, tail, sed, grep, tr, etc.,
use (e.g.) awk(1) instead. As for "available on every possible machine": it's
standard on Unix and available even for WinDOS if you like. Another option,
if you're not repelled by its syntax, is perl (it's non-standard on Unixes,
but generally available as well).

Janis
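
As a minimal sketch of the advice above (not taken from the thread): the sample data, the task, and the function names with_pipeline and with_awk are all made up for illustration. It compares a grep|cut|head pipeline with a single awk call that filters, picks a field, and stops after two hits.

```shell
# Hypothetical sample data in passwd(5) layout; the entries are invented.
sample='root:x:0:0:root:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/sh
bob:x:1001:1001:Bob:/home/bob:/bin/sh'

# Three processes: filter the lines, pick a field, limit the count.
with_pipeline() {
    printf '%s\n' "$sample" | grep ':/bin/sh$' | cut -d: -f1 | head -n 2
}

# One awk process doing the same three steps.
with_awk() {
    printf '%s\n' "$sample" |
    awk -F: '$NF == "/bin/sh" { print $1; if (++n == 2) exit }'
}

with_pipeline
with_awk
```

Both functions should print the same two login names; whether the single awk call is also easier to read is the judgment call being argued in this thread.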

From: Ed Morton on 22 Feb 2010 15:19

On 2/22/2010 1:43 PM, Andreas Marschke wrote:
>> Whenever I see a command pipe like this one with the grep|sed|sed sequence
>> I wonder why the programmer does not use just a single tool, preferably
>> one that results in clearer, more legible, and more maintainable code.
>>
>> Janis
>
> My goal was to use tools that are available on every possible machine there
> could be, not to use one true thing.
> Besides, one part of the UNIX philosophy is to have "tools that do only one
> thing and do it well", so one can use them in conjunction to achieve one's
> goal.
>
> I'm interested in what you would use for a task like this. Please
> enlighten me.

awk, for example, given what you posted:

wget ... |
grep '<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' |
sed 's:$<[a-zA-Z0-9]*>\|</[a-zA-Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zAZ0-9]*\.html">\|<[a-z]*>\|</[a-z]*>$::g' |
sed s/\ \ */\ /g

If I understand the above correctly, then the direct translation would be:

wget ... |
awk '/<A HREF="\/jargonfile\/[a-z]\/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*<\/A>/ {
gsub(/<[a-zA-Z0-9]*>|<\/[a-zA-Z0-9]*>|<A HREF="\/[a-zA-Z0-9]*\/[a-z]\/[a-zAZ0-9]*\.html">|<[a-z]*>|<\/[a-z]*>/,"")
gsub(/ +/," ")
print
}'

which could be simplified to:

wget ... |
awk '/<A HREF="\/jargonfile\/[a-z]\/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*<\/A>/ {
gsub(/<[a-zA-Z0-9]*>|<\/[a-zA-Z0-9]*>|<A HREF="\/[a-zA-Z0-9]*\/[a-z]\/[a-zAZ0-9]*\.html">/,"")
gsub(/ +/," ")
print
}'

since [a-z]* is a subset of the [a-zA-Z0-9]* which is already present in your
RE. Then, since you're only operating on the "jargonfile" records, it looks like
it could be simplified further to:

wget ... |
awk 'sub(/<A HREF="\/jargonfile\/[a-z]\/[a-zA-Z0-9]*.html">/,"") {
gsub(/<[a-zA-Z0-9]*>|<\/[a-zA-Z0-9]*>/,"")
gsub(/ +/," ")
print
}'
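
A side note on the idiom just used: sub() returns the number of substitutions it made (0 or 1), so placing it in the pattern position both selects the matching records and strips the matched text in one step. The sketch below shows this in isolation; the "key=" prefix, the input lines, and the function name strip_and_filter are made up for illustration.

```shell
strip_and_filter() {
    # sub() returns 1 when it removed the prefix and 0 otherwise, so as a
    # pattern it is true only for the records that were actually changed.
    awk 'sub(/^key=/, "") { print }'
}

# Invented input: the second line lacks the prefix and is dropped.
printf '%s\n' 'key=value one' 'no match here' 'key=two' | strip_and_filter
```

Only the two lines that carried the prefix are printed, each with the prefix removed.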

and then finally cleaning up the odds and ends:

wget ... |
awk 'sub(/<A HREF="\/jargonfile\/[[:lower:]]\/[[:alnum:]]*.html">/,"") {
gsub(/<\/?[[:alnum:]]*>/,"")
gsub(/[[:space:]]+/," ")
print
}'

which appears to be a lot clearer and simpler than what you started with:

wget ... |
grep '<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' |
sed 's:$<[a-zA-Z0-9]*>\|</[a-zA-Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zAZ0-9]*\.html">\|<[a-z]*>\|</[a-z]*>$::g' |
sed s/\ \ */\ /g

Regards,

Ed.
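
To try the final awk version without fetching anything, here is a small sketch that feeds it two invented lines in place of the "wget ... |" stage. The sample markup and the function names sample_page and extract_entries are assumptions for illustration, not the real page. It also assumes an awk that supports POSIX character classes such as [[:alnum:]].

```shell
# Hypothetical stand-in for the "wget ... |" stage: two made-up lines,
# only the first of which links into /jargonfile/.
sample_page() {
    printf '%s\n' \
        '<LI><A HREF="/jargonfile/f/foo.html">foo</A>   <B>bar</B></LI>' \
        '<P>Some other paragraph</P>'
}

# The final awk script from the post above, unchanged apart from indentation.
extract_entries() {
    awk 'sub(/<A HREF="\/jargonfile\/[[:lower:]]\/[[:alnum:]]*.html">/,"") {
        gsub(/<\/?[[:alnum:]]*>/,"")
        gsub(/[[:space:]]+/," ")
        print
    }'
}

sample_page | extract_entries
# prints: foo bar
```

The second sample line has no jargonfile link, so sub() returns 0 and the line is skipped.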

From: Janis Papanagnou on 22 Feb 2010 15:32

Andreas Marschke wrote:
> Hi !
>
> I was just wondering whether somebody wants to share his/her best shell
> script snippets here on the list. [...]

There have been some very clever examples posted here in the past.
Just recently someone reposted a very clever idea from Dan Mercer.
Stay tuned with this Usenet group and you'll find the worthy pearls.

Janis

From: Andreas Marschke on 22 Feb 2010 16:42

Janis Papanagnou wrote:

> Andreas Marschke wrote:
>>> I suppose the truly good hacker uses a parser to minimize both typing
>>> and error likeliness, but that's just my opinion.
>>
>> What do you mean by "parser" ?
>>
>
> A tool aware of the specific syntax of the data.
>
> Janis

I was more interested in you naming an actual tool, not in a general
description of a parser.
