From: Bit Twister on
On Sun, 21 Mar 2010 21:55:22 -0500, Ohmster wrote:
> Summary:
> Take the zip file from below and make a script that will take item 1 on
> the htm list below as an input file and output a clean html with target
> attribute html file.
>
> http://home.comcast.net/~theohmster/files/dirty_bookmarks.zip

Here is something for you to play with.
#!/bin/bash
_out_fn=urls.html
_in_fn=bookmark.htm

/bin/cp /dev/null $_out_fn

while read -r _line ; do
    _char4=${_line:0:4}
    set -- $_line
    if [ "$_char4" = "<DT>" ] ; then
        shift
        _href="$1"
        shift 5
        echo "<DT><A $_href target=\"_blank\" $@</A>" >> $_out_fn
    fi
done < $_in_fn

echo "Output in $_out_fn"
From: Ohmster on
Bit Twister <BitTwister(a)mouse-potato.com> wrote in
news:slrnhqdooi.6vs.BitTwister(a)cooker.home.test:

> On Sun, 21 Mar 2010 21:55:22 -0500, Ohmster wrote:
>> Summary:
>> Take the zip file from below and make a script that will take item 1 on
>> the htm list below as an input file and output a clean html with target
>> attribute html file.
>>
>> http://home.comcast.net/~theohmster/files/dirty_bookmarks.zip
>
> Here is something for you to play with.
> #!/bin/bash
> _out_fn=urls.html
> _in_fn=bookmark.htm

Bit Twister,

Your script worked so good the other night and I was so happy, but now
for some reason, I am getting a funky output from it and I cannot figure
out why. I put your code into ieclean, put that in my ~/scripts/ directory
and made it executable and now it is in my path. I put bookmark.htm in my
~/bench/ directory and run "ieclean" with no quotes and get urls.html.
Nice. Except that this version is not good. Oh it is clean alright, that
part is good, but I am getting an extra "</A>" tag on every other line,
this is f*cking up the page in a bad way. I suppose I can use a text
editor macro to drop down a line and delete to end of file, but this
script is supposed to work good and I am pretty sure it did, what went
wrong?

I put these files in a zip for you:

bookmark.htm
ieclean
urls.htm

You got my copy of your script (ieclean)
The original IE bookmark.htm file
and the funky output urls.htm file

Here they are, can you have a look please?
http://home.comcast.net/~theohmster/files/BT_IE_Clean.zip

This is what I am getting out of ieclean and this is not very good:

<DT><A HREF="http://www.absolutearts.com/jboutrouil/" target="_blank" >
AbsoluteArts-Jean Claude Boutrouille -.url</A>
</A>
<DT><A HREF="http://www.absolutearts.com/" target="_blank" >
AbsoluteArts.com.url</A>
</A>
<DT><A HREF="http://scorptest.free.fr/test_flip/" target="_blank" >Art-
Jean Claude FLIP.url</A>
</A>

That trailing </A> is not very helpful and is messing things up pretty
bad. Shouldn't I get a </DT> at the end of the line instead of an extra
</A> every other line?

Thanks.

--
~Ohmster | ohmster59 /a/t/ gmail dot com
Put "messageforohmster" in message body
(That is Message Body, not Subject!)
to pass my spam filter.
From: J G Miller on
On Mon, 22 Mar 2010 18:49:18 -0500, Ohmster wrote:
>
> Bit Twister,

> Here they are, can you have a look please?
> http://home.comcast.net/~theohmster/files/BT_IE_Clean.zip

This is a Usenet news group.

Are you not able to use e-mail for personal messages
and software support?
From: Ohmster on
J G Miller <miller(a)yoyo.ORG> wrote in news:1269302804_31(a)vo.lu:

>> Here they are, can you have a look please?
>> http://home.comcast.net/~theohmster/files/BT_IE_Clean.zip
>
> This is a Usenet news group.
>
> Are you not able to use e-mail for personal messages
> and software support?

....hmmm, it does appear to be a personal message, the way I worded it,
doesn't it? I was frustrated and pressed for time, but I posted in a Usenet
group because we were discussing scripting to complete a requested task. I
would welcome the help from anyone gifted enough in scripting to assist,
although I didn't word the post to reflect that. Hungry, late, and
frustrated, should have worded it to the newsgroup rather than one
individual. My bad.

Ate dinner, feeling *much* better now. This is actually a continuation of a
request for a code cleaning script and the suggested script is posted at the
top of this thread. I just want to find out why I am having difficulty with
it. If you can help or have any scripting experience, your assistance would
be most welcome, JG. I will be more "group orientated" in future posts,
thanks for the heads up. I was in too much of a rush to post enough
information to get help from anyone that would be able to help. Heh, dumb
mistake. ;P

....now what in the hell went wrong with that script, it was such a good one
too...

Guys, is it me or is there anything in here that someone can spot that
would cause a double </A> tag in the generated urls like I am getting? My
problem is that this, IMHO, is great stuff but I am not educated or
experienced enough to manipulate it as of yet, or at least not very much.

e.g.:
<DT><A HREF="http://www.absolutearts.com/jboutrouil/" target="_blank" >
AbsoluteArts-Jean Claude Boutrouille -.url</A> </A>

Here is a line of html code from the original bookmark.htm generated by IE:
<DT><A HREF="http://www.absolutearts.com/jboutrouil/"
ADD_DATE="1264959631" LAST_VISIT="1264959631" LAST_MODIFIED="1264207428"
ICON_URI="http://www.absolutearts.com/favicon.ico" >AbsoluteArts-Jean
Claude Boutrouille -.url</A>

I would love to get that ".url" out of the link name too.

---------------------------------------------------------------------
[ohmster(a)ohmster scripts]$ cat ieclean
#!/bin/bash

# Written by Bit Twister 03-21-2010 to cleanup html code from
# Internet Explorer 8's export of bookmark.htm file

_out_fn=urls.html
_in_fn=bookmark.htm

/bin/cp /dev/null $_out_fn

while read -r _line ; do
    _char4=${_line:0:4}
    set -- $_line
    if [ "$_char4" = "<DT>" ] ; then
        shift
        _href="$1"
        shift 5
        echo "<DT><A $_href target=\"_blank\" $@</A>" >> $_out_fn
    fi
done < $_in_fn

echo "Output in $_out_fn"
---------------------------------------------------------------------
Bit Twister, anybody? Understand this script enough to figure out what is
making the extra closing anchor tag and how to clean that ".url" out of the
link name?

Thanks guys.

From: Bit Twister on
On Mon, 22 Mar 2010 18:49:18 -0500, Ohmster wrote:

> Nice. Except that this version is not good. Oh it is clean alright, that
> part is good, but I am getting an extra "</A>" tag on every other line,

Not every other line from my results of dumping your zip.
I see two </A> per <DT> which also show up in your snippet.

> this is f*cking up the page in a bad way. I suppose I can use a text
> editor macro to drop down a line and delete to end of file, but this
> script is supposed to work good and I am pretty sure it did, what went
> wrong?

I am not much interested in what went wrong. Apparently you are not
making any effort whatsoever at understanding the scripting language.


> That trailing </A> is not very helpful and is messing things up pretty
> bad.

Then remove it from the echo line in the script. For someone able to code
macros, you seem to have disconnected the logic part of your brain.
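
For anyone following along, here is a minimal sketch of where the second </A> comes from. After the two shifts, the leftover words still end with the </A> that was already in bookmark.htm, and the echo line then adds another. The sample line is shortened from the one quoted in this thread; the ADD_DATE, LAST_VISIT, LAST_MODIFIED and ICON_URI values are placeholders, and _before/_after are names made up for the illustration.

```shell
#!/bin/bash
# Sample line shortened from the thread; attribute values are placeholders.
_line='<DT><A HREF="http://www.absolutearts.com/" ADD_DATE="1" LAST_VISIT="2" LAST_MODIFIED="3" ICON_URI="x" >AbsoluteArts.com.url</A>'

set -- $_line   # word-split the line into $1, $2, ...
shift           # drop "<DT><A"
_href="$1"      # HREF="..."
shift 5         # drop HREF plus the four attributes after it

# What is left in $* already ends with the </A> from the input file.
_before="<DT><A $_href target=\"_blank\" $*</A>"   # output line as posted
_after="<DT><A $_href target=\"_blank\" $*"        # same line without the added </A>

echo "$_before"
echo "$_after"
```

The first echo shows the doubled tag, the second shows one closing tag only.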

> Shouldn't I get a </DT> at the end of the line

That was not in your initial requirement. :-D

> instead of an extra </A> every other line instead?

My recommendations:
Use an editor to leave about 3 <DT> lines in bookmark.htm or a test file.

To see what is happening in the script, bracket the lines you want to
watch with the shell's trace option:

set -x # that enables the debugging aid.

one or more lines of script code to watch

set - # that turns off the debugging aid (set +x also works).

Now run the script.
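
A small sketch of what that looks like in practice, assuming a made-up test line (example.com and the _match variable are not from the original script). With tracing on, bash prints each command to stderr with a leading + after expanding the variables, so you can see exactly what the test compares.

```shell
#!/bin/bash
# Hypothetical one-line test input, not taken from bookmark.htm.
_line='<DT><A HREF="http://example.com/" >Example.url</A>'

set -x                      # start tracing the commands below
_char4=${_line:0:4}         # first four characters of the line
if [ "$_char4" = "<DT>" ] ; then
    _match=yes
else
    _match=no
fi
set +x                      # stop tracing

echo "match=$_match"
```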

I do not do html so I do not know what format you want the end of the
line to contain.

Going to guess you need to add a line
_text="$@"

(some substring work on $_text goes here)

and change $@ to $_text in the output line then put the finishing
touch to the output line.
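
One way the substring work could go, as a sketch only. It uses bash's ${var%pattern} expansion, which removes a matching suffix. The _href and _text values are copied from the sample output in this thread, and _out is a name made up here. The carriage-return line is an assumption: if the exported file has DOS line endings, each line ends with a CR after the </A>, and the other two removals would not match until it is gone.

```shell
#!/bin/bash
# Values copied from the sample output quoted in this thread.
_href='HREF="http://www.absolutearts.com/jboutrouil/"'
_text='>AbsoluteArts-Jean Claude Boutrouille -.url</A>'

_text=${_text%$'\r'}    # assumption: drop a trailing carriage return, if any
_text=${_text%'</A>'}   # drop the closing tag that came from bookmark.htm
_text=${_text%.url}     # drop the ".url" suffix from the link name

# The output line now adds the only closing tag.
_out="<DT><A $_href target=\"_blank\" $_text</A>"
echo "$_out"
```

In the real script the first two assignments would be replaced by _text="$@" after the shift 5, and the echo would append to $_out_fn as before.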


For substring work you might look at
http://www.tldp.org/LDP/abs/html/refcards.html#AEN22102
It would not hurt to look at everything on that page.


When I am trying to work out a script problem, I click up another
terminal and test commands at the command line.

Example: click up a terminal and just paste these into the terminal.

_text=" >Art-Jean Claude FLIP.url</A>"
echo "target=\"_blank\" $_text DT"

Now you should be able to play around with scripting commands.
You can use command line editing commands to modify a line.

Other option is to put experiment commands in a script and see what
happens.