From: james_027 on
hi,

Any idea how I can replace words in an HTML file? Meaning only the
content gets replaced while the HTML tags, JavaScript, and CSS
remain untouched.

Thanks,
James
From: Daniel Fetchinson on
> Any idea how I can replace words in an HTML file? Meaning only the
> content gets replaced while the HTML tags, JavaScript, and CSS
> remain untouched.

I'm not sure what you tried and what you haven't but as a first trial
you might want to try:

<untested>

# Naive approach: replaces the string everywhere, markup included.
with open('index.html') as src, open('new.html', 'w') as dst:
    dst.write(src.read().replace('replace-this', 'with-that'))

</untested>

HTH,
Daniel




--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
From: Luap777 on
On Apr 28, 8:02 am, james_027 <cai.hai...(a)gmail.com> wrote:
> hi,
>
> Any idea how I can replace words in an HTML file? Meaning only the
> content gets replaced while the HTML tags, JavaScript, and CSS
> remain untouched.
>
> Thanks,
> James

You might try cleaning the HTML with uTidylib
(http://utidylib.berlios.de/) to make XHTML, then using Beautiful Soup
(http://www.crummy.com/software/BeautifulSoup/documentation.html)
to process it.

If the number of files isn't that large and it's a one-time thing, you
might do just as well with a search and replace across the directory,
previewing each replacement as you go.
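
If you would rather script that preview step than do it in an editor,
something along these lines might work. This is only a sketch: the
function names, the "*.html" pattern and the UTF-8 encoding are my
assumptions, and it does a plain string replace, so it will not tell
markup apart from content.

```python
import difflib
from pathlib import Path


def preview_replacements(directory, old, new, pattern="*.html"):
    """Yield (path, updated_text, diff) for every file that would change."""
    for path in sorted(Path(directory).rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        updated = text.replace(old, new)
        if updated != text:
            # Build a unified diff so the change can be reviewed first.
            diff = "".join(difflib.unified_diff(
                text.splitlines(keepends=True),
                updated.splitlines(keepends=True),
                fromfile=str(path),
                tofile=str(path) + " (new)",
            ))
            yield path, updated, diff


def interactive_replace(directory, old, new):
    """Show each diff and write the file only after confirmation."""
    for path, updated, diff in preview_replacements(directory, old, new):
        print(diff)
        if input("Apply this change? [y/N] ").strip().lower() == "y":
            path.write_text(updated, encoding="utf-8")
```

Calling interactive_replace("site", "replace-this", "with-that") would
then walk a (hypothetical) "site" directory and ask before touching
each file.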
From: Cameron Simpson on
On 28Apr2010 22:03, Daniel Fetchinson <fetchinson(a)googlemail.com> wrote:
| > Any idea how I can replace words in an HTML file? Meaning only the
| > content gets replaced while the HTML tags, JavaScript, and CSS
| > remain untouched.
|
| I'm not sure what you tried and what you haven't but as a first trial
| you might want to
|
| <untested>
|
| f = open( 'new.html', 'w' )
| f.write( open( 'index.html' ).read( ).replace( 'replace-this', 'with-that' ) )
| f.close( )
|
| </untested>

If 'replace-this' occurs inside the javascript etc or happens to be an
HTML tag name, it will get mangled. The OP didn't want that.

The only way to get this right is to parse the file, then walk the doc
tree editing only the text parts.

The BeautifulSoup module (3rd party, but a single .py file and trivial to
fetch and use, though it has some dependencies) does a good job of this,
coping even with typical not quite right HTML. It gives you a parse
tree you can easily walk, and you can modify it in place and write it
straight back out.
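
To illustrate the parse-then-edit-only-the-text idea, here is a rough
sketch. Note that it uses the standard library's html.parser rather
than BeautifulSoup, purely so that it runs without installing anything;
TextReplacer and replace_in_text are names made up for this example.
It re-emits tags, comments and entities as it saw them and only touches
text outside <script> and <style>. It is not a faithful round-trip
serializer: end tags come out lowercased, and rarer constructs such as
CDATA marked sections are not handled.

```python
from html.parser import HTMLParser


class TextReplacer(HTMLParser):
    """Re-emit HTML, replacing a string only in text content."""

    SKIP = {"script", "style"}  # contents of these elements are left alone

    def __init__(self, old, new):
        # convert_charrefs=False so entities such as &amp; pass through as-is
        super().__init__(convert_charrefs=False)
        self.old, self.new = old, new
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            data = data.replace(self.old, self.new)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_in_text(html, old, new):
    parser = TextReplacer(old, new)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = '<p class="foo">foo</p><script>var foo = 1;</script>'
print(replace_in_text(page, "foo", "bar"))
# prints: <p class="foo">bar</p><script>var foo = 1;</script>
```

The class attribute and the script body keep their "foo"; only the
paragraph text changes, which is what the OP asked for.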

Cheers,
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The Web site you seek
cannot be located but
endless others exist
- Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html
From: Daniel Fetchinson on
> | > Any idea how I can replace words in an HTML file? Meaning only the
> | > content gets replaced while the HTML tags, JavaScript, and CSS
> | > remain untouched.
> |
> | I'm not sure what you tried and what you haven't but as a first trial
> | you might want to
> |
> | <untested>
> |
> | f = open( 'new.html', 'w' )
> | f.write( open( 'index.html' ).read( ).replace( 'replace-this', 'with-that' ) )
> | f.close( )
> |
> | </untested>
>
> If 'replace-this' occurs inside the javascript etc or happens to be an
> HTML tag name, it will get mangled. The OP didn't want that.

Correct, that is why I started with "I'm not sure what you tried and
what you haven't but as a first trial you might". For instance, if the
OP wants to replace words that he knows do not occur in the JavaScript
or CSS, nor in HTML tag names or attribute names and values, then the
above approach would work, in which case BeautifulSoup is massive
overkill. The OP needs to specify more clearly what he wants before
really useful advice can be given.
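
If the simple approach does turn out to be enough, one refinement worth
considering is matching whole words only, since a plain str.replace
also hits substrings of longer words. A minimal sketch follows;
replace_word is a made-up helper name, and it assumes the search term
starts and ends with a letter, digit or underscore so that \b behaves
as expected.

```python
import re


def replace_word(text, old, new):
    """Replace whole-word occurrences of old with new."""
    pattern = re.compile(r"\b%s\b" % re.escape(old))
    # A function as the replacement avoids backslash handling in new.
    return pattern.sub(lambda match: new, text)


print(replace_word("cat concat cat.", "cat", "dog"))
# prints: dog concat dog.
```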

Cheers,
Daniel



