From: MRAB on
rantingrick wrote:
> On Aug 7, 7:20 pm, Νίκος <nikos.the.gr...(a)gmail.com> wrote:
>> Hello dear Pythoneers,
>
> I prefer Pythonista, but anywho..
>
>> I have over 500 .php web pages in various subfolders under 'data'
>> folder that i have to rename to .html
>
> import os
> os.rename(old, new)
>
>> and and ditch the '<?' and '?>' tages from within
>
> path = 'some/valid/path'
> f = open(path, 'r')
> data = f.read()
> f.close()
> data.replace('<?', '')
> data.replace('?>', '')
>
That should be:

data = data.replace('<?', '')
data = data.replace('?>', '')
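
A small illustration of why the assignment matters: Python strings are immutable, so str.replace returns a new string and leaves the original alone. The sample text below is made up for the example.

```python
# Sample input; the PHP snippet is only an illustration.
data = "<? echo 'hi'; ?>"

# Calling replace without assigning the result changes nothing:
data.replace('<?', '')           # the returned string is thrown away
after_discarded_call = data      # still the original text

# Re-binding the name keeps the result of each replacement.
data = data.replace('<?', '')
data = data.replace('?>', '')

print(repr(after_discarded_call))
print(repr(data))
```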

>> and also insert a very first line of <!-- id -->
>> where id must be an identification unique number of every page for
>> counter tracking purposes.
>
> comment = "<!-- %s -->"%(idnum)
> data.insert(idx, comment)
>
Strings don't have an 'insert' method!
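
One way to get the same effect, sketched here with a hypothetical helper named insert_at and a made-up idnum, is to build a new string by concatenation or slicing:

```python
def insert_at(text, idx, piece):
    """Return a new string with piece placed at position idx of text."""
    return text[:idx] + piece + text[idx:]

idnum = 7                                   # hypothetical page id
data = "<html><body>hello</body></html>"    # sample page content
comment = "<!-- %s -->" % idnum

# Put the comment on its own first line.
with_comment = insert_at(data, 0, comment + "\n")
print(with_comment)
```

For the first-line case, plain concatenation (comment + "\n" + data) does the same job; slicing is only needed when the insertion point is somewhere inside the text.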

>> ONly pure html code must be left.
>
> Well then don't F up! However judging from the amount of typos in this
> post i would suggest you do some major testing!
>
>> I don't know how to handle such a big data replacing problem and
>> cannot play with fire because those 500 pages are my cleints pages and
>> data of those files just cannot be messes up.
>
> Better do some serious testing first, or (if you have enough disc
> space) create copies instead!
>
>> Can you provide to me a script please that is able of performing an
>> automatic way of such a page content replacing?
>
> This is very basic stuff and the fine manual is free you know. But how
> much are you willing to pay?

From: Thomas Jollans on
On 08/08/2010 04:46 AM, rantingrick wrote:
> *facepalm*! I really must stop Usenet-ing whilst consuming large
> volumes of alcoholic beverages.

THAT explains a lot.

Cheers

From: Thomas Jollans on
On 08/08/2010 11:21 AM, Νίκος wrote:
> Please help me adjust it if it needs extra modification to replace
> more PHP tags.

Have you tried it? I haven't, but I see no immediate reason why it
wouldn't work with multiple PHP blocks.

> #!/usr/bin/python
>
> import cgitb; cgitb.enable()
> import cgi, re, os
>
> print ( "Content-type: text/html; charset=UTF-8 \n" )
>
>
> id = 0 # unique page_id
>
> # os.walk yields (dirpath, dirnames, filenames), in that order
> for currdir, dirs, files in os.walk('data'):
>
>     for f in files:
>
>         if f.endswith('.php'):
>
>             # get path to filename
>             src_f = os.path.join(currdir, f)
>
>             # open php src file
>             f = open(src_f, 'r')
>             src_data = f.read() # read contents of PHP file
>             f.close()
>             print ( 'reading from %s' % src_f )
>
>             # replace tags
>             src_data = src_data.replace('<%', '')
>             src_data = src_data.replace('%>', '')

Did you read the script before posting? ;-)
Here, you remove ASP-style tags. Which is fine, PHP supports them if you
configure it that way, but you probably didn't. Change this to the start
and end tags you actually use, and, if you use multiple forms (such as
<?php vs <?), then add another line or two.
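
If several tag forms are in play, the order of the replacements matters: '<?' is a prefix of '<?php', so the longer form has to go first. A minimal sketch, assuming only these three delimiters are used (strip_php_tags is a hypothetical helper, not part of the script above):

```python
def strip_php_tags(text, tags=('<?php', '<?', '?>')):
    """Remove each delimiter in tags from text, in the order given."""
    for tag in tags:
        text = text.replace(tag, '')
    return text

sample = "<?php echo 1; ?><p>x</p>"   # made-up page content

right = strip_php_tags(sample)
# Replacing the short form first leaves a stray 'php' behind.
wrong = strip_php_tags(sample, tags=('<?', '<?php', '?>'))

print(repr(right))
print(repr(wrong))
```

Note that this only strips the delimiters; whatever PHP code sat between them stays in the file.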

>             print ( 'replacing php tags' )
>
>             # add ID
>             src_data = ( '<!-- %d -->' % id ) + src_data
>             id += 1
>             print ( 'adding unique page_id' )
>
>             # build the new file name with .html extension
>             dest_path = src_f.replace('.php', '.html')
>
>             # open newly created html file for inserting data
>             dest_f = open(dest_path, 'w')
>             dest_f.write(src_data) # write contents
>             dest_f.close()
>             print ( 'writing to %s' % dest_path )
>
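
For reference, the steps of the quoted script can be pulled together roughly as follows. This is only a sketch under several assumptions that are not from the thread: Python 3, UTF-8 files, the three delimiters '<?php', '<?' and '?>', and a hypothetical function name convert_tree. It writes a .html copy next to each .php file and leaves the originals untouched, which fits the earlier advice to work on copies.

```python
import os

def convert_tree(root):
    """Write a .html copy of every .php file under root; return the new paths."""
    page_id = 0
    written = []
    # os.walk yields (dirpath, dirnames, filenames) for each directory.
    for currdir, dirs, files in os.walk(root):
        for name in sorted(files):
            base, ext = os.path.splitext(name)
            if ext != '.php':
                continue
            src_path = os.path.join(currdir, name)
            with open(src_path, 'r', encoding='utf-8') as src:
                data = src.read()
            # Longest delimiter first, so '<?' does not split '<?php'.
            for tag in ('<?php', '<?', '?>'):
                data = data.replace(tag, '')
            # Unique id comment as the very first line.
            data = '<!-- %d -->\n' % page_id + data
            page_id += 1
            dest_path = os.path.join(currdir, base + '.html')
            with open(dest_path, 'w', encoding='utf-8') as dest:
                dest.write(data)
            written.append(dest_path)
    return written
```

Calling convert_tree('data') would process the folder named in the original question; trying it on a scratch copy first is the safer route.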
From: MRAB on
Νίκος wrote:
> On 8 Aug, 17:59, Thomas Jollans <tho...(a)jollans.com> wrote:
>
>> Two problems here:
>>
>> str.replace doesn't use regular expressions. You'll have to use the re
>> module to use regexps. (the re.sub function to be precise)
>>
>> '.' matches a single character. Any character, but only one.
>> '.*' matches as many characters as possible. This is not what you want,
>> since it will match everything between the *first* <? and the *last* ?>.
>> You want non-greedy matching.
>>
>> '.*?' is the same thing, without the greed.
>
> Thank you,
>
> So I guess this needs to be written as:
>
> src_data = re.sub( '<?(.*?)?>', '', src_data )
>
In a regex '?' is a special character, so if you want a literal '?' you
need to escape it. Therefore:

src_data = re.sub(r'<\?(.*?)\?>', '', src_data)
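
A quick check of the points made so far, on made-up sample strings. It also shows one thing not raised above: '.' does not match a newline by default, so a PHP block that spans several lines needs the re.DOTALL flag.

```python
import re

s = 'A<? one ?>B<? two ?>C'

# Greedy: runs from the first '<?' to the last '?>'.
greedy = re.sub(r'<\?.*\?>', '', s)
# Non-greedy: each block is matched separately.
lazy = re.sub(r'<\?.*?\?>', '', s)

multi = 'A<?\n x\n?>B'
# Without DOTALL the block is not matched, because '.' stops at '\n'.
no_flag = re.sub(r'<\?.*?\?>', '', multi)
with_flag = re.sub(r'<\?.*?\?>', '', multi, flags=re.DOTALL)

print(greedy, lazy, repr(no_flag), with_flag)
```

Unlike the str.replace approach, this removes the PHP code between the tags as well as the tags themselves.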

> The 'r' prefix doesn't need to be inserted before the regex here,
> since the regex doesn't contain backslashes.
>
>> You will have to find the </body> tag before inserting the string.
>> str.find should help -- or you could use str.replace and replace the
>> </body> tag with your counter line, plus a new </body>.
>
> Ah yes! Damn, why didn't I think of it... str.replace should do the
> trick. I was stuck trying to figure out regexes.
>
> So, I guess that should work:
>
> src_data = src_data.replace('</body>', '<br><br><h4><font
> color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4></body>' )
>
>> No it's not. You're just giving up too soon.
>
> Yes, you are right, your hints keep me going, and thank you for that.

From: MRAB on
Νίκος wrote:
> On 9 Aug, 16:52, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> Νίκος wrote:
>>> On 8 Aug, 17:59, Thomas Jollans <tho...(a)jollans.com> wrote:
>>>> Two problems here:
>>>> str.replace doesn't use regular expressions. You'll have to use the re
>>>> module to use regexps. (the re.sub function to be precise)
>>>> '.' matches a single character. Any character, but only one.
>>>> '.*' matches as many characters as possible. This is not what you want,
>>>> since it will match everything between the *first* <? and the *last* ?>.
>>>> You want non-greedy matching.
>>>> '.*?' is the same thing, without the greed.
>>> Thank you,
>>> So I guess this needs to be written as:
>>> src_data = re.sub( '<?(.*?)?>', '', src_data )
>> In a regex '?' is a special character, so if you want a literal '?' you
>> need to escape it. Therefore:
>>
>> src_data = re.sub(r'<\?(.*?)\?>', '', src_data)
>
> I see, or perhaps even this:
>
> src_data = re.sub(r'<?(.*?)?>', '', src_data)
>
> maybe it works here as well.

No. That regex means that it should match:

<?      # optional '<'
(.*?)?  # optional group of any number of any characters
>       # '>'
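
To see the difference in practice, here is a small comparison on a made-up line of HTML. The unescaped pattern matches anything that ends in '>', so it wipes out ordinary markup and text too:

```python
import re

html = '<p>hi</p><? x ?>'   # sample input, not from the original pages

# Escaped '?': only the PHP block is removed.
escaped = re.sub(r'<\?(.*?)\?>', '', html)
# Unescaped '?': every run of characters up to a '>' is removed.
unescaped = re.sub(r'<?(.*?)?>', '', html)

print(repr(escaped))
print(repr(unescaped))
```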