Replacing and inserting strings within .txt files with the use of regex [Python]

From: Νίκος on 9 Aug 2010 10:21

On 9 Aug, 16:52, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Νίκος wrote:
> > On 8 Aug, 17:59, Thomas Jollans <tho...(a)jollans.com> wrote:
>
> >> Two problems here:
>
> >> str.replace doesn't use regular expressions. You'll have to use the re
> >> module to use regexps. (the re.sub function to be precise)
>
> >> '.' matches a single character. Any character, but only one.
> >> '.*' matches as many characters as possible. This is not what you want,
> >> since it will match everything between the *first* <? and the *last* ?>.
> >> You want non-greedy matching.
>
> >> '.*?' is the same thing, without the greed.
>
> > Thank you,
>
> > So I guess this needs to be written as:
>
> > src_data = re.sub( '<?(.*?)?>', '', src_data )
>
> In a regex '?' is a special character, so if you want a literal '?' you
> need to escape it. Therefore:
>
>     src_data = re.sub(r'<\?(.*?)\?>', '', src_data)

I see, or perhaps even this:

    src_data = re.sub(r'<?(.*?)?>', '', src_data)

Maybe it works here as well?
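
For comparison, here is a minimal sketch of how the two patterns behave on a small sample string (the sample text is invented for illustration). Without the backslash, '?' acts as a quantifier that makes the preceding '<' optional, so the pattern no longer anchors on the literal '<?' and '?>' delimiters and the two calls give different results:

```python
import re

# Hypothetical sample input, not taken from the original files.
sample = 'a <?php echo 1; ?> b'

# Escaped: '\?' matches a literal question mark.
escaped = re.sub(r'<\?(.*?)\?>', '', sample)

# Unescaped: '<?' means "an optional '<'" and '(.*?)?' is an optional lazy group,
# so the match can start before the tag and swallow unrelated text.
unescaped = re.sub(r'<?(.*?)?>', '', sample)

print(repr(escaped))
print(repr(unescaped))
```

Printing both results side by side on a copy of one real file is probably the quickest way to check which pattern does what is intended.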

From: Νίκος on 9 Aug 2010 16:30

On 9 Aug, 23:17, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Νίκος wrote:
> > On 9 Aug, 21:05, Thomas Jollans <tho...(a)jollybox.de> wrote:
> >> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>
> >>> On 9 Aug, 19:21, Peter Otten <__pete...(a)web.de> wrote:
> >>>> Νίκος wrote:
> >>>>> Please tell me that no matter what weird chars a file has inside, I can
> >>>>> still open those files and make the necessary replacements.
> >>>> Go back to 2.6 for the moment and defer learning about unicode until
> >>>> you're done with the conversion job.
> >>> You are correct again! 3.2 caused the problem; I switched to 2.7 and
> >>> now I don't have that problem anymore. The file is opening okay!
> >>> It ALMOST converts correctly!
> >>> # replace tags
> >>> print ( 'replacing php tags and contents within' )
> >>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
> >>> It only converts the first instance of the php tags and not the rest.
> >>> But why?
> >> http://docs.python.org/library/re.html#re.S
>
> >> You probably need to pass the re.DOTALL flag.
>
> >  src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>
> > like this?
>
> re.sub doesn't accept a flags argument. You can put the flag inside the
> regex itself like this:
>
>     src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)
>
> (Note that the abbreviation for re.DOTALL is re.S and the inline flag is
> '(?s)'. This is for historical reasons! :-))

This is so that '.' matches any character, including '\n', right?
So even if the php start tag and the end tag are on different
lines, the block will still be matched, correct?

Do we need the 'raw' string as well? Why? The regex doesn't contain
backslashes.
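
A minimal sketch of both points, using an invented two-line php block. Two assumptions worth stating: in Python 2.7 and 3.1+, re.sub also accepts flags as a keyword argument, and its fourth positional parameter is count, so a flag passed positionally is read as a replacement limit rather than as a flag. On the raw-string question, the pattern does contain backslashes (the '\?' escapes), which is why the r prefix is a safe habit:

```python
import re

# Hypothetical two-line php block, invented for illustration.
src_data = 'before <?php\necho 1;\n?> after'

pattern = r'<\?(.*?)\?>'

# Without DOTALL, '.' stops at '\n', so the multi-line block is not matched.
no_flag = re.sub(pattern, '', src_data)

# Inline flag, as suggested in the reply above.
inline = re.sub(r'(?s)' + pattern, '', src_data)

# Keyword form (assumes a Python version whose re.sub accepts `flags`).
keyword = re.sub(pattern, '', src_data, flags=re.DOTALL)

print(no_flag == src_data)
print(inline == keyword)

# A raw string keeps each backslash as written; a normal string needs it doubled.
print(r'\?' == '\\?')
```

The inline '(?s)' form has the advantage of working the same way regardless of which arguments the function accepts.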

From: Νίκος on 9 Aug 2010 17:05

On 9 Aug, 23:28, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Νίκος wrote:
> > On 9 Aug, 10:07, Νίκος <nikos.the.gr...(a)gmail.com> wrote:
> >> Now the code looks as follows:
>
> >> =============================
> >> #!/usr/bin/python
>
> >> import re, os, sys
>
> >> id = 0  # unique page_id
>
> >> for currdir, files, dirs in os.walk('test'):
>
> >>     for f in files:
>
> >>         if f.endswith('php'):
>
> [snip]
>
> >> I just tried to test it. I created a folder named 'test' on my 'd:\'
> >> drive.
> >> Then I put two .php files from the original project inside it, to test
> >> whether it would work for those two files before running it on the whole
> >> copy and afterwards on the original project.
>
> >> So I opened a 'cli' from my Win7 and tried
>
> >> D:\>convert.py
>
> >> D:\>
>
> >> It just printed an empty line and nothing else. Why didn't it even try to
> >> open the folder and the files within?
> >> Syntactically it doesn't give me an error!
> >> Something with the os.walk() method perhaps?
>
> > Can you help with this too, please?
>
> > Now I am able to convert just a single file, 'd:\test\index.php'
>
> > But this needs to be done for ALL the php files in every subfolder.
>
> >> for currdir, files, dirs in os.walk('test'):
>
> >>     for f in files:
>
> >>         if f.endswith('php'):
>
> > Should the above lines enter folders and find the php files in each folder
> > so that they can be edited?
>
> I'd start by commenting-out the lines which change the files and then
> add some more print statements to see which files it's finding. That
> might give a clue. Only when it's fixed and finding the correct files
> would I remove the additional print statements and then restore the
> commented lines.

I did that, but it doesn't even get to the 'test' folder to search for
the files!

From: Νίκος on 10 Aug 2010 01:11

On 10 Aug, 01:43, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Νίκος wrote:
> > D:\>convert.py
> >   File "D:\convert.py", line 34
> > SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line
> > 34, but no
> >  encoding declared; see http://www.python.org/peps/pep-0263.html for
> > details
>
> > D:\>
>
> > What is it referring to? Which character cannot be identified?
>
> > Line 34 is:
>
> > src_data = src_data.replace( '</body>', '<br><br><center><h4><font
> > color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )
>
> Didn't you say that you're using Python 2.7 now? The default file
> encoding will be ASCII, but your file isn't ASCII, it contains Greek
> letters. Add the encoding line:
>
>     # -*- coding: utf-8 -*-
>
> and check that the file is saved as UTF-8.
>
> > Also,
>
> > for currdir, files, dirs in os.walk('test'):
>
> >     for f in files:
>
> >         if f.lower().endswith("php"):
>
> > in the above lines
>
> > should I state os.walk('test') or os.walk('d:\test')?
>
> The path 'test' is relative to the current working directory. Is that
> D:\ for your script? If not, then it won't find the (correct) folder.
>
> It might be better to use an absolute path instead. You could use
> either:
>
>     r'd:\test'
>
> (note that I've made it a raw string because it contains a backslash
> which I want treated as a literal backslash) or:
>
>     'd:/test'
>
> (Windows should accept a slash as well as a backslash.)
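
To illustrate why the raw string matters for this particular path, here is a small sketch: in a normal string literal the sequence '\t' is a tab character, so 'd:\test' does not contain a backslash at all. The variable names are chosen only for the example:

```python
# In a normal string literal, '\t' is read as a single tab character.
plain = 'd:\test'

# A raw string keeps the backslash and the 't' as two separate characters.
raw = r'd:\test'

# Forward slashes need no escaping.
forward = 'd:/test'

print(repr(plain))
print(repr(raw))
print('\t' in plain)
```

Printing the repr() of a path is a cheap way to spot an unintended escape sequence before passing the path to os.walk().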

I will try it as soon as I make another change that I missed:

The ID number of each php page was contained in the old php code
within this string

PageID = some_number

So instead of creating a new ID number for each page, I have to pull out
this number and store it at the beginning of the file as a comment line,
because it has a direct relationship with the mysql database: it is used
for tracking each webpage and finding its counter.

# Grab the PageID contained within the php code and store it in id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and
store it in the variable id for later use in the program?

Also, I made another change. Would something like this work:

===============================
# open same php file for storing modified data
print ( 'writing to %s' % dest_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===============================

Instead of creating a new .html file and inserting the desired data from
the old php file, thus having two files (old php and new html), I
decided to open the same php file for writing that data and then
rename it to html.
Would the above code work?
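
A hedged sketch of the same write-then-rename idea, runnable on its own in a throwaway directory. The helper name convert_in_place is hypothetical. Two things differ from the snippet above: the print there refers to dest_f, which is not defined in the lines shown (it may be defined elsewhere in the script), and os.path.splitext changes only the extension, whereas str.replace('.php', '.html') would also alter any other '.php' occurring in the path:

```python
import os
import tempfile

def convert_in_place(src_f, src_data):
    """Hypothetical helper: overwrite src_f with src_data, then rename it to .html."""
    with open(src_f, 'w') as f:
        f.write(src_data)
    # Swap only the extension, leaving the rest of the path untouched.
    root, ext = os.path.splitext(src_f)
    dst_f = root + '.html'
    os.rename(src_f, dst_f)
    return dst_f

# Demonstration on an empty placeholder file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src_f = os.path.join(tmp_dir, 'index.php')
open(src_f, 'w').close()

dst_f = convert_in_place(src_f, '<html></html>')
print(os.path.basename(dst_f))
```

On Windows, os.rename raises an error if the destination file already exists, so a second run over the same folder may need to handle that case.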

From: Νίκος on 11 Aug 2010 05:48

On 10 Aug, 18:12, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
> Νίκος wrote:
>
> [snip]
>
> > The ID number of each php page was contained in the old php code
> > within this string
>
> > PageID = some_number
>
> > So instead of creating a new ID number for each page, I have to pull out
> > this number and store it at the beginning of the file as a comment line,
> > because it has a direct relationship with the mysql database: it is used
> > for tracking each webpage and finding its counter.
>
> > # Grab the PageID contained within the php code and store it in id variable
> > id = re.search( 'PageID = ', src_data )
>
> > How do I tell Python to grab the number after the 'PageID = ' string and
> > store it in the variable id for later use in the program?
>
> If the part of the file you're trying to match looks like this:
>
>     PageID = 12
>
> then the regex should look like this:
>
>     PageID = (\d+)
>
> and the code should look like this:
>
>     page_id = re.search(r'PageID = (\d+)', src_data).group(1)
>
> The page_id will, of course, be a string.
>

Thank you very much for helping me with the syntax.
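
A small sketch building on the suggested regex, assuming a made-up file content. It adds two things the one-liner leaves out: a check for files where no PageID is present (re.search returns None there, and calling .group on None raises an AttributeError), and a conversion to int in case the number is needed as a number:

```python
import re

# Hypothetical file contents, invented for illustration.
src_data = '<?php $PageID = 12; ?><html></html>'

match = re.search(r'PageID = (\d+)', src_data)
if match is None:
    # No PageID in this file; the caller decides how to handle that.
    page_id = None
else:
    # group(1) is the text captured by (\d+); convert it if a number is needed.
    page_id = int(match.group(1))

print(page_id)
```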

> > Also, I made another change. Would something like this work:
>
> > ===============================
> > # open same php file for storing modified data
> > print ( 'writing to %s' % dest_f )
> > f = open(src_f, 'w')
> > f.write(src_data)
> > f.close()
>
> > # rename edited .php file to .html extension
> > dst_f = src_f.replace('.php', '.html')
> > os.rename( src_f, dst_f )
> > ===============================
>
> > Instead of creating a new .html file and inserting the desired data from
> > the old php file, thus having two files (old php and new html), I
> > decided to open the same php file for writing that data and then
> > rename it to html.
> > Would the above code work?
>
> Why wouldn't it?

I thought perhaps I had done something wrong with the code.

=========================================
for currdir, files, dirs in os.walk('d:\\test'):  # 'd:/test' doesn't find the folder either

    for f in files:

        if f.lower().endswith("php"):

            print currdir, files, dirs, f
=========================================

As you advised me in a previous post, I need to find out why
the conversion code, although it works for a single file, for some reason
doesn't enter folders and subfolders to grab files from there to convert.

So, as you said, I commented out all the other statements to find the
culprit in the above lines.

Those lines are supposed to print the current working folder and its
files, but when I run the above code it gives me nothing in response,
not even 'f'.

So does that mean that the os.walk() method cannot enter Windows 7
folders?
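
One thing worth checking in the loop above: os.walk yields tuples in the order (dirpath, dirnames, filenames). With the names unpacked as currdir, files, dirs, the variable called files actually holds the subdirectory names, so a folder with no subfolders would print nothing. This may well be the cause here, though that is a guess from the lines shown. A minimal sketch with the order swapped, run against a throwaway directory tree created just for the demonstration:

```python
import os
import tempfile

# Build a temporary tree: index.php, notes.txt and sub/page.PHP
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, 'sub'))
for name in ('index.php', 'notes.txt', os.path.join('sub', 'page.PHP')):
    open(os.path.join(top, name), 'w').close()

found = []
# os.walk yields (dirpath, dirnames, filenames), in that order.
for currdir, dirs, files in os.walk(top):
    for f in files:
        # Including the dot avoids matching names that merely end in 'php'.
        if f.lower().endswith('.php'):
            found.append(os.path.join(currdir, f))

for path in sorted(found):
    print(path)
```

Replacing top with r'd:\test' or 'd:/test' would point the same loop at the real folder.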

* One more thing: instead of trying to run the above script
from the 'cli', wouldn't it be better to run it as a cgi script and see the
results in the browser, with the addition of this line?

print ( "Content-type: text/html; charset=UTF-8 \n" )

Or does this for some reason have to be run from the shell on both the
local (Windows 7) and the remote hosting (Linux) servers?
