Replace and inserting strings within .txt files with the use of regex [Python]

From: MRAB on 9 Aug 2010 16:17

Νίκος wrote:
> On 9 Αύγ, 21:05, Thomas Jollans <tho...(a)jollybox.de> wrote:
>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>
>>> On 9 Αύγ, 19:21, Peter Otten <__pete...(a)web.de> wrote:
>>>> Νίκος wrote:
>>>>> Please tell me that no matter what weird chars it has inside I can still
>>>>> open those files and make the necessary replacements.
>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>> you're done with the conversion job.
>>> You are correct again! 3.2 caused the problem, I switched to 2.7 and
>>> now I don't have that problem anymore. The file is opening okay!
>>> It ALMOST converts correctly!
>>> # replace tags
>>> print ( 'replacing php tags and contents within' )
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>> It only converts the first instance of the php tags and not the rest?
>>> But why?
>> http://docs.python.org/library/re.html#re.S
>>
>> You probably need to pass the re.DOTALL flag.
>
> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>
> like this?

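In Python 2.6 re.sub doesn't accept a flags argument, and even in 2.7,
where one was added, the fourth positional argument is count, so the call
above would pass re.DOTALL as a count. You can put the flag inside the
regex itself like this: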

src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons! :-))
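
A minimal sketch of the difference, using a made-up sample string with two PHP blocks (the sample text and the result names no_flag, inline and keyword are assumptions for illustration). It also shows the flags keyword argument that re.sub accepts from Python 2.7 on; passed by keyword, it cannot be mistaken for count.

```python
import re

# Hypothetical sample: two PHP blocks, the second spanning several lines.
src_data = "<? echo 1; ?>\n<p>keep</p>\n<?\necho 2;\n?>\n<b>end</b>"

# Without DOTALL, '.' stops at newlines, so the multi-line block survives.
no_flag = re.sub(r'<\?(.*?)\?>', '', src_data)

# Inline flag (?s): '.' also matches '\n'.
inline = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

# Keyword form, available from Python 2.7 on.
keyword = re.sub(r'<\?(.*?)\?>', '', src_data, flags=re.DOTALL)

print(repr(no_flag))
print(repr(inline))
```

Both the inline flag and the keyword form should give the same result; only the first call leaves the multi-line block in place.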

From: MRAB on 9 Aug 2010 16:28

Νίκος wrote:
> On 9 Αύγ, 10:07, Νίκος <nikos.the.gr...(a)gmail.com> wrote:
>> Now the code looks as follows:
>>
>> =============================
>> #!/usr/bin/python
>>
>> import re, os, sys
>>
>> id = 0 # unique page_id
>>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>>
[snip]
>>
>> I just tried to test it. I created a folder named 'test' on my 'd:\'
>> drive.
>> Then I put two .php files from the original inside it, to test whether it
>> would work OK for those two files before running it on the whole copy and
>> afterwards on the original project.
>>
>> So I opened a 'cli' from my Win7 and tried
>>
>> D:\>convert.py
>>
>> D:\>
>>
>> It just printed an empty line and nothing else. Why didn't it even try to
>> open the folder and the files within?
>> Syntactically it doesn't give me an error!
>> Something with the os.walk() method perhaps?
>
> Can you help with this too, please?
>
> Now I am able to convert just a single file, 'd:\test\index.php'.
>
> But this needs to be done for ALL the php files in every subfolder.
>
>> for currdir, files, dirs in os.walk('test'):
>>
>>     for f in files:
>>
>>         if f.endswith('php'):
>
> Should the above lines enter the folders and find the php files in each
> folder so that they can be edited?

I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.
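
A sketch of that debugging step, assuming a hypothetical helper named find_php_files that only prints and collects paths and changes nothing on disk. One thing such prints may reveal: os.walk yields (dirpath, dirnames, filenames) in that order, whereas the quoted loop unpacks the tuple as currdir, files, dirs, which would bind the sub-directory names to files.

```python
import os

def find_php_files(top):
    """Return paths of files under `top` whose names end in '.php'."""
    found = []
    # os.walk yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(top):
        # Debug output: shows exactly what the walk sees in each folder.
        print('visiting %s: dirs=%r files=%r' % (currdir, dirs, files))
        for f in files:
            if f.lower().endswith('.php'):
                found.append(os.path.join(currdir, f))
    return found
```

Calling find_php_files('test') from the folder that contains test should list every folder visited; if nothing is printed at all, the path itself is probably not being found.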

From: MRAB on 9 Aug 2010 18:32

Νίκος wrote:
> On 9 Αύγ, 23:17, MRAB <pyt...(a)mrabarnett.plus.com> wrote:
>> Νίκος wrote:
>>> On 9 Αύγ, 21:05, Thomas Jollans <tho...(a)jollybox.de> wrote:
>>>> On Monday 09 August 2010, it occurred to Νίκος to exclaim:
>>>>> On 9 Αύγ, 19:21, Peter Otten <__pete...(a)web.de> wrote:
>>>>>> Νίκος wrote:
>>>>>>> Please tell me that no matter what weird chars it has inside I can still
>>>>>>> open those files and make the necessary replacements.
>>>>>> Go back to 2.6 for the moment and defer learning about unicode until
>>>>>> you're done with the conversion job.
>>>>> You are correct again! 3.2 caused the problem, I switched to 2.7 and
>>>>> now I don't have that problem anymore. The file is opening okay!
>>>>> It ALMOST converts correctly!
>>>>> # replace tags
>>>>> print ( 'replacing php tags and contents within' )
>>>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data )
>>>>> It only converts the first instance of the php tags and not the rest?
>>>>> But why?
>>>> http://docs.python.org/library/re.html#re.S
>>>> You probably need to pass the re.DOTALL flag.
>>> src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )
>>> like this?
>> re.sub doesn't accept a flags argument. You can put the flag inside the
>> regex itself like this:
>>
>> src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)
>>
>> (Note that the abbreviation for re.DOTALL is re.S and the inline flag is
>> '(?s)'. This is for historical reasons! :-))
>
> This is for the '.' to match any character including '\n' too, right?
> So even if the php start tag and the end tag are on different
> lines they will still be matched, correct?
>
> We need the 'raw' string as well? Why? The regex doesn't contain
> backslashes.

Yes it does; two of them!
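
A small sketch of what the raw prefix protects, with pattern as an illustrative name. The non-raw version happens to work because '\?' is not a recognised string escape, so Python leaves the backslash alone, but newer Python 3 releases warn about such escapes, and a recognised one such as '\n' would be translated silently.

```python
pattern = r'(?s)<\?(.*?)\?>'

# The raw string keeps both backslashes exactly as typed.
print(pattern.count('\\'))

# In a normal string literal a recognised escape is translated,
# so '\n' is one character while r'\n' is two.
print(len('\n'))
print(len(r'\n'))
```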

From: MRAB on 9 Aug 2010 18:43

Νίκος wrote:
> D:\>convert.py
> File "D:\convert.py", line 34
> SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line 34,
> but no encoding declared; see http://www.python.org/peps/pep-0263.html
> for details
>
> D:\>
>
> What is it referring to? Which character cannot be identified?
>
> Line 34 is:
>
> src_data = src_data.replace( '</body>', '<br><br><center><h4><font
> color=green> �� : %(counter)d </body>' )
>
Didn't you say that you're using Python 2.7 now? The default source
encoding there is ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line on the first or second line of the file:

# -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

> Also,
>
> for currdir, files, dirs in os.walk('test'):
>
>     for f in files:
>
>         if f.lower().endswith("php"):
>
> in the above lines
>
> should I use os.walk('test') or os.walk('d:\test')?

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

'd:/test'

(Windows should accept a slash as well as a backslash.)
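
A short sketch of why the plain literal 'd:\test' is risky, with plain, raw and forward as illustrative names: the '\t' in it is read as a TAB character. The last line shows how a relative path is resolved against the current working directory.

```python
import os

plain = 'd:\test'      # '\t' is read as a TAB character
raw = r'd:\test'       # backslash kept literally
forward = 'd:/test'    # no escaping issue at all

print(repr(plain))
print(repr(raw))

# A relative path is resolved against the current working directory.
print(os.path.abspath('test'))
```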

From: MRAB on 10 Aug 2010 11:12

Νίκος wrote:
[snip]
>
> The ID number of each php page was contained in the old php code
> within this string
>
> PageID = some_number
>
> So instead of creating a new ID number for each page I have to pull out
> this number and store it at the beginning of the file as a comment line,
> because it has a direct relationship with the mysql database, as in
> tracking the number of each webpage and finding its counter.
>
> # Grab the PageID contained within the php code and store it in id variable
> id = re.search( 'PageID = ', src_data )
>
> How do I tell Python to grab the number after the 'PageID = ' string and
> store it in the variable id for later use in the program?
>
If the part of the file you're trying to match looks like this:

PageID = 12

then the regex should look like this:

PageID = (\d+)

and the code should look like this:

page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.
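
A slightly more defensive sketch, assuming a made-up fragment of PHP source as src_data. re.search returns None when nothing matches, so calling .group(1) directly would raise an AttributeError for a file without a PageID line; the int() call is only needed if a number rather than a string is wanted.

```python
import re

# Hypothetical fragment of the old PHP source.
src_data = "<?php\n$PageID = 12;\n?>\n<body>hello</body>"

match = re.search(r'PageID = (\d+)', src_data)
if match is None:
    page_id = None                   # no PageID line in this file
else:
    page_id = int(match.group(1))    # group(1) is the string '12'

print(page_id)
```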

> Also, I made another change. Would something like this work:
>
> ===============================
> # open same php file for storing modified data
> print ( 'writing to %s' % dest_f )
> f = open(src_f, 'w')
> f.write(src_data)
> f.close()
>
> # rename edited .php file to .html extension
> dst_f = src_f.replace('.php', '.html')
> os.rename( src_f, dst_f )
> ===============================
>
> Because instead of creating a new .html file and inserting the desired
> data from the old php, thus having two files (old php and new html), I
> decided to open the same php file for writing that data and then
> rename it to html.
> Would the above code work?

Why wouldn't it?
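
One caveat worth a sketch: str.replace changes every occurrence of '.php' in the path, including one in a folder name, while os.path.splitext only splits off the final extension. The helper name save_as_html below is hypothetical. Also note that on Windows os.rename raises an error if the destination file already exists.

```python
import os

def save_as_html(src_f, src_data):
    """Overwrite src_f with src_data, then rename it to a .html file."""
    with open(src_f, 'w') as f:
        f.write(src_data)

    # splitext only splits off the last extension, unlike str.replace,
    # which would also alter '.php' occurring earlier in the path.
    base, ext = os.path.splitext(src_f)
    dst_f = base + '.html'
    os.rename(src_f, dst_f)
    return dst_f
```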
