From: Νίκος on
On 9 Aug, 13:47, Peter Otten <__pete...(a)web.de> wrote:
> Νίκος wrote:
> > On 9 Aug, 13:06, Peter Otten <__pete...(a)web.de> wrote:
>
> >> > So since it's utf-8, what's the problem with opening it?
>
> >> Python says it's not, and I tend to believe it.
>
> > You are right!
>
> > I tried the exact same open() call in the IDLE environment and got
> > the encoding of the file from there!
>
> >>>> open("d:\\test\\index.php" ,'r')
> > <_io.TextIOWrapper name='d:\\test\\index.php' encoding='cp1253'>
>
> > That's why the error in my previous post mentioned
> > File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
> > It tried to use the cp1253 encoding.
>
> > But now, since Python, as we see, can understand the nature of the
> > encoding, what is causing it not to open the file?
>
> It doesn't. You have to tell it.

Why doesn't it? The IDLE response indicates that it knows the file
encoding is "cp1253", which means it can identify it.
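The encoding that open() reports when no encoding argument is given is a platform default (on a Greek Windows install, typically cp1253), not something detected from the file. Here is a minimal Python 3 sketch of that point; the helper name default_text_encoding and the sample word are made up for illustration. It writes the same text in two different encodings and shows that open() reports the same encoding for both files.

```python
import os
import tempfile

def default_text_encoding(raw_bytes):
    """Write raw_bytes to a temp file, open it in text mode with no
    encoding argument, and return the encoding open() chose."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        with open(path, "r") as f:  # nothing is read or decoded here
            return f.encoding
    finally:
        os.remove(path)

text = "καλημέρα"  # hypothetical sample text
enc_for_utf8_file = default_text_encoding(text.encode("utf-8"))
enc_for_cp1253_file = default_text_encoding(text.encode("cp1253"))

# Both names are identical: the choice does not depend on the file's bytes.
print(enc_for_utf8_file, enc_for_cp1253_file)
```

The printed name varies from machine to machine, which is why the actual encoding has to be passed explicitly.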

> *If* the file uses cp1253 you can open it with
>
> open(..., encoding="cp1253")
>
> Note that if the file is not in cp1253 python will still happily open it as
> long as it doesn't contain the following bytes:
>
> >>> for i in range(256):
> ...     try: chr(i).decode("cp1253") and None
> ...     except: print i
> ...
> 129
> 136
> 138
> 140
> 141
> 142
> 143
> 144
> 152
> 154
> 156
> 157
> 158
> 159
> 170
> 210
> 255
>
> Peter
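The loop quoted above is Python 2 syntax (print statement, str.decode). A rough Python 3 equivalent, offered only as a sketch, builds each value as a one-byte bytes object and collects the ones that cp1253 has no character for. The variable name undecodable is an illustrative choice.

```python
# Collect every byte value that the cp1253 codec refuses to decode.
undecodable = []
for i in range(256):
    try:
        bytes([i]).decode("cp1253")
    except UnicodeDecodeError:
        undecodable.append(i)

print(undecodable)
```

These are positions the code page leaves unassigned, so a file containing any of them is either not cp1253 or is damaged.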

I'm afraid it does, because when I tried:

f = open(src_f, 'r', encoding="cp1253" )

I got the same error again. What are those characters? Don't they
belong to the same weird 'cp1253' encoding too? Why can't the compiler
open them?

From: Νίκος on
Please tell me that no matter what weird chars it has inside, I can still
open those files and make the necessary replacements.
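For reference, Python 3 offers two ways to get past undecodable bytes, sketched here on an in-memory sample (the byte string is invented for illustration). One is to decode with errors="replace", which open() also accepts as a keyword. The other is to stay in bytes, as read from a file opened with mode "rb", and do the replacements there.

```python
# cp1253 bytes for "αβγ", then 0x81, which cp1253 does not define.
raw = b"\xe1\xe2\xe3 \x81 <p>old</p>"

# Option 1: decode leniently; each bad byte becomes U+FFFD.
lenient_text = raw.decode("cp1253", errors="replace")

# Option 2: never decode; replace on the raw bytes instead.
patched_bytes = raw.replace(b"old", b"new")

print(repr(lenient_text))
print(patched_bytes)
```

Option 2 leaves the unknown bytes exactly as they were, which matters if the files are written back afterwards. Option 1 loses them.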
From: Peter Otten on
Νίκος wrote:

> Please tell me that no matter what weird chars it has inside, I can still
> open those files and make the necessary replacements.

Go back to 2.6 for the moment and defer learning about unicode until you're
done with the conversion job.
From: Νίκος on
On 9 Aug, 19:21, Peter Otten <__pete...(a)web.de> wrote:
> Νίκος wrote:
> > Please tell me that no matter what weird chars it has inside, I can still
> > open those files and make the necessary replacements.
>
> Go back to 2.6 for the moment and defer learning about unicode until you're
> done with the conversion job.

You are correct again! 3.2 caused the problem. I switched to 2.7 and
now I don't have that problem anymore. The file opens okay!

It ALMOST converts correctly!

# replace tags
print ( 'replacing php tags and contents within' )
src_data = re.sub( '<\?(.*?)\?>', '', src_data )

It only converts the first instance of the PHP tags and not the rest.
But why?
From: Νίκος on
On 8 Aug, 20:29, John S <jstrick...(a)gmail.com> wrote:

> When replacing text in an HTML document with re.sub, you want to use
> the re.S (singleline) option; otherwise your pattern won't match when
> the opening tag is on one line and the closing is on another.

That's exactly the problem I am facing now with this statement.

src_data = re.sub( '<\?(.*?)\?>', '', src_data )

You mean I have to switch it like this?

src_data = re.S ( '<\?(.*?)\?>', '', src_data ) ?
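For what it's worth, re.S is a flag constant rather than a function, so it is passed to re.sub instead of replacing it. Here is a small sketch on an invented sample string. It uses the flags keyword, which exists in Python 2.7 and 3.1 or later, because the fourth positional argument of re.sub is count, not flags.

```python
import re

# Hypothetical sample: one PHP block spans several lines, one sits on a single line.
src_data = "<p>a</p><?php\n echo 'x';\n?><p>b</p><? one_line(); ?><p>c</p>"
pattern = r"<\?(.*?)\?>"

# Without re.S, "." stops at newlines, so the multi-line block survives.
without_flag = re.sub(pattern, "", src_data)

# With re.S, "." also matches newlines and both blocks are removed.
with_flag = re.sub(pattern, "", src_data, flags=re.S)

print(repr(without_flag))
print(repr(with_flag))
```

The non-greedy .*? keeps each match from running on to the last ?> in the file once newlines are allowed.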