From: Keith G Hicks on
Yeah, I found that out. I'm kind of stabbing in the dark here. I'm asking
for help and trying to figure things out while waiting. I'm figuring a few
things out but not enough.

I have no way of getting these files in a better format than they already
are. I'm kind of stuck. I need to know how to take a file and change its
encoding to ISO-8859-1, to match the header <?xml version="1.0" encoding="ISO-8859-1"?>

If I open the file manually in a tool I have called EditPad Pro I can paste
the above into the header. Then when I save it EditPad asks if I want to
change to the new encoding or not. That works quite well. I also discovered
that if I change the header in Notepad, the characters I'm having trouble
with actually come out fine after I save the file and reopen it in the XML
editor. That's why I thought that changing it in VB code would do the same
thing. Guess not. I'm not sure why it works in Notepad.

So anyone that can help me write code to encode these files properly would
get my sincerest thanks.

Thanks,

Keith


"Scott M." <s-mar(a)nospam.nospam> wrote in message
news:%23aMS$InXKHA.4816(a)TK2MSFTNGP06.phx.gbl...
>
> "Keith G Hicks" <krh(a)comcast.net> wrote in message
> news:%23$$5j2mXKHA.4808(a)TK2MSFTNGP06.phx.gbl...
>> Never mind. I figured it out:
>>
>>
>> Dim TheFileLines As New List(Of String)
>> TheFileLines.AddRange(System.IO.File.ReadAllLines(xmlFilesLocation & "\" & sArticleToPost))
>> TheFileLines.RemoveAt(0)
>> TheFileLines.Insert(0, "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
>> System.IO.File.WriteAllLines(xmlFilesLocation & "\" & sArticleToPost, TheFileLines.ToArray)
>>
>> "Keith G Hicks" <krh(a)comcast.net> wrote in message
>> news:uMeJV7lXKHA.4360(a)TK2MSFTNGP04.phx.gbl...
>>> Okay, I need to clean up these files. They are coming out of this goofy
>>> system with this header:
>>>
>>> <?xml version=?1.0? encoding=?UTF-8??>
>>>
>>> The quotes around things are not coming in as quotes. And it's not the
>>> correct encoding anyway. It needs to be this:
>>>
>>> <?xml version="1.0" encoding="ISO-8859-1"?>
>>>
>>>
>>> So I guess I need to change the encoding of each file before I can open
>>> it up as an XML doc and read it there. I have no idea what is the best
>>> way to do this programmatically in vb.net. Do I need to open with
>>> StreamWriter or is there an easier way? I can't find anything out there
>>> that explains this clearly. If I need to do this with streamwriter could
>>> someone point me somewhere that shows how to do this?
>>>
>>> Thanks,
>>>
>>> Keith
>
> You realize that just because you've said what you want the encoding to be
> doesn't mean that the characters are actually encoded that way, right?
>


From: Scott M. on

Keith,

Take a look at http://www.15seconds.com/Issue/050616.htm, in particular the
XmlWriterSettings section. That is what you want.
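A minimal sketch of the XmlWriterSettings approach, assuming the content has already been loaded into an XmlDocument (which, as Keith explains below, his files cannot be until the header is fixed). The module name, the SaveAsLatin1 helper and the sample content are hypothetical; XmlWriterSettings.Encoding, XmlWriter.Create and XmlDocument.Save(XmlWriter) are standard System.Xml members. The writer generates the XML declaration itself, so the declared encoding and the bytes on disk agree.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module XmlWriterSettingsSketch

    ' Hypothetical helper: saves an XmlDocument so that both the declaration
    ' and the bytes written to disk are ISO-8859-1.
    Sub SaveAsLatin1(ByVal doc As XmlDocument, ByVal outputPath As String)
        Dim settings As New XmlWriterSettings()
        settings.Encoding = Encoding.GetEncoding("ISO-8859-1")
        Using writer As XmlWriter = XmlWriter.Create(outputPath, settings)
            doc.Save(writer)
        End Using
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        ' ChrW(&HF1) is "ñ"; building it this way keeps the sample independent
        ' of how this source file itself is encoded.
        doc.LoadXml("<article>A" & ChrW(&HF1) & "o</article>")

        Dim outFile As String = Path.Combine(Path.GetTempPath(), "latin1-sample.xml")
        SaveAsLatin1(doc, outFile)
        Console.WriteLine(File.ReadAllText(outFile, Encoding.GetEncoding("ISO-8859-1")))
    End Sub

End Module
```

In a hex editor the "ñ" in the saved file should appear as the single byte F1, not the UTF-8 pair C3 B1.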

-Scott


From: Keith G Hicks on
The first line of the files I'm getting is fouled up, so I cannot open or
read them at all using any of the XML features in VB. The first line is not
recognizable: it claims to be UTF-8 but it's not, and the double quotes in
the header are not coming through as double quotes.

When I use StreamReader, alter the first line and then save it as a new
file, that almost works, but the characters that need the correct encoding
get changed to something else in the save process. I'm guessing the
StreamReader is interpreting them incorrectly, so it doesn't really matter
what I change the header to: the characters themselves change (I checked in
a hex editor to be sure).

So since it works to manually open these files in notepad and simply change
the header to the correct encoding, the characters themselves MUST have the
correct binary values. All that needs to be done is to change that header to
the right encoding without fouling up the characters in the body.

So how can I open the file in the most raw form of text, replace that first
line and save it without changing the characters in question in the process?
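The "most raw form" is the bytes themselves. A minimal sketch, assuming (as concluded above) that the body bytes are already ISO-8859-1 and only the declaration is wrong: it replaces everything up to the first "?>" and copies the remaining bytes verbatim, so no Encoding ever touches the body. The module name, ReplaceDeclaration and the sample data are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RawHeaderFix

    ' Hypothetical helper: returns a copy of data in which everything up to and
    ' including the first "?>" is replaced by newDeclaration. Any bytes before the
    ' declaration (such as a byte order mark) are dropped with it; every byte after
    ' it is copied verbatim, so nothing in the body is decoded or re-encoded.
    Function ReplaceDeclaration(ByVal data As Byte(), ByVal newDeclaration As String) As Byte()
        Dim bodyStart As Integer = -1
        For i As Integer = 0 To data.Length - 2
            If data(i) = AscW("?"c) AndAlso data(i + 1) = AscW(">"c) Then
                bodyStart = i + 2
                Exit For
            End If
        Next
        If bodyStart < 0 Then Throw New InvalidDataException("No ""?>"" found, so no XML declaration to replace.")

        ' The new declaration is pure ASCII, which has the same bytes in ISO-8859-1.
        Dim header As Byte() = Encoding.ASCII.GetBytes(newDeclaration)
        Dim bodyLength As Integer = data.Length - bodyStart
        Dim result(header.Length + bodyLength - 1) As Byte
        Buffer.BlockCopy(header, 0, result, 0, header.Length)
        Buffer.BlockCopy(data, bodyStart, result, header.Length, bodyLength)
        Return result
    End Function

    Sub Main()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        ' Hypothetical sample: garbled declaration, then a body with "ñ" stored as the single byte F1.
        Dim original As Byte() = latin1.GetBytes("<?xml version=?1.0? encoding=?UTF-8??>" & vbCrLf & "<a>A" & ChrW(&HF1) & "o</a>")
        Dim fixedBytes As Byte() = ReplaceDeclaration(original, "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
        Console.WriteLine(latin1.GetString(fixedBytes))
    End Sub

End Module
```

On a real file the call would be along the lines of File.WriteAllBytes(outPath, ReplaceDeclaration(File.ReadAllBytes(inPath), newHeader)), where outPath, inPath and newHeader are placeholders.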

I made some progress with this:

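Imports System.IO
Imports System.Text

' Read the whole file, decoding it as UTF-7
Dim sr As New StreamReader(xmlFilesLocation & "\" & sArticleToPost, Encoding.UTF7)
Dim text As String = sr.ReadToEnd
sr.Close()

' ReDim text2(1) allocates two elements; text2(1) stays Nothing
' and WriteAllLines writes it out as an empty line
Dim text2() As String
ReDim text2(1)
text2(0) = text.Replace("<?xml version=1.0 encoding=UTF-8?>", _
    "<?xml version=""1.0"" encoding=""ISO-8859-1""?>")

' With no Encoding argument, WriteAllLines saves as UTF-8 (no BOM)
System.IO.File.WriteAllLines(xmlFilesLocation & "\x" & sArticleToPost, text2)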


The text2 variable shows the correct characters and when I copy its value
into notepad it's fine. But it doesn't save right. I still get weirder
characters than I want. It's supposed to have characters like N with a
tilde, O with a tilde, O with an accent mark, etc. There are about 6 or 7 I
expect to see in this file. But when I open the newly saved files, those
characters are converted into very strange characters that I'd have to show
you.


I have a question regarding all of this. The encoding header merely tells
the program that's opening the file how to read the characters in it. The
characters are ultimately stored as binary, and the encoding says how to
interpret that binary as readable characters. If I open a file using one
encoding, the characters look a certain way, and if I then save it using
another encoding, the characters' binary values change. Is this all true? Am
I understanding this or not? I mean, the 0's and 1's stored on disk don't
change just because of the way you open the file. If you open it using one
interpreter (encoding) they look one way, and if you open it using another
encoding you'll see different characters. That makes sense to me. So the
only way I can see the binary changing is if the encoding used when saving
translates the characters into a different string of 1's and 0's. Yes?
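That understanding is right: decoding only interprets bytes, and bytes change only when text is encoded again on save. A short sketch with a single hypothetical byte (0xD1, which is "Ñ" in ISO-8859-1) shows both halves. The module name and BytesAsHex helper are illustrative.

```vbnet
Imports System
Imports System.Text

Module BytesVersusCharacters

    ' Formats the bytes that a given encoding produces for some text, e.g. "C3-91".
    Function BytesAsHex(ByVal text As String, ByVal enc As Encoding) As String
        Return BitConverter.ToString(enc.GetBytes(text))
    End Function

    Sub Main()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        ' A single byte as it might sit on disk. In ISO-8859-1, 0xD1 is "Ñ".
        Dim onDisk As Byte() = New Byte() {&HD1}

        ' Opening the file only interprets the bytes; it does not change them.
        Dim readAsLatin1 As String = latin1.GetString(onDisk)       ' "Ñ"
        Dim readAsUtf8 As String = Encoding.UTF8.GetString(onDisk)  ' U+FFFD: 0xD1 alone is not valid UTF-8

        ' Saving is where the bytes can change: the same "Ñ" is one byte in
        ' ISO-8859-1 but two bytes in UTF-8.
        Console.WriteLine(BytesAsHex(readAsLatin1, latin1))        ' D1
        Console.WriteLine(BytesAsHex(readAsLatin1, Encoding.UTF8)) ' C3-91
        Console.WriteLine(AscW(readAsUtf8(0)).ToString("X4"))      ' FFFD
    End Sub

End Module
```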

Okay, so when I choose the "encoding" parameter of StreamReader, there are
only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...) How do I
tell it I want it to read AND SAVE as ISO-8859-1????

Opening UTF-7 seems to help but OMG when I save using UTF-7 things are a big
mess.


Thanks,

Keith




From: Martin Honnen on
Keith G Hicks wrote:

> Okay, so when I choose the "encoding" parameter of StreamReader, there are
> only about 5 options (UTF-7, UTF-8, UTF-32, ASCII, Default, ...) How do I
> tell it I want it to read AND SAVE as ISO-8859-1????

Encoding.GetEncoding("ISO-8859-1") should give an Encoding instance
allowing you to decode and encode with ISO-8859-1.
And both StreamReader and StreamWriter allow you to specify an encoding,
for instance StreamWriter has
http://msdn.microsoft.com/en-us/library/f5f5x7kt.aspx
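Putting that together, a minimal sketch that reads and writes with the same Encoding.GetEncoding("ISO-8859-1") instance, assuming the body bytes are already ISO-8859-1. Because ISO-8859-1 maps each byte 0x00 to 0xFF to exactly one character and back, the body survives the round trip unchanged; only the declaration is replaced. FixDeclaration, the module name and the temp file names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module Latin1Rewrite

    ' Hypothetical helper: copies inputPath to outputPath with a corrected
    ' declaration, reading and writing the text as ISO-8859-1.
    Sub FixDeclaration(ByVal inputPath As String, ByVal outputPath As String)
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        ' False = ignore byte order marks, so the reader really uses ISO-8859-1.
        Dim text As String
        Using reader As New StreamReader(inputPath, latin1, False)
            text = reader.ReadToEnd()
        End Using

        ' Drop the existing (garbled) declaration, up to and including "?>".
        Dim declEnd As Integer = text.IndexOf("?>", StringComparison.Ordinal)
        If declEnd >= 0 Then text = text.Substring(declEnd + 2)

        ' Write the corrected declaration plus the untouched body, again as ISO-8859-1.
        Using writer As New StreamWriter(outputPath, False, latin1)
            writer.Write("<?xml version=""1.0"" encoding=""ISO-8859-1""?>")
            writer.Write(text)
        End Using
    End Sub

    Sub Main()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim inputPath As String = Path.Combine(Path.GetTempPath(), "article-in.xml")
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "article-out.xml")

        ' Sample input: garbled declaration, Latin-1 body containing "ñ" (byte F1).
        File.WriteAllBytes(inputPath, latin1.GetBytes("<?xml version=?1.0? encoding=?UTF-8??>" & vbCrLf & "<a>A" & ChrW(&HF1) & "o</a>"))

        FixDeclaration(inputPath, outputPath)
        Console.WriteLine(File.ReadAllText(outputPath, latin1))
    End Sub

End Module
```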


--

Martin Honnen --- MVP XML
http://msmvps.com/blogs/martin_honnen/
From: Keith G Hicks on
Yep. I found that out late last night. Thanks. It took quite a bit of
hunting around to figure this out. It's not intuitive. "GetEncoding" sounds
like a read-only property; the word "Get" is misleading. I finally landed on
something that explained that it was more like "Set" than "Get". Now I'm
sure they meant that GetEncoding("ISO-8859-1") means to "get" the encoding
named "ISO-8859-1", but that's a bit ambiguous. With Encoding.GetEncoding as
the 3rd param of StreamReader, it could also be interpreted as "get the
current encoding of the stream".
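A small sketch of the distinction: Encoding.GetEncoding is a Shared lookup that returns the Encoding registered under a name and knows nothing about any stream, while the "what is this stream using?" question is answered by StreamReader.CurrentEncoding. With byte order mark detection enabled, CurrentEncoding is only settled after the first read. The module name and the DetectedEncodingName helper are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module EncodingLookupVersusStream

    ' Hypothetical helper: reports the encoding a StreamReader ends up using for
    ' the given bytes, once it has had a chance to look for a byte order mark.
    Function DetectedEncodingName(ByVal data As Byte(), ByVal fallback As Encoding) As String
        Using reader As New StreamReader(New MemoryStream(data), fallback, True)
            reader.Peek() ' BOM detection happens on the first read
            Return reader.CurrentEncoding.WebName
        End Using
    End Function

    Sub Main()
        ' GetEncoding just returns the Encoding object registered under that name.
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Console.WriteLine(latin1.WebName)   ' iso-8859-1
        Console.WriteLine(latin1.CodePage)  ' 28591

        ' No BOM: the reader keeps the encoding it was given.
        Console.WriteLine(DetectedEncodingName(New Byte() {&H41}, latin1))                   ' iso-8859-1
        ' UTF-8 BOM: detection overrides the encoding passed to the constructor.
        Console.WriteLine(DetectedEncodingName(New Byte() {&HEF, &HBB, &HBF, &H41}, latin1)) ' utf-8
    End Sub

End Module
```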

Thanks for the info.