From: Mark Space on 19 Jul 2008 20:30

Stefan Ram wrote:
> ram(a)zedat.fu-berlin.de (Stefan Ram) writes:
>> new java.io.InputStreamReader
>> ( httpURLConnection.getInputStream(), "UTF-8" );
>
> A more specific question:
>
> Shouldn't I use the document encoding instead of "UTF-8"?

The default for HTTP is "8859_1" (that's the Java charset name). There's a special protocol for negotiating a different charset, which you won't support because your GET is too primitive.

The server will either send you 8859-1 if it can, or it'll close the connection, I think.
From: Mark Space on 19 Jul 2008 20:43

Mark Space wrote:
> Stefan Ram wrote:
>> ram(a)zedat.fu-berlin.de (Stefan Ram) writes:
>>> new java.io.InputStreamReader
>>> ( httpURLConnection.getInputStream(), "UTF-8" );
>>
>> A more specific question:
>>
>> Shouldn't I use the document encoding instead of "UTF-8"?
>
> The default for HTTP is "8859_1" (that's the Java charset name). There's
> a special protocol for negotiating a different charset, which you won't
> support because your GET is too primitive.
>
> The server will either send you 8859-1 if it can, or it'll close the
> connection, I think.

P.S. The openStream() method of URL seems to open the kind of connection you need directly:

    URL url = new URL( args[0] );
    // No charset given here, so the platform default charset is used
    BufferedReader bin = new BufferedReader(
            new InputStreamReader( url.openStream() ) );

I think. Better check that. It's fewer lines, though.
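Mark's openStream() version still decodes with the platform default charset, because the InputStreamReader is not given one. A minimal sketch of the same idea with the charset passed explicitly; the names `OpenStreamExample` and `readAll` are made up for illustration, and picking ISO-8859-1 in `main` is an assumption that only fits responses sent without a charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OpenStreamExample {
    // Hypothetical helper: reads the resource at 'url' line by line,
    // decoding the bytes with 'charset' instead of the platform default.
    static String readAll(URL url, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is an http: URL whose response carries no
        // charset, so ISO-8859-1 is the charset to decode with.
        System.out.print(readAll(new URL(args[0]), StandardCharsets.ISO_8859_1));
    }
}
```

The helper works for any URL scheme openStream() supports, so it can be tried on a local file: URL before pointing it at a web server.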
From: Arne Vajhøj on 19 Jul 2008 22:14

Mark Space wrote:
> Stefan Ram wrote:
>> ram(a)zedat.fu-berlin.de (Stefan Ram) writes:
>>> new java.io.InputStreamReader
>>> ( httpURLConnection.getInputStream(), "UTF-8" );
>>
>> A more specific question:
>>
>> Shouldn't I use the document encoding instead of "UTF-8"?
>
> The default for HTTP is "8859_1" (that's the Java charset name). There's
> a special protocol for negotiating a different charset, which you won't
> support because your GET is too primitive.
>
> The server will either send you 8859-1 if it can, or it'll close the
> connection, I think.

What?

HttpURLConnection and its InputStream fetch bytes from the server. No negotiation is possible.

When the client needs to interpret the bytes, it has to decide on an encoding.

The code snippet above creates an InputStreamReader expecting UTF-8 encoding. If that is known to be the encoding, then it is fine. If the encoding is unknown, it should be determined from the HTTP header and the HTML META tag.

There is no default ISO-8859-1 in either HTTP or Java. HTTP is always explicit, and the Java default is system-specific.

Arne
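Arne's point that the client has to choose an encoding for the raw bytes can be seen without any network access. A small sketch (the class name `DecodeDemo` is made up): the UTF-8 bytes for U+00FC decode to one character with the right charset and to two characters with ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes a response body with whatever charset the client settled on.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // U+00FC (u with umlaut) is the two bytes 0xC3 0xBC in UTF-8.
        byte[] body = "\u00fc".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(body, StandardCharsets.UTF_8);        // one char, U+00FC
        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1); // two chars, U+00C3 U+00BC

        // Print lengths rather than the strings, so the console's own
        // encoding does not get in the way.
        System.out.println(asUtf8.length() + " " + asLatin1.length()); // prints "1 2"
    }
}
```

Nothing in the bytes says which reading is right; that information has to come from outside, i.e. from the header or the document itself.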
From: Mark Space on 19 Jul 2008 22:40

Arne Vajhøj wrote:
>
> HttpURLConnection and its InputStream fetch bytes from the
> server. No negotiation is possible.

I think that's what I'm saying, although I'm no longer sure that HttpURLConnection doesn't fully support HTTP character sets. It might.

> There is no default ISO-8859-1 in either HTTP or Java. HTTP is
> always explicit, and the Java default is system-specific.

For a socket, yes, there is no default encoding. For HTTP, I think that is not true: 8859-1 is the default if nothing is specified, and it is legal to leave out the charset, in both the GET and the response. I think, anyway. I could be all wrong about that.

Stefan has a valid question: if the content type isn't specified until you read the header, and you don't know the content type, how do you know what to open the stream as? The answer, I think, is that it's defined to be 8859-1 by default.

Let me see if I can dig something up...

Content Negotiation for HTTP:
<http://en.wikipedia.org/wiki/Content_negotiation>

Some info on "Missing Charset" in the RFC:
<http://tools.ietf.org/html/rfc2616>
Search for 8859.

Back to Java: URLConnection also looks like it will let one read things like the content type and MIME type before getting a Java InputStream to the content:

    URLConnection c = url.openConnection();
    String mimeType = c.getContentType();
    System.out.println( mimeType );

And similarly for getContentEncoding().

I gotta run. I hope I didn't booger things up too badly replying to Stefan. Apologies if I did.
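The getContentType() idea can be taken one step further: the charset is the `charset` parameter of the Content-Type value, and when it is missing, RFC 2616 (section 3.7.1) says text/* content is ISO-8859-1. Note that getContentEncoding() returns the Content-Encoding header (e.g. `gzip`), not the charset. A minimal sketch; `ContentTypeCharset` and `charsetOf` are hypothetical names, the parameter parsing is deliberately simple (it does not handle semicolons inside quoted values), and an unsupported charset name makes `Charset.forName` throw.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8". Without a charset parameter, text/* types
    // fall back to ISO-8859-1 (RFC 2616, 3.7.1); anything else returns null.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = param.substring(eq + 1).trim();
                // Allow a quoted value: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                return Charset.forName(name); // throws if the name is unsupported
            }
        }
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.startsWith("text/") ? StandardCharsets.ISO_8859_1 : null;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is an http: URL. Reading the headers does not
        // consume the body, so the charset is known before getInputStream().
        URLConnection c = new URL(args[0]).openConnection();
        String type = c.getContentType();
        System.out.println(type + " -> " + charsetOf(type));
    }
}
```

A null result means the header alone does not settle the question, which is where the HTML META tag comes in.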
From: Arne Vajhøj on 19 Jul 2008 22:51

Mark Space wrote:
> Arne Vajhøj wrote:
>> There is no default ISO-8859-1 in either HTTP or Java. HTTP is
>> always explicit, and the Java default is system-specific.
>
> For a socket, yes, there is no default encoding. For HTTP, I think that
> is not true: 8859-1 is the default if nothing is specified, and it is
> legal to leave out the charset, in both the GET and the
> response.
> Let me see if I can dig something up...
>
> Content Negotiation for HTTP:
> <http://en.wikipedia.org/wiki/Content_negotiation>
>
> Some info on "Missing Charset" in the RFC:
> <http://tools.ietf.org/html/rfc2616>
> Search for 8859.

You are right. If nothing is specified, it means ISO-8859-1, which is rather bad since the world is moving from ISO-8859-1 to UTF-8.

> Stefan has a valid question: if the content type isn't specified until
> you read the header, and you don't know the content type, how do you
> know what to open the stream as? The answer, I think, is that it's
> defined to be 8859-1 by default.
>
> Back to Java: URLConnection also looks like it will let one read
> things like the content type and MIME type before getting a Java
> InputStream to the content:
>
>     URLConnection c = url.openConnection();
>     String mimeType = c.getContentType();
>     System.out.println( mimeType );
>
> And similarly for getContentEncoding().

An encoding given in the HTTP header is easy to handle, because the headers are US-ASCII, so the client can read the headers and determine the encoding before reading the body.

An encoding given only in an HTML META tag is not so nice: the tag is part of the body, so the client has to start reading the body before it knows how to decode it.

Arne
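The META tag case can be sketched as a prefix scan: look at the first bytes of the body, which are plain ASCII markup in any ASCII-compatible encoding, and pull out the declared charset. This is only a rough sketch, assuming an ASCII-compatible page (it will not find anything in UTF-16) and using a regex rather than a real HTML parser; `MetaCharsetSniffer`, `sniff`, and the byte limit are hypothetical. A client would consult the HTTP header first and use something like this only as a fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    // Matches <meta charset="utf-8"> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
    // The name must start with a letter or digit, as Java charset names do.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:+-]*)",
            Pattern.CASE_INSENSITIVE);

    // Looks for a META charset declaration in the first 'limit' bytes of an
    // HTML body. The prefix is decoded as ISO-8859-1, which maps every byte to
    // one char, so ASCII markup stays readable whatever ASCII-compatible
    // encoding the page really uses. Returns null if nothing usable is found.
    static Charset sniff(byte[] body, int limit) {
        int n = Math.min(body.length, limit);
        String head = new String(body, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return null;
    }
}
```

Because the scan works on raw bytes, a client can buffer the body once, sniff the charset, and then decode the same bytes with `new String(body, charset)` without fetching the page twice.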