<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Writing games in Korean &#8211; part 3</title>
	<atom:link href="http://t-machine.org/index.php/2008/04/21/writing-games-in-korean-part-3/feed/" rel="self" type="application/rss+xml" />
	<link>http://t-machine.org/index.php/2008/04/21/writing-games-in-korean-part-3/</link>
	<description>Internet Gaming, Computer Games, Technology, MMO, and Web 2.0</description>
	<lastBuildDate>Thu, 29 Jul 2010 16:26:58 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: adam</title>
		<link>http://t-machine.org/index.php/2008/04/21/writing-games-in-korean-part-3/comment-page-1/#comment-1753</link>
		<dc:creator>adam</dc:creator>
		<pubDate>Sat, 03 May 2008 23:38:29 +0000</pubDate>
		<guid isPermaLink="false">http://t-machine.org/?p=160#comment-1753</guid>
		<description>Thanks, guys, for the good advice!

@dom: of course, as you say, the parser can&#039;t even tell by that point. I keep foolishly half-expecting stream objects in Java to reveal the history of their encoding(s) via runtime methods, but they don&#039;t. It would help with debugging and with checking incoming data from external sources. I think there&#039;s a bit too much transparency here - since character data is almost certainly a point of interface with external systems, more debug info would be good.</description>
		<content:encoded><![CDATA[<p>Thanks, guys, for the good advice!</p>
<p>@dom: of course, as you say, the parser can&#8217;t even tell by that point. I keep foolishly half-expecting stream objects in Java to reveal the history of their encoding(s) via runtime methods, but they don&#8217;t. It would help with debugging and with checking incoming data from external sources. I think there&#8217;s a bit too much transparency here &#8211; since character data is almost certainly a point of interface with external systems, more debug info would be good.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dominic Mitchell</title>
		<link>http://t-machine.org/index.php/2008/04/21/writing-games-in-korean-part-3/comment-page-1/#comment-1751</link>
		<dc:creator>Dominic Mitchell</dc:creator>
		<pubDate>Sat, 03 May 2008 18:43:11 +0000</pubDate>
		<guid isPermaLink="false">http://t-machine.org/?p=160#comment-1751</guid>
		<description>I&#039;ve learned the hard way that any time you say &quot;read a text file&quot; and don&#039;t &lt;em&gt;explicitly&lt;/em&gt; say what encoding to use, you&#039;re in for trouble sooner or later.  Unfortunately, &lt;a href=&quot;http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileReader.html&quot; rel=&quot;nofollow&quot;&gt;FileReader&lt;/a&gt; doesn&#039;t appear to support that, so you have to be a bit more explicit.  As elFarto mentions above, you need to wrap a FileInputStream using &lt;a href=&quot;http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.lang.String)&quot; rel=&quot;nofollow&quot;&gt;InputStreamReader&lt;/a&gt;, and specify the encoding.  Then you need to wrap the whole lot in a BufferedReader to get any performance out of it.  It&#039;s a pretty nasty concoction, but hey, it&#039;s flexible.

&lt;code&gt;new BufferedReader(new InputStreamReader(new FileInputStream(&quot;someFile&quot;), &quot;UTF-8&quot;));&lt;/code&gt;

The other thing about the XML parser not complaining that the encoding doesn&#039;t match: well, it can&#039;t.  Because you&#039;ve provided a Reader instead of an InputStream to parse, you are sending characters, not bytes, into the parser.  It can&#039;t know anything about the encoding because the transformation from bytes to characters has already occurred.  Whereas if you pass an InputStream, the whole &quot;detect the encoding&quot; work of the XML parser takes place as expected.  Like elFarto says, using an InputStream is probably a better idea here.  And UTF-8?  Well, if you&#039;re dealing exclusively with Korean characters, you may wish to consider UTF-16 for space reasons: a Hangul syllable takes three bytes in UTF-8 but two in UTF-16.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve learned the hard way that any time you say &#8220;read a text file&#8221; and don&#8217;t <em>explicitly</em> say what encoding to use, you&#8217;re in for trouble sooner or later.  Unfortunately, <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileReader.html" onclick="javascript:pageTracker._trackPageview('/outbound/comment/http://java.sun.com/j2se/1.5.0/docs/api/java/io/FileReader.html');" rel="nofollow">FileReader</a> doesn&#8217;t appear to support that, so you have to be a bit more explicit.  As elFarto mentions above, you need to wrap a FileInputStream using <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.lang.String)" onclick="javascript:pageTracker._trackPageview('/outbound/comment/http://java.sun.com/j2se/1.5.0/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.lang.String)');" rel="nofollow">InputStreamReader</a>, and specify the encoding.  Then you need to wrap the whole lot in a BufferedReader to get any performance out of it.  It&#8217;s a pretty nasty concoction, but hey, it&#8217;s flexible.</p>
<p><code>new BufferedReader(new InputStreamReader(new FileInputStream("someFile"), "UTF-8"));</code></p>
<p>The other thing about the XML parser not complaining that the encoding doesn&#8217;t match: well, it can&#8217;t.  Because you&#8217;ve provided a Reader instead of an InputStream to parse, you are sending characters, not bytes, into the parser.  It can&#8217;t know anything about the encoding because the transformation from bytes to characters has already occurred.  Whereas if you pass an InputStream, the whole &#8220;detect the encoding&#8221; work of the XML parser takes place as expected.  Like elFarto says, using an InputStream is probably a better idea here.  And UTF-8?  Well, if you&#8217;re dealing exclusively with Korean characters, you may wish to consider UTF-16 for space reasons: a Hangul syllable takes three bytes in UTF-8 but two in UTF-16.</p>
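<p>A minimal sketch of the InputStream approach, not code from the original thread: the class name XmlFromBytes, the method rootText and the sample document are made up for illustration. It hands the parser raw UTF-16 bytes and leaves it to work out the encoding from the BOM and the XML declaration.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; the names and the sample document are illustrative only.
final class XmlFromBytes {

    // Parse XML from raw bytes so that the parser itself detects the
    // encoding, instead of receiving characters that were already decoded.
    static String rootText(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Two Hangul syllables, written as escapes so the source stays ASCII.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<greeting>\uC548\uB155</greeting>";
        // Java's "UTF-16" charset encodes as big-endian with a leading BOM.
        byte[] utf16 = xml.getBytes(StandardCharsets.UTF_16);
        System.out.println(rootText(new ByteArrayInputStream(utf16)));
    }
}
```

<p>Had the same bytes been pushed through a Reader with the wrong charset first, the parser would only ever see the already-garbled characters and would have nothing to complain about.</p>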
]]></content:encoded>
	</item>
	<item>
		<title>By: elFarto</title>
		<link>http://t-machine.org/index.php/2008/04/21/writing-games-in-korean-part-3/comment-page-1/#comment-1737</link>
		<dc:creator>elFarto</dc:creator>
		<pubDate>Wed, 23 Apr 2008 21:32:24 +0000</pubDate>
		<guid isPermaLink="false">http://t-machine.org/?p=160#comment-1737</guid>
		<description>new FileReader(file)

is identical to doing:

new InputStreamReader(new FileInputStream(file))

But this will attempt to use the platform default encoding to read the file (which is generally CP-1252 on Windows). Always use InputStreams when reading XML, as XML comes with its own method for determining character encoding, which Readers don&#039;t understand or follow.

If you want a buffered FileInputStream, just wrap it in a BufferedInputStream.

I&#039;ve had &quot;Content is not allowed in prolog.&quot; when I was attempting to use Notepad to create a UTF-8 XML file. Notepad helpfully prepends a BOM (Byte Order Mark) to the beginning, which some XML parsers don&#039;t accept.

I just did a quick test in Eclipse, and its magic &quot;automatically re-encode the file&quot; feature is broken for XML. The XML declaration must be the first thing in the file, and it must be in ASCII. When selecting UTF-16 it encodes the whole file, which is wrong.

Use UTF-8 and InputStreams and you should be fine.

Regards
elFarto</description>
		<content:encoded><![CDATA[<p>new FileReader(file)</p>
<p>is identical to doing:</p>
<p>new InputStreamReader(new FileInputStream(file))</p>
<p>But this will attempt to use the platform default encoding to read the file (which is generally CP-1252 on Windows). Always use InputStreams when reading XML, as XML comes with its own method for determining character encoding, which Readers don&#8217;t understand or follow.</p>
<p>If you want a buffered FileInputStream, just wrap it in a BufferedInputStream.</p>
<p>I&#8217;ve had &#8220;Content is not allowed in prolog.&#8221; when I was attempting to use Notepad to create a UTF-8 XML file. Notepad helpfully prepends a BOM (Byte Order Mark) to the beginning, which some XML parsers don&#8217;t accept.</p>
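<p>For a parser or reader chain that does trip over a leading UTF-8 BOM, one possible workaround is to skip those three bytes before parsing. This is only a sketch: the class name BomSkipper and the method skipUtf8Bom are hypothetical, and it handles the UTF-8 BOM (EF BB BF) only.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper; not part of any library mentioned in this thread.
final class BomSkipper {

    // Returns a stream positioned just past a leading UTF-8 BOM if one is
    // present; otherwise the stream still yields every original byte.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than asked for, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before three bytes were available
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) in.read());
    }
}
```

<p>The returned stream can then be handed to the XML parser as an InputStream, as suggested elsewhere in this comment.</p>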
<p>I just did a quick test in Eclipse, and its magic &#8220;automatically re-encode the file&#8221; feature is broken for XML. The XML declaration must be the first thing in the file, and it must be in ASCII. When selecting UTF-16 it encodes the whole file, which is wrong.</p>
<p>Use UTF-8 and InputStreams and you should be fine.</p>
<p>Regards<br />
elFarto</p>
]]></content:encoded>
	</item>
</channel>
</rss>
