<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
	xmlns:media="http://search.yahoo.com/mrss/"
	>
<channel>
	<title>Comments on: Zip files and Encoding &#8211; I hate you.</title>
	<atom:link href="http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/feed/" rel="self" type="application/rss+xml" />
	<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/</link>
	<description>Marcos Caceres&#039; ramblings about stuff</description>
	<lastBuildDate>Mon, 24 Aug 2009 10:49:17 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: gobi</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-69</link>
		<dc:creator>gobi</dc:creator>
		<pubDate>Mon, 24 Aug 2009 10:49:17 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-69</guid>
		<description>Some builds of the unzip command (for example the patched unzip shipped by Debian and Ubuntu) have -O and -I options to specify the source filename encoding.

If the archive was created on Windows, use the -O option, something like:

unzip -O sjis yourarchive.zip

The -I option is used if the archive was created on Linux/Unix with a different encoding.</description>
		<content:encoded><![CDATA[<p>Some builds of the unzip command (for example the patched unzip shipped by Debian and Ubuntu) have -O and -I options to specify the source filename encoding.</p>
<p>If the archive was created on Windows, use the -O option, something like:</p>
<p><code>unzip -O sjis yourarchive.zip</code></p>
<p>The -I option is used if the archive was created on Linux/Unix with a different encoding.</p>
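<p>A rough Python sketch of the same idea as <code>unzip -O sjis</code> (an assumption about equivalent behaviour, not taken from unzip's source): Python's <code>zipfile</code> decodes names that lack the UTF-8 flag (general purpose bit 11, 0x800) as CP437, so the sketch re-encodes those names to recover the raw bytes and decodes them with a caller-chosen legacy encoding. The helper name <code>list_names</code> and the Shift_JIS default are illustrative only.</p>
<pre>
```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8

def list_names(archive, legacy_encoding="shift_jis"):
    """Return entry names, re-decoding non-UTF-8 names with legacy_encoding.

    Hypothetical helper; 'archive' is a path or a binary file object.
    Raises UnicodeDecodeError if a name is not valid in legacy_encoding.
    """
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                # The archiver declared UTF-8, so zipfile already decoded it correctly.
                names.append(info.filename)
            else:
                # zipfile decoded the raw bytes as CP437; undo that to get the
                # original bytes back, then apply the encoding the user picked.
                raw = info.orig_filename.encode("cp437")
                names.append(raw.decode(legacy_encoding))
    return names
```
</pre>
<p>On Python 3.11 and later, <code>zipfile.ZipFile(archive, metadata_encoding="shift_jis")</code> performs this re-decoding directly when reading.</p>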
]]></content:encoded>
	</item>
	<item>
		<title>By: O. Andersen</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-65</link>
		<dc:creator>O. Andersen</dc:creator>
		<pubDate>Fri, 24 Jul 2009 14:15:55 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-65</guid>
		<description>Windows-1252 is actually a superset of ISO 8859-1 (disregarding control characters which will not appear in file names anyway), so your example is technically incorrect: encoding as ISO 8859-1 and decoding as Windows-1252 would work perfectly fine.

You later mention CP437 as an encoding used on Windows machines, but also say that &quot;everyone&quot; ignores the specification, which says that only CP437 or UTF-8 should be used. (CP437 is effectively incompatible with ISO 8859-1 and Windows-1252.) I am confused as to what encoding Windows actually uses/assumes. Do some versions use Windows-1252 and others CP437? Please clarify. (Obviously, other encodings must be used for non-Western demographics, as touched upon by another commentator, but let us leave that for now.)</description>
		<content:encoded><![CDATA[<p>Windows-1252 is actually a superset of ISO 8859-1 (disregarding control characters which will not appear in file names anyway), so your example is technically incorrect: encoding as ISO 8859-1 and decoding as Windows-1252 would work perfectly fine.</p>
<p>You later mention CP437 as an encoding used on Windows machines, but also say that &#8220;everyone&#8221; ignores the specification, which says that only CP437 or UTF-8 should be used. (CP437 is effectively incompatible with ISO 8859-1 and Windows-1252.) I am confused as to what encoding Windows actually uses/assumes. Do some versions use Windows-1252 and others CP437? Please clarify. (Obviously, other encodings must be used for non-Western demographics, as touched upon by another commentator, but let us leave that for now.)</p>
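<p>The superset claim can be checked mechanically. The Python sketch below (the helper name <code>differing_bytes</code> is made up for illustration) compares the two codecs byte by byte. Python's <code>cp1252</code> codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, which is counted as a difference here.</p>
<pre>
```python
def differing_bytes():
    """Return the byte values that decode differently in ISO 8859-1 and Windows-1252."""
    diff = []
    for b in range(256):
        latin1 = bytes([b]).decode("latin-1")
        try:
            win1252 = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            win1252 = None  # byte left undefined by Python's cp1252 codec
        if latin1 != win1252:
            diff.append(b)
    return diff

# Only the C1 control range 0x80-0x9F differs, so a printable Latin-1 name
# survives being encoded as ISO 8859-1 and decoded as Windows-1252.
name = "Überprüfung.txt"
assert name.encode("latin-1").decode("cp1252") == name
```
</pre>
<p>Running <code>differing_bytes()</code> yields exactly the values 0x80 through 0x9F. Decoding the same bytes as CP437 instead turns the umlauts into unrelated symbols, which is the incompatibility mentioned above.</p>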
]]></content:encoded>
	</item>
	<item>
		<title>By: Christopher Warner</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-67</link>
		<dc:creator>Christopher Warner</dc:creator>
		<pubDate>Fri, 08 May 2009 20:38:42 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-67</guid>
		<description>Blame all the programmers who think that encoding doesn&#039;t matter and refuse to get on the UTF-8 bandwagon even though the rest of the world has long since been on the bus.</description>
		<content:encoded><![CDATA[<p>Blame all the programmers who think that encoding doesn&#8217;t matter and refuse to get on the UTF-8 bandwagon even though the rest of the world has long since been on the bus.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todd Morrison</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-68</link>
		<dc:creator>Todd Morrison</dc:creator>
		<pubDate>Wed, 08 Apr 2009 16:15:27 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-68</guid>
		<description>Thank you for this research. I have been facing this problem recently.</description>
		<content:encoded><![CDATA[<p>Thank you for this research. I have been facing this problem recently.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: noep</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-76</link>
		<dc:creator>noep</dc:creator>
		<pubDate>Mon, 08 Dec 2008 18:19:25 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-76</guid>
		<description>Serves people right for using languages other than English.</description>
		<content:encoded><![CDATA[<p>Serves people right for using languages other than English.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cary Clark</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-75</link>
		<dc:creator>Cary Clark</dc:creator>
		<pubDate>Mon, 08 Dec 2008 17:18:56 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-75</guid>
		<description>Future zip executables (compressors) should assume filenames are in the system encoding (a very reasonable assumption in my opinion) and convert them to UTF-8 in the created zip files.</description>
		<content:encoded><![CDATA[<p>Future zip executables (compressors) should assume filenames are in the system encoding (a very reasonable assumption in my opinion) and convert them to UTF-8 in the created zip files.</p>
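<p>For comparison, Python's standard <code>zipfile</code> module already behaves roughly like this on the writing side: a name that is not pure ASCII is stored as UTF-8 and general purpose bit 11 (0x800) is set so that readers know. The small check below (the helper names are illustrative) reports the flag for each entry.</p>
<pre>
```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8

def build_archive(names):
    """Write empty entries with the given names into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buf.getvalue()

def utf8_flags(data):
    """Map each entry name to whether its UTF-8 flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}

print(utf8_flags(build_archive(["readme.txt", "café.txt"])))
# {'readme.txt': False, 'café.txt': True}
```
</pre>
<p>Leaving pure-ASCII names unflagged is harmless, since ASCII bytes mean the same thing in CP437 and UTF-8.</p>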
]]></content:encoded>
	</item>
	<item>
		<title>By: WAHa.06x36</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-74</link>
		<dc:creator>WAHa.06x36</dc:creator>
		<pubDate>Mon, 08 Dec 2008 13:01:27 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-74</guid>
		<description>Also, tar is a very inflexible and limited format, and not much good for any platform with filesystem metadata of any kind, which is pretty much all of them these days.</description>
		<content:encoded><![CDATA[<p>Also, tar is a very inflexible and limited format, and not much good for any platform with filesystem metadata of any kind, which is pretty much all of them these days.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: WAHa.06x36</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-73</link>
		<dc:creator>WAHa.06x36</dc:creator>
		<pubDate>Mon, 08 Dec 2008 12:59:20 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-73</guid>
		<description>There&#039;s nothing &quot;weird&quot; or &quot;nonstandard&quot; about the OS X Unicode decomposition; it&#039;s just plain NFD as far as I know. Now, if Unicode decomposition were the only problem, this would all be trivial.

But the real problem is the already mentioned ISO-8859-1, Windows-1252 and CP437, plus the as-yet unmentioned Shift_JIS, EUC-KR, Big5, ISO-8859-2 through ISO-8859-15 (or however many there are), and so on, and so on.

Really, the only way to reliably open a zip file is either to ask the user for the character encoding (and he probably doesn&#039;t know) or to try to autodetect it.

I&#039;ve had some success using Mozilla&#039;s universalchardet to open Zip files in http://code.google.com/p/theunarchiver/. A friend is also currently helping to get some of the core code running on Linux. It&#039;s all Objective-C, though, which will probably scare people off from using it.</description>
		<content:encoded><![CDATA[<p>There&#8217;s nothing &#8220;weird&#8221; or &#8220;nonstandard&#8221; about the OS X Unicode decomposition; it&#8217;s just plain NFD as far as I know. Now, if Unicode decomposition were the only problem, this would all be trivial.</p>
<p>But the real problem is the already mentioned ISO-8859-1, Windows-1252 and CP437, plus the as-yet unmentioned Shift_JIS, EUC-KR, Big5, ISO-8859-2 through ISO-8859-15 (or however many there are), and so on, and so on.</p>
<p>Really, the only way to reliably open a zip file is either to ask the user for the character encoding (and he probably doesn&#8217;t know) or to try to autodetect it.</p>
<p>I&#8217;ve had some success using Mozilla&#8217;s universalchardet to open Zip files in <a href="http://code.google.com/p/theunarchiver/" rel="nofollow">http://code.google.com/p/theunarchiver/</a>. A friend is also currently helping to get some of the core code running on Linux. It&#8217;s all Objective-C, though, which will probably scare people off from using it.</p>
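<p>universalchardet is statistical; a much cruder stand-in, sketched below in Python, simply tries a list of candidate encodings in order and keeps the first one that decodes the raw name without errors. The candidate list and the helper name <code>guess_name</code> are assumptions for illustration. Strict UTF-8 goes first because legacy 8-bit names are rarely valid UTF-8 by accident, and CP437 goes last because it accepts every byte and therefore always "succeeds".</p>
<pre>
```python
CANDIDATES = ("utf-8", "shift_jis", "cp437")  # illustrative order, not a recommendation

def guess_name(raw, candidates=CANDIDATES):
    """Return (decoded_name, encoding) for the first candidate that decodes raw strictly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # invalid in this encoding; try the next candidate
    raise ValueError("no candidate encoding could decode the name")
```
</pre>
<p>For example, <code>guess_name("日本.txt".encode("shift_jis"))</code> returns <code>("日本.txt", "shift_jis")</code>. A Windows-1252 name such as "café" falls through to CP437 and comes back garbled, which is why a real detector weighs character statistics instead of just checking validity.</p>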
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Seth</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-72</link>
		<dc:creator>Mike Seth</dc:creator>
		<pubDate>Mon, 08 Dec 2008 12:49:22 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-72</guid>
		<description>Use tar(1)?</description>
		<content:encoded><![CDATA[<p>Use tar(1)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: kL</title>
		<link>http://datadriven.com.au/2008/12/zip-files-and-encoding-i-hate-you/comment-page-1/#comment-71</link>
		<dc:creator>kL</dc:creator>
		<pubDate>Mon, 08 Dec 2008 12:15:48 +0000</pubDate>
		<guid isPermaLink="false">http://datadriven.com.au/?p=112#comment-71</guid>
		<description>I haven&#039;t had much success with tar.bz2 either.

There&#039;s The Unarchiver for Mac OS X, which tries to guess the encoding of filenames.

Since UTF-8 can be distinguished from 8-bit encodings fairly reliably, I think support for UTF-8 filenames should be required in all decompressors.

And NFD is Mac OS X&#039;s problem, not ZIP&#039;s. If some app tries to use bytes in filenames that the system simply does not allow by definition, then that&#039;s a bug in the app, and the app should be fixed.

I think going forward, all ZIP-dependent specs should require filenames in UTF-8 and forbid applications from relying on any particular Unicode normalization.</description>
		<content:encoded><![CDATA[<p>I haven&#8217;t had much success with tar.bz2 either.</p>
<p>There&#8217;s The Unarchiver for Mac OS X, which tries to guess the encoding of filenames.</p>
<p>Since UTF-8 can be distinguished from 8-bit encodings fairly reliably, I think support for UTF-8 filenames should be required in all decompressors.</p>
<p>And NFD is Mac OS X&#8217;s problem, not ZIP&#8217;s. If some app tries to use bytes in filenames that the system simply does not allow by definition, then that&#8217;s a bug in the app, and the app should be fixed.</p>
<p>I think going forward, all ZIP-dependent specs should require filenames in UTF-8 and forbid applications from relying on any particular Unicode normalization.</p>
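<p>The normalization point can be illustrated with a short Python sketch (the helper <code>same_entry_name</code> is hypothetical): the same visible name can arrive as precomposed (NFC) text from most systems or as decomposed (NFD-style) text from HFS+ on Mac OS X, so a consumer that has to match entry names can normalize both sides before comparing instead of relying on either form.</p>
<pre>
```python
import unicodedata

def same_entry_name(a, b):
    """Compare two entry names while ignoring NFC/NFD differences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

nfc = "caf\u00e9.txt"   # precomposed e-acute (U+00E9)
nfd = "cafe\u0301.txt"  # plain e followed by combining acute accent (U+0301)

assert nfc != nfd                                   # different code points
assert nfc.encode("utf-8") != nfd.encode("utf-8")   # so different UTF-8 bytes in the archive
assert same_entry_name(nfc, nfd)                    # but the same name after normalization
```
</pre>
<p>Normalizing only at comparison time leaves the stored bytes untouched, so nothing in the archive has to assume a particular normalization form.</p>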
]]></content:encoded>
	</item>
</channel>
</rss>
