Archive for the ‘W3C’ Category

Zip files and Encoding – I hate you.

December 8th, 2008

I’ve written about some of the issues with depending on zip as a packaging format in the past. As people know, Web Apps is depending on Zip as the packaging format for Widgets.

Zip the good

Zip has a lot going for it. It is ubiquitous and dependable… so long as you don’t want to share files across cultures.

Zip the bad

The Zip spec does not seem to know that there are normalization forms for Unicode, when there are actually four (NFC, NFD, NFKC and NFKD), or more, because there are some non-standard ones too! The Zip spec gives no guidance as to how file names inside zip files are to be normalized.

Consider: when a zip file is created on Linux, the tool just writes the bytes for the file name in the encoding of the underlying file system. So, if the file system is in ISO-8859-1, the bytes are written in ISO-8859-1. This may seem ok, but when you decompress the zip file on Windows, which uses Windows-1252, the file names get all mangled. If the underlying encoding of the file system on Linux is something else, you won’t be able to share files with other systems at all. So in this case, it is not Windows’ fault.

The Zip spec says that the only supported encodings are CP437 and UTF-8, but everyone has ignored that. Implementers just encode file names however they want (usually byte for byte as they are in the OS… see table below).
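To make the mangling concrete, here is a small Python sketch. The pairing of writer encodings (ISO-8859-1, UTF-8) with a reader that assumes CP437 is only an assumption chosen for illustration; real tools guess in different ways.

```python
# Illustration only: a writer stores raw file-system bytes,
# and a reader decodes them with the wrong encoding.
name = "\u00f1.txt"  # "ñ.txt"

# Bytes as a writer on an ISO-8859-1 file system might store them (assumed).
stored_latin1 = name.encode("iso-8859-1")
# Bytes as a writer on a UTF-8 file system might store them (assumed).
stored_utf8 = name.encode("utf-8")

# A reader that assumes CP437 decodes both byte strings the same way.
seen_from_latin1 = stored_latin1.decode("cp437")
seen_from_utf8 = stored_utf8.decode("cp437")

print(repr(seen_from_latin1))
print(repr(seen_from_utf8))
```

Neither result is the name the author typed, and nothing in the byte strings tells the reader which encoding was used.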

It gets worse! Because Mac OS runs on some weird non-standard decomposed Unicode form, you can only share zip files with other Mac OS users. According to this email, the LimeWire guys also ran into a similar problem with regards to encodings in Mac OS:

“for example a French, German or Spanish Windows user cannot exchange files that contain [file names with] French, German or Spanish accents with a French, German or Spanish Macintosh users”

The following table illustrates the problem:

Bytes that represent ñ in a Zip file (in hex)

| File name | Zip in Windows | Zip in Linux | Zip in Mac OS |
|---|---|---|---|
| ñ | A4 (Extended US-ASCII/CP437) | C3 B1 (UTF-8 NFC) | 6E CC 83 (UTF-8 NFD) |

Yes! Holy crap! Three different byte sequences, corresponding to different character encodings.
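The three byte sequences can be reproduced with Python’s standard library. This is just a sketch to check the numbers; it does not claim that any particular zip tool encodes names exactly this way.

```python
import unicodedata

name = "\u00f1"  # ñ as a single precomposed code point

cp437_bytes = name.encode("cp437")
nfc_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
nfd_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

for label, raw in [
    ("CP437", cp437_bytes),
    ("UTF-8 NFC", nfc_bytes),
    ("UTF-8 NFD", nfd_bytes),
]:
    # bytes.hex() with a separator needs Python 3.8 or later.
    print(label, raw.hex(" "))
```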

The only way around this would be a *special* custom-built widget zipping tool that normalizes file name strings to NFC. If the widget engine needs to decompress the widget to disk, then it would take the NFC file names and convert them to the operating system’s native encoding (or store the files in memory, and reference them that way). This affects the URI scheme and DOM normalization of Widgets, so Web Apps will have to deal with it eventually… but I’m not sure exactly how.
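A minimal sketch of such a zipping tool, using Python’s zipfile module. The function name pack_widget and the dict-of-bytes input are hypothetical, made up for this example; nothing here comes from the Widgets spec.

```python
import io
import unicodedata
import zipfile


def pack_widget(files, out):
    """Write files (a dict of name -> bytes) to a zip with NFC-normalized names.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            nfc_name = unicodedata.normalize("NFC", name)
            # Python's zipfile stores non-ASCII names as UTF-8 and sets
            # general-purpose bit 11 (0x800) to say so.
            archive.writestr(nfc_name, data)


# Usage: a decomposed name (n + combining tilde) is stored precomposed.
buffer = io.BytesIO()
pack_widget({"man\u0303ana.html": b"<p>hola</p>"}, buffer)
with zipfile.ZipFile(buffer) as archive:
    stored = archive.infolist()[0]
print(stored.filename.encode("utf-8").hex(" "))
```

One thing this sketch ignores: two input names that differ only in normalization collapse to the same NFC name, so a real tool would have to detect that clash.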

W3C, Widgets

IE8 XDomainRequest conspiracy theory

June 18th, 2008

UPDATE: This conspiracy theory has been debunked. Microsoft said they would implement various aspects of the access-control spec in IE8. For what it’s worth, those Microsoft guys are ok with me :)

I love conspiracy theories… particularly when I get to make one up! Here is my conspiracy theory for how Microsoft will try to force both the W3C and other browser makers to adopt IE8’s XDomainRequest mechanism/API.

A bit of background first: the Web Application Formats Working Group (WAF) has been working on a spec that allows browsers to do cross-domain requests (basically for creating mashups securely). The spec is called Access-Control, and has been in development for three years. The spec was being edited by Anne van Kesteren of Opera Software, but under heavy influence from Hixie of Google, Jonas Sicking from Mozilla, and Maciej Stachowiak from Apple, to name a few people/companies. Marc Silbey, the representative from Microsoft to the working group, was also participating for a while, but he dropped off the radar as Microsoft shifted into high gear during development of IE8 (actually, Microsoft assigned 3 people to participate in WAF, but only Marc did). A few months ago, to coincide with the release of the IE8 beta, Microsoft announced XDomainRequest… aspects of which look, in a lot of ways, very similar to Access-Control, but with some key differences. Then, to the shock of the working group, they brought XDomainRequest to the W3C for standardization, knowing full well that WAF had been working on Access-Control for over three years!

Naturally, Microsoft’s actions pissed a lot of people off because, as I stated in an email, they are just ignoring over three years of work on the Access-Control spec: they created their own proposal and implementation in secret, and now are attempting to fast-track it through standardization, ignoring due process.

To which, Sunava Dutta, from Microsoft, responded by saying “incorrect” and prompting Chris Wilson, Chief Architect of IE, to respond:

You know, there is an idea that perhaps we’re not IGNORING the work on Access Control, and perhaps we simply disagree with some of it.

Which prompted me to respond:

…If Microsoft would have found the time to collaborate [in the WAF WG], all this stuff could have been resolved progressively and the [Access-Control] spec would probably be done by now (as has been shown, the MS proposal has just as many issues, if not more, than the Access-control spec; so trying to do it in-house did not yield a more adequate solution).

Which begs the question: why did Microsoft stop participating in WAF to go off and create their own version of Access-Control? And here is the conspiracy theory:

  1. Microsoft joins the WAF working group in 2007
  2. Microsoft “borrows” Access-Control idea
  3. Microsoft implements its own XDomainRequest mechanism in IE8beta
  4. Mozilla implements Access-Control in Firefox 3, but then pulls the feature at the last minute (consequently leaving a gap in the cross-domain request space for Microsoft to jump in)
  5. Microsoft delays Access-Control work by sending in comments a year late (just before it was about to go to Last Call) and putting in their XDomainRequest proposal for standardization. Meanwhile…
  6. Microsoft rolls out IE8, quickly gains market share (no help from Vista, of course :) )
  7. Other browsers must now implement Microsoft’s solution/spec because businesses and developers start using it
  8. Microsoft’s spec becomes a W3C Recommendation, Access-Control spec dies in the ass.

We are currently at point 5, with Microsoft using delay tactics to slow down standardization of Access-Control.

Why do I care? I’ve only contributed to Access-Control from the sidelines by attending face-to-face meetings and asking Anne dumb questions. However, a lot of CO2 has been wasted flying everyone to meetings to talk about this spec; that’s thousands of dollars and thousands of kilos of CO2 going to waste. Another thing that annoys me is, as I already stated, that Microsoft has had every chance to provide feedback to the working group to fix/discuss any issues they’ve had with the Access-Control spec.

W3C, WAF-WG

Widget spec is now Widget Specs

March 7th, 2008

In an effort to expedite the standardization of widgets, the Web Application Formats Working Group yesterday decided to split the Widgets 1.0 Specification into three (or more) specs:

Other specs may also follow, particularly:

Other documents are still under development too:

We are aiming to have all these done (i.e. at Last Call) by October. However, now that the document split has happened, I should be able to get the packaging format done fairly quickly.

We have more or less now settled on the configuration language format. The elements are going to be:

  • <widget width="" height="" id="">
    • <title> the title/name of a widget
    • <description> a description
    • <author email="" url=""> some details about the author
    • <license> paste your GPL here! :)
    • <icon src=""> the icon
    • <access network="true|false" plugins="true|false"> if your widget needs to get online
    • <content src=""> some file in the widget archive

Only <widget> and <content> are mandatory at this point.
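As a rough sketch, a check for the two mandatory elements could look like this in Python. The function names are hypothetical, and matching on local names (so the namespace can be left out) is my own assumption rather than anything the group has agreed on.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def has_mandatory_elements(config_xml):
    """Return True if the root is <widget> and it has a <content> child.

    Raises xml.etree.ElementTree.ParseError if the document is not
    well-formed.
    """
    root = ET.fromstring(config_xml)
    if local_name(root.tag) != "widget":
        return False
    return any(local_name(child.tag) == "content" for child in root)


print(has_mandatory_elements('<widget><content src="index.html"/></widget>'))
```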

The processing model for the XML is going to be quite forgiving. The only thing that will cause an error is not having a well-formed document. For example, the following would result in “The Awesome Super Dude Widget” as the title:

<widget xmlns="http://www.w3.org/ns/widgets">
   <title>
     The <blink>Awesome</blink> 
     <author email="dude@example.com">Super Dude</author> Widget</title>
</widget>

The unrecognized elements are simply ignored, but their text content is extracted. This makes processing more forgiving and allows for extensibility and some graceful degradation. I also want to push that the widget should function if the namespace is omitted.
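The title extraction described above can be sketched in Python with ElementTree. The whitespace collapsing and the local-name matching (so the document still works without the namespace) are my assumptions about how the final processing model might behave; widget_title is a hypothetical name.

```python
import xml.etree.ElementTree as ET

CONFIG = """<widget xmlns="http://www.w3.org/ns/widgets">
   <title>
     The <blink>Awesome</blink>
     <author email="dude@example.com">Super Dude</author> Widget</title>
</widget>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the namespace can be omitted."""
    return tag.rsplit("}", 1)[-1]


def widget_title(config_xml):
    """Return the widget title as plain text, or None if there is no <title>."""
    root = ET.fromstring(config_xml)  # raises ParseError if not well-formed
    for child in root:
        if local_name(child.tag) == "title":
            # itertext() walks into unrecognized children and yields their text.
            text = "".join(child.itertext())
            # Collapse runs of whitespace into single spaces (assumed behavior).
            return " ".join(text.split())
    return None


print(widget_title(CONFIG))  # prints: The Awesome Super Dude Widget
```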

We are also currently investigating how we are going to deal with internationalization in the configuration document format. We are looking at following ideas from the Best Practices for XML Internationalization.

PhD, W3C, WAF-WG, Widgets

WAF and WebAPI are dead. Long Live WebApps Working Group!

December 19th, 2007

The charters of both the W3C Web Application Formats and WebAPI Working Groups have now expired (as of the 15th of November, 2007), meaning they are effectively dead (although still twitching!). From their ashes will rise a new merged working group called the Web Applications Working Group… hopefully by the 31st of January.

According to the new proposed charter, the mission of the new working group…

…is to provide specifications that enable improved client-side application development on the Web, including specifications both for application programming interfaces (APIs) for client-side development and for markup vocabularies for describing and controlling client-side application behavior.

The new Web Applications Working Group is chartered with the continual development of the following specifications:

| Specification | FPWD | LC | CR | PR | Rec |
|---|---|---|---|---|---|
| ClipOps spec | 2007-Q2 | 2008-Q4 | 2009-Q2 | 2009-Q4 | 2010 |
| DOM 3 Core bis spec | | | | | |
| DOM 3 Events spec | 2007-Q2 | 2008-Q2 | 2008-Q4 | 2009-Q4 | 2010 |
| Element Traversal spec | 2007-Q2 | 2007-Q4 | 2008-Q2 | 2008-Q4 | 2008 |
| Access Control spec | 2006-Q2 | 2008-Q1 | 2008-Q3 | 2009-Q4 | 2010 |
| File Upload spec | 2007-Q2 | 2008-Q2 | 2008-Q4 | 2009-Q4 | 2010 |
| Language Bindings spec | 2007-Q2 | 2008-Q2 | 2008-Q4 | 2009-Q4 | 2010 |
| MAXIM spec | 2008-Q1 | 2008-Q3 | 2008-Q4 | 2009-Q2 | 2009 |
| Network API spec | 2008-Q2 | 2009-Q1 | 2009-Q3 | 2010-Q2 | 2010 |
| Progress Events spec | 2007-Q2 | 2008-Q2 | 2008-Q3 | 2009-Q2 | 2009 |
| Selectors API spec | 2007-Q2 | 2007-Q4 | 2008-Q2 | 2008-Q4 | 2008 |
| XHR Object spec | 2007-Q2 | 2008-Q2 | 2008-Q4 | 2009-Q4 | 2010 |
| Widgets spec | 2006-Q4 | 2008-Q4 | 2009-Q1 | 2009-Q3 | 2009-Q4 |
| Widgets Requirements | 2006-Q3 | 2008-Q4 | 2009-Q1 | 2009-Q3 | 2009-Q4 |
| Window Object spec | 2007-Q2 | 2008-Q2 | 2008-Q4 | 2009-Q4 | 2010 |
| XBL2 spec | 2006-Q2 | 2010 | 2011 | 2013 | 2013 |
| XBL2 Primer | 2007-Q3 | 2010 | 2011 | 2013 | 2013 |

Another cool thing about the new working group is that it is modeled on the HTML Working Group, meaning that it is open and transparent (no secret chats on the members list), and anyone will be able to participate via the public mailing list.

I’ll continue to edit the Widget Spec and Requirements, and possibly continue to help out with the XBL Primer. I’ll continue to be part of this new working group for at least 1 year, as my PhD program ends in March 2009… and hopefully longer, if someone gives me a job to continue working on specs! ;)

PhD, W3C, WAF-WG, Widgets, Work

HTML5 to be published by W3C

December 19th, 2007

According to this email by Dan Connolly (HTML-WG chair), HTML5 will finally be published as a First Public Working Draft (FPWD) by the W3C on January 22, 2008 (the date originally given here was the 26th of February). Microsoft has been mainly responsible for stalling the publication of HTML5 because of their concerns over <canvas> and its related graphics API. On various occasions, Microsoft argued that the graphics API was outside the scope of the HTML WG charter and that they would have to look at the legal implications.

In the email, Dan Connolly wrote:

... and adding 3 months, we get: 2007-11-26 + 3 months = 2007-02-26
for a deadline for publication for the HTML 5 specification.

The W3C Director, Tim Berners-Lee, sees no reason why this
working group should be excused further from the three-month
heartbeat rule, and further, encourages us to publish sooner
if at all possible.

I still think it’s really disappointing that it’s going to take a further two months to publish the document. I was personally wishing it would be published for Xmas (a nice present for the web community!). A FPWD is important for both marketing reasons and legal reasons: when a FPWD is published, all sorts of legal things in the W3C process go into effect. From a marketing perspective, it will be good because it should attract lots of media attention. However, from a technical perspective, a FPWD is irrelevant because of the rate at which HTML5 is being edited by Hixie (on a daily, if not hourly, basis). The latest draft of the HTML5 document is always available to anyone, either via the WHATWG site or the W3C CVS repository.

Update: In a follow-up email, Hixie sees no reason not to publish the document straight away! He writes “Cool. Since we are encouraged to publish sooner rather than later, and since there doesn’t appear to be any reason for us not to publish immediately, I have prepared the document for Working Draft publication.” If we are lucky, we might see the document published for xmas! :)

Update: According to this post by Anne van Kesteren, the publication wheels are now in motion: Mike(tm) Smith sent the request for publication earlier today! Now pending Chris Lilley’s approval… will Chris be the scrooge that ruins Christmas? Let’s hope not.

Update: No HTML for xmas I am afraid… In this email, Mike(tm) Smith writes, “after discussion with others on the team, the target publication date I’m requesting for the First Public Working Draft of the HTML5 specification is January 22.”

HTML, W3C

Widgets 1.0 (v2)

October 17th, 2007

Today the W3C published the Second Public Working Draft of the Widgets 1.0 Specification. It’s been nearly a year since we published the first public working draft (11 Nov, 2006) and much has changed and been added to the spec (…and it still has a long long way to go yet before it will be finished!). The most notable additions to this version of the spec are the attempt to standardize a subset of the Zip specification and support for digital signatures using XML Digital Signatures. Unfortunately, a lot of exciting things that are under discussion by those participating in the standardization effort have not made it into this latest draft. For example, we are still trying to work out a nice model for automatic updates, but we should have something drafted up fairly soon.

The main problem I’ve been working on over the last two months is trying to specify a subset of Zip that should be used by widgets. My goal has been to define a subset that is interoperable across all platforms and devices in such a way that it also ensures longevity. As you might imagine, this has proven to be quite a challenge…

The issues with Zip

The Zip file format is what is commonly referred to as a de facto standard: it is not formally specified by any standards body, but it is so widely implemented that it is interoperable across OSs and devices. This seems great on the surface, but when you try to standardize it, it becomes quite a nightmare. The main issues are these:

  • There are competing Zip specifications and there are many versions of each of the Zip specifications.
  • Different versions of the Zip specification are implemented across different platforms and OSs.
  • There are many features in Zip that are desirable (eg. UTF-8 support), but are not widely implemented.
  • Zip is not an “open standard”, it is the property of PKWARE.
  • Zip is periodically updated and PKWARE does not provide any links to previous versions of their specs.

Competing Zip specifications

There are essentially two Zip specifications that applications make use of: the “official” PKWARE Zip Application Notes and the “unofficial” Info-Zip Application Notes (mostly on Unix). The unofficial notes basically take whatever PKWARE has officially published, which then gets modified, or otherwise clarified, by the guys at Info-Zip. In this sense, much of what one finds in the Info-Zip specs is identical to the PKWARE Zip spec. But, because PKWARE actually maintains the official spec, the PKWARE spec is always more up-to-date than what Info-Zip has on its website. For instance, the latest version of the Info-Zip notes covers version 6.2.0 of the official Zip spec (26 April 2004), while the latest version of the Zip spec is 6.3.2, which came out in September 2007. So Info-Zip is three years behind PKWARE!

Problem: Info-zip contains details that pertain to how info-zip works and may not be compatible/interoperable with the PKZip Spec. For example, Info-zip contains details about how to handle Unix permissions, while PKWARE’s Zip spec does not. This might not make the file formats incompatible, but it does make them physically different. You can try this out yourself: zip up a file using Info-Zip’s zip implementation and then zip up the same file using Windows’ Compressed Folders. The results will be different, but you should still be able to decompress the Info-Zip file using Windows’ native Zip implementation.
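One way to see these physical differences is to dump the per-entry metadata of an archive. The sketch below uses Python’s zipfile module; describe_entries is a hypothetical helper, the host-system table is deliberately partial, and the in-memory archive only mimics what a Unix writer might produce.

```python
import io
import zipfile

# Partial mapping of the "version made by" host-system byte.
HOST_SYSTEMS = {0: "MS-DOS/Windows (FAT)", 3: "Unix"}


def describe_entries(source):
    """Return one dict of metadata per entry in the archive."""
    rows = []
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            rows.append({
                "name": info.filename,
                "made_on": HOST_SYSTEMS.get(info.create_system,
                                            str(info.create_system)),
                "version_needed": info.extract_version,
                # Unix writers keep the file mode in the top 16 bits.
                "unix_mode": oct(info.external_attr >> 16),
            })
    return rows


# Usage with an in-memory archive that mimics a Unix writer.
entry = zipfile.ZipInfo("hello.txt")
entry.create_system = 3
entry.external_attr = 0o100644 << 16
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(entry, b"hello")

rows = describe_entries(buffer)
print(rows)
```

Running describe_entries on the same file zipped by two different tools should show where the metadata diverges, even when both archives extract fine.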

Different versions of the Zip specification are implemented across different platforms, OSs, Specs

Another significant issue from a standardization perspective is that packaging formats are making use of either some Info-Zip spec or some PKWARE spec. Significant examples include:

  • Java/JAR (including WAR and EAR): Info-ZIP Application Note 19970311
  • Open Document Format (ODF): Info-ZIP Application Note 19970311
  • Office Open XML – Open Packaging Convention (OOXML-OPC): PKWARE Zip Application Note (version 6.2.1), but with a bunch of clarifications.
  • OEBPS Container Format 1.0: PKWARE Zip Application Note (no explicit version, but at least version 2.0 needed to extract and version 4.5 needed to extract Zip64).

I still have little idea as to what version of the Zip specification is actually implemented on each OS, let alone on mobile devices (information that seems to be quite difficult to come by!). As a result, and after some discussion with Jon Ferraiolo of IBM, I decided to base the Widget Spec on the OEBPS-OCF’s conformance requirements for Zip packages. I was tempted to make the widgets specification conform to the OOXML-OPC spec (put away your tomatoes!) because, in my opinion, the container aspects and conformance requirements are well specified (even if the rest of OOXML is “evil”).

Desirable features in Zip (6.3.2)

There are a number of really cool features in Zip that would make specifying a container format for widgets much better. They include:

  • Strong Encryption (using x.509 digital certificates): basically solves the digital signature problem, I think.
  • UTF-8 support: solves a significant part of the internationalization problem.
  • Zip64: future proofing.

To require widget engines to actually support these features puts a fair bit of strain on makers of widget engines. At this point, we have required that implementers support UTF-8 and Zip64.

Zip is not an open standard

The fact that Zip is proprietary might be something that comes back to bite us on the ass. I’m no lawyer, but there may be patent/IPR issues surrounding Zip. I’m also not sure about how PKWARE will feel about WAF specifying a subset of their specification. I’ve emailed PKWARE and informed them of what we are doing and requested that they review the spec. They have responded and said that they will look into it.

Where to from here…

Looking forward, I’d really like to get all the physical and logical packaging stuff done. That includes:

  • Anything Zip related
  • The inter-package addressing model
  • How to handle decompression
  • How to name files in ASCII and UTF-8

I’d also really like to nail down the auto-updates model and make sure that the manifest language we are specifying covers all the common use cases. The security model is the elephant in the room :) No one wants to touch it at this point, but we know it’s a massive issue. Another massive issue is the APIs… but that’s not something I want to get into now. A big issue for me is internationalization. I’ve been blocked a number of times when I’ve proposed doing internationalization using folders… every widget engine except Opera does it, so I think we should do it too.

W3C, WAF-WG, Widgets

Web Directions South Conference

October 7th, 2007

Last week I attended the Web Directions South Conference in Sydney. I was invited to give a talk on Widgets as part of the conference’s W3C SIG day. Overall, I thought the conference was really good: very well organized, with lots of good, interesting talks. The slides for my talk are now hosted on SlideShare.

W3C, Widgets

W3C stops standardization of the declarative format for application and user interfaces (about time!)

September 13th, 2007

Yay! The W3C has canned the work on the Declarative Format for Applications and User Interfaces (DFAUI), putting an end to something that had no way of ever finishing. Of course, you probably have never heard of the DFAUI because the WAF WG never published any documents about it. The idea was to standardize an XML language similar to XAML or OpenLaszlo… but instead, what the WAF-WG got was an input from Nexaweb called XAL. Anyway, the people that were supposed to be editing the document never got very far, and as far as I am concerned, the work they produced was of fairly low quality (that’s not to say my work doesn’t suck!).

These are my random thoughts on how I think the DFAUI should have been standardized… and why it failed…

Rant, W3C, WAF-WG

June Wrap-up

July 6th, 2007

June was a fairly busy month:

July is also going to be pretty intense:

  • Will try to get the First Public Working Draft of the XBL Primer published by the 13th of July.
  • Presenting my PhD confirmation on the 25th.
  • Going to Melbourne on the 26th to work with Cameron McCormack for two days on a model for the XBL 2.0 Test Suite.

Widgets Requirements (4.0)

July 6th, 2007

Widget Reqs on the W3C homepage

The Fourth Working Draft of the Widgets 1.0 Requirements has now been published by the W3C.

Next comes a lot of research into each requirement and getting the normative wording into the spec. I think I will start with the "low hanging fruit" (the manifest) while I research aspects related to persistent storage.

W3C, WAF-WG, Widgets