- Costs stupid amounts of money
- Isn’t very good
- Is only available on windows
- …but “usually” works
- Is completely free
- Is an almost exact clone of Microsoft Office circa Office 2000/XP
- Is open-source (so that sometimes you can easily fix it yourself, quite surprisingly!)
- …but apart from the Microsoft Word part, “often doesn’t work”
I’ve been using OpenOffice’s Word clone as a complete replacement for Word for the past 3 years, and it’s been perfect. Previously, I used to use Word *and* OpenOffice, because the latter had some big bugs left in it.
Sadly, OpenOffice’s Excel clone is … often shockingly buggy. I won’t go into the details. But this post is about one missing/broken feature in particular: OpenOffice by default saves / exports XML which (for most people, and all simple uses) is unusable/unreadable – and is very hard to convert with XSLT. Read on for a script that will fix this for you…
An Obsession, with Compression…
In a well-intentioned-but-actually-stupid attempt to make the filesize slightly smaller, it implements a trivial form of compression that – in any imperative programming language – would be easy to decompress.
Sadly, the primary language for XML processing is XSLT – which is a functional language. Functional languages fundamentally (I keep using this word, but it’s true – this is a core part of the theory of the basic precepts of the language!) *cannot* handle that form of decompression. By design. By mathematical proof.
NB: you can workaround this, of course, but doing so within a functional language is difficult for most people.
There is no option in OO to “not use this optimization”. The file format for OO requires that the output XML file be compressed as a ZIP file anyway! So this compression trick is pointless!, maybe? (unless you are using non-XSLT code to process it. Which is pretty sad, since XSLT/XML are designed to be used together)
Several people have tried to provide exporters for OO to get around this (hmm. It’s a tad ironic that the built-in file-format is so hard to work with that people have to write custom exporters instead). Unfortunately, most of these exporters WILL FAIL on any real spreadsheet and WILL CORRUPT THE DATA, because the exporter-authors didn’t look carefully enough at the incoming data.
I’ve found one instance of an otherwise quite nice exporter that has the data corruption bug here – http://digitalimprint.com/misc/oooexport/ (NB: I’ve emailed my fixed version to the author of that page – hopefully they will replace their version with the fixed one)
I’ve found one instance of a “fixed” exporter that worksaround some of the data corruption, BUT … is missing the basic features everyone wants of “use the column label as the XML tag”.
And I’ve found loads and loads of broken attempts :(.
FIXED: an XML exporter that seems to work even on large, complex spreadsheets
So … after dusting off my XSLT skills, I’ve taken the former one, rewritten most of it, and here’s a fixed exporter for OpenOffice that will write your spreadsheet out as simple XML. I’ve been using this to convert XL spreadsheets to various things, including custom SQL databases (XLS can be opened directly in OpenOffice; OO exports as XML using my script; I then run a final XSLT from the command line using SAXON which converts the XML into a series of SQL INSERT/CREATE/UPDATE statements, as appropriate).
To install this, you need to follow the instructions at the http://digitalimprint.com/misc/oooexport/ site, but overwrite the “generic_xml.xslt” file that creates with the one provided here: (sorry, I haven’t got around to packaging it myself yet – I’m hoping this gets adopted by the digitalimprint folks!)
Download this file, save it as “generic_xml.xslt”, and copy it over the top of the one that oooexport installed; NB: different Operating Systems will install that file in different locations
I have some unit tests, but I’m trying to work out where to put them online for easy download (I’m almost tempted to start an SF.net project page just to hold them. Unless there’s somewhere better, I’ll probably do that)
Oh, BTW – here’s a small design flaw in OO: when you run an export in OO, any error messages are ignored and replaced with a useless error dialog that just says “writing to file failed” (it doesn’t even give the filename!). Those error messages are hiding on the serial console – if you’re a sufficiently advanced unix sysadmin to know what a serial console is, and how to read it.
Anyway, if you try running the exporter and it fails, now you know where to look…