Category Archives: technology

What price a byline? (Or: what’s wrong with Knol)

A reader criticised my frequent referencing of Wikipedia in my last post, on the basis that everyone knows what WP is and that indeed some of us have Firefox extensions[1] to make quickly consulting it easy. I admitted he had a point, prompting another reader to protest that it doesn’t matter where the links go, as long as they’re informative and well-written. The degree to which they were both right was strikingly indicative of how far WP has come. Given that it’s so often the first Google result for a huge number of queries, making explicit links to it can seem as redundant as adding dictionary links for the longer semantemes[2]. And the reason I reference it so often is that its collective writing style and usual accuracy are ideal for a quick introduction to what may be unfamiliar ground.

But its status as the #1 go-to place for so many Google queries didn’t go unnoticed in Mountain View. Yesterday Google finally released their long-in-development Knol. A “Knol” is an unnecessary neologism coined by Google to mean a “unit of knowledge”, but it seems the basic idea is to compete with Wikipedia on the authoritative content front, by meeting one of the oft-heard criticisms of WP (albeit heard less often now, if only due to exhaustion): that you can’t trust it because you don’t know who wrote it. Knol’s points of difference with WP are then as follows:

  • You can have more than one article on a topic.
  • Articles are signed by their authors.
  • Advertising will be displayed, and the revenue will be split with authors.
  • The level of collaborative editing allowed on each article is controlled by the author, as is the licensing.

I’ve been reading through a few of its articles, and what strikes me is what they’ve lost by not having universal editing. So often WP was compared to the Encyclopedia Britannica. Knol tries to compromise between the two, but in doing so completely erodes the role of the editor: the person who doesn’t actually write the content, but polishes it to a publishable standard and makes it consistent with the rest of the corpus of work. Today’s featured Knol is on Migraines and Migraine Management. It’s written by a neurologist, so you know it’s authoritative, and it doesn’t allow public editing, so you know it hasn’t been tampered with.

But compare it with WP’s article on Migraines, and you’ll see just how badly it needs an editor. It’s written as if it’s intended for paper, with non-hyperlinked cross references: “Migraine is defined as a headache that [TABLE 2]:”. That “[TABLE 2]” is a JPEG image at reduced size; there’s no reason for it not to be an actual HTML table. (Additionally, Google, there’s no reason for the images to be inline with the content like that. Consider a Tufte-like layout, where the tables, references and footnotes can go out to the side.)

Throughout Knol you’ll find all sorts of bad design practice. I swear I saw image hotlinking in one article. But in particular, a lot of the seed articles seem to be HTML dumps of articles already written by medical professionals, like this one. It’s closed collaboration, so unlike WP, you can’t just drop in and quickly format them into something presentable (at present there’s no styling to speak of: the intra-page headings are just capitalised, there’s an odd amount of whitespace, and the page title itself isn’t capitalised).

There are two big surprises here, given that this is a Google project, and how long it’s been in development there. If they don’t fix them, I fear an epic failure.

The first is that they’ve provided such an unstructured writing environment. If you’re trying to create a body of high quality written material, there are ways you can structure your editing environment so that content conforms to certain styles and expectations. It’s particularly in Google’s interest to do so, since as they keep telling the SEO world, well-structured HTML documents are easier for them to index and search. And yet Knol’s featured Migraines article has swathes of tabular content in the un-indexable, inaccessible JPEG format.

The second is much more subtle and can’t be fixed with a technology patch the way the first can be. Google have failed to realise that often the most expert authors are going to be simultaneously the least equipped to properly format and polish their own documents (whether due to lack of technical skills, or of time), and also the least willing to submit their work to editorial changes from the unwashed anonymous masses. The fix for this, I think, will involve recognising and separating the two types of editing that happen on Wikipedia: authoring or fixing of content; and editing for quality control (fixing grammar, spelling and style, and adding useful metadata to a document). Then build a system to acknowledge the good editors, not just the good authors. Then encourage authors to allow editorial changes from recognised quality editors. In fact, drop the “closed collaboration” option altogether.

This is even harder than getting good quality content in the first place. Writing is glamorous. Editing isn’t, but it’s so very important. Knol’s only got half the problem solved.

[1] Certainly one of my favourite extensions is the perennially useful Googlepedia, which remixes your Google search results to embed the first returned WP article on the right (it’s particularly nice on widescreen monitors).

[2] So it’s not a directly applicable synonym of ‘word’, but it was the best the thesaurus could give me.


The chart-junk of Steve Jobs

On June 9th Apple CEO Steve Jobs will take to the stage in San Francisco to give the keynote address at his company’s 2008 WWDC. Rumours are strong that he’ll be unveiling the second generation (or at least 3G capable) iPhone. He usually does about two big keynotes a year, and they’re almost always worth watching. Undeniably the best showman in the technology industry, he packs his own reality distortion field, a charismatic glow that can convince you of the urgent need to buy Apple’s latest product if you are to live a fulfilled iLife™. The older keynotes for significant product launches like the Macintosh launch, the unveiling of OS X, and the iPhone launch are particularly good examples[1] worth watching to see how he brings the audience into the palm of his hand. He brings them up to speed with what the company’s been doing. He then starts naturally turning the topic into a story of a missing product in the market. He’ll talk about competitors’ products (if they have any) in that space, and talk about their short-comings, shaping the descriptions so that it becomes increasingly obvious what the perfect example of that product should look like. And then he unveils it, pulling it out of his pocket, pulling a sheet off it, or pulling it out of a manila envelope. He’s spell-binding, and it’s only obvious what a talent he has when you see other CEOs try to emulate it and fail decidedly (warning: video autoplays).

One of the props he often relies on is the graphical chart, showing things like how the marketshare of a product has been doing, or how a new one compares with the competition on performance. Data presentation guru Edward Tufte has an expression for Jobs’s style: chart-junk. One of the themes Tufte keeps coming back to in his book Visual Explanations (which I’m reading and really quite enjoying at the moment) is that the magician is worth studying, as an example of a performer who acts out a lie. Learn from what he does, says Tufte, and then seek to do the opposite when presenting information. So it’s appropriate, when you consider Jobs’s style, how often he’s described as a magician. And the elements of a Stevenote are in some ways closer to a magic show than to an honest data presentation. It’s sadly to be expected from what is in essence a marketing pitch. I’m always slightly annoyed when good marketing crosses the line into subtle dishonesty, and Jobs really shouldn’t have to engage in it. His products usually advertise on their own merits well enough.

On Tufte’s forum I saw a comment in a discussion on the sins of pie graphs pointing out this photo.

It’s actually horribly blatant. The Apple segment gets the closer perspective, and somehow 19.5% ends up with a bigger surface area than 21.2%.

Wondering if he did this often, I delved off into the Engadget archives of their coverage of other Stevenotes. And found plenty of chart-junk. For each of these, spot the bias introduced by the 3D perspective, and how it always seems to fall in Apple’s favour when there’s a direct comparison to be made.

Above: It’s a performance comparison, so smaller numbers mean the faster software. Safari is Apple’s browser.

These two above aren’t so bad. But the parallax bias is there, and it is in Apple’s favour. Apple’s numbers fall to the left of the vertical vanishing line, so get foreshortened.

He’s normally so good about having some numbers on a chart. In their absence, and with the presence of a 3D perspective, it’s impossible to regard this chart as anything useful.

Tiger is the latest version of OS X in this case. Panther’s the previous one, and older is everything before that.

With a 2D graph given a 3D perspective, you’ve got two opportunities to skew the data. Growth is good, so if you want to make the numbers at the right hand side of the graph look better, bring that end closer. But additionally in the vertical axis, we see the same parallax bias as in the web browser performance comparison charts from above, so that the “Industry” figures get skewed to look even lower than the Mac’s.

This one’s angled and segmented slightly so that the Leopard (the latest version of OS X) area gets magnified, but there’s not much you can do about numbers that different.

This is the blurry successor to the first one. Apple’s the green segment. Again, note the upward perspective that the Apple segment benefits most from.

Again, two perspective skews here for bias, and a style bias. The first is the obvious make-the-right-hand-side-closer angle to magnify the latest figures. Which is interesting, since the vertical bias (the vanishing point is in line with the top of the graph) is shrinking those numbers compared to the older ones. Which would appear to be counter-productive, until you read the title and realise that this is a cumulative sales graph. Which is an interesting choice in itself, since cumulative graphs are often hard to get meaningful data out of. They don’t go down, after all. Which is why they chose this form for this slide. Growth is slowing. It’s inevitable with a product like the iPod, but there are ways to soften the news. Combined with the choice of a cumulative graph, the vertical perspective now makes sense. It has the effect of playing down the early growth and maximising the latest, bringing the graph slightly closer to looking linear than to an S-shaped curve that’s beginning to hit its final plateau.

So come June 9 we’ll once again be invited to be spell-bound by Cupertino’s wizard, and can expect more chart-junk. It’s fun to be sold a story like this, provided you’re aware that you are being sold one. It’s not a scientific presentation. Jobs is known for prizing aesthetic form, and in a way 3D charts look nicer than 2D. But I would question whether it’s honest to take this consistently biased approach in presenting your data. And as Tufte asks in Visual Explanations, “Why lie?”

[1] Worth watching for the contrast to these celebrations of Appledom is the infamous Macworld 1997 keynote where he announced the company’s partnership with Microsoft to a congregation (any religious implications from that word are fully intended) that doesn’t buy into his forced enthusiasm. It’s rare to hear actual booing during a “Steve-note”, and the scene was recreated in the 1999 TV movie Pirates of Silicon Valley.

Will WinFS return? Will anyone care?

Guns N’ Roses started recording their ‘upcoming’ album Chinese Democracy in 1994. George Broussard started work on Duke Nukem Forever in 1997. Both titles have become standalone jokes in the music and game industries respectively, commonly regarded as having a release date in the vicinity of Armageddon. In a similar way, Microsoft has been working on WinFS since 1990 (on and off, and not always under that name or scope; to be honest, I’m probably not being entirely fair to them in this paragraph). It was promised at some point for multiple versions of Windows, and always pulled before release. When it appeared in the Longhorn betas (the OS that would become Vista), it was slow and incomplete, and the eventual news that it would be pulled from Vista too wasn’t terribly shocking. Vista itself was at risk of becoming a perpetual vapourware joke like Duke Nukem Forever, and after five years’ development MS was very painfully aware that they needed to get something out the door. So to much jeering at having once again over-promised and under-delivered, one of the three pillars of Vista was dropped.

Not that there wasn’t good reason for taking a long time about it. It was actually a really, really tricky problem they were biting off. Or an entire set of problems. WinFS was Microsoft’s latest iteration on their ongoing attempts to unify (or at least bridge) the concepts of a filesystem and a database. It’s the sort of proposal that automatically intrigues computer scientists (as can be seen in the many other attempts at it). Why the hard separation between data stored as files, and data stored in a database? Surely the two could be unified, and imagine what it would bring! You could store your music and movies in a normal directory structure, and access them through the normal file access means, but browse them by artist, genre, or with queries such as “All movies directed by Steven Spielberg from the 1990s”.
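That kind of query is easy to sketch once file metadata lives in a relational store. Here’s a toy illustration using Python’s built-in sqlite3 — the table layout, column names, and file paths are all invented for the example, not anything WinFS actually specified:

```python
import sqlite3

# File metadata in a relational table; the files' bytes would stay on disk
# at the recorded path. Schema and data are purely illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movies (
        path     TEXT PRIMARY KEY,   -- where the bytes actually live
        title    TEXT,
        director TEXT,
        year     INTEGER
    )""")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [("/movies/jurassic_park.avi",   "Jurassic Park",    "Steven Spielberg", 1993),
     ("/movies/schindlers_list.avi", "Schindler's List", "Steven Spielberg", 1993),
     ("/movies/jaws.avi",            "Jaws",             "Steven Spielberg", 1975),
     ("/movies/the_matrix.avi",      "The Matrix",       "The Wachowskis",   1999)])

# "All movies directed by Steven Spielberg from the 1990s"
rows = conn.execute(
    "SELECT title FROM movies "
    "WHERE director = ? AND year BETWEEN 1990 AND 1999 "
    "ORDER BY title",
    ("Steven Spielberg",)).fetchall()
print([title for (title,) in rows])  # -> ['Jurassic Park', "Schindler's List"]
```

The hard part of WinFS was never this query; it was keeping such a store consistent with the real filesystem underneath it, for every application, all the time.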

Why it died

WinFS died as a monolithic entity that integrated into Windows for a number of reasons:

The Web – The concept of an Object Filesystem was something MS had been touting since 1990. It made more sense back then. In 2006, with the web taking off and obviously becoming the new place for data to live, it made rather less. Why bother maintaining your contacts as entries in a local database so you could perform interesting queries on them, when Facebook et al could do a better job for the most part, in a centralised location? And if this trend of apps moving to the web continues, then WinFS as a client-side application is weakened drastically as your data moves out of its purview.

Desktop search: good enough – The biggest use case for why WinFS would be awesome inevitably worked out to be desktop search. But when Apple introduced Spotlight for OS X, which was just a simple (compared to what WinFS hoped to achieve) file-indexing service, it made a mockery of the need for such a complex solution to the problem. The release of Google Desktop for Windows put pressure on this wound. Eventually Microsoft released their own desktop search client for XP, a face-saving move made embarrassing by the fact that file indexing had already existed as a turned-off-by-default option in XP.
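Stripped to its essentials, a Spotlight-style indexer is just an inverted index: a map from terms to the files that contain them. A toy sketch (the file names and contents are invented; a real indexer adds tokenisation, ranking, and incremental updates on top of this core idea):

```python
from collections import defaultdict

# Invented stand-ins for files on disk; a real indexer would crawl the
# filesystem and extract text from each file.
documents = {
    "report.txt": "quarterly migraine study results",
    "notes.txt":  "migraine triggers and management notes",
    "recipe.txt": "pancake recipe with maple syrup",
}

# Build the inverted index: term -> set of files containing it.
index = defaultdict(set)
for filename, text in documents.items():
    for term in text.lower().split():
        index[term].add(filename)

def search(*terms):
    """Files containing every query term: set intersection on the index."""
    results = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*results)) if results else []

print(search("migraine"))           # -> ['notes.txt', 'report.txt']
print(search("migraine", "study"))  # -> ['report.txt']
```

No schemas, no relational engine, no API for applications to buy into — which is exactly why it shipped years before anything WinFS-shaped did.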

Embedded databases: good enough – The other client-side stories for why WinFS would be a good thing often involved media files. Everyone likes movies and music, and they’ve got lots of meta-data associated with them, like genre, year, length, artists, albums, etc. Lots of ways to order and query a collection. The problem for WinFS was that it was never clear why this couldn’t be just as easily handled by an application-specific database. Like iTunes. And the tides were shifting on this front: cross-platform development had become important again. And SQLite is a lot more Mac- and Linux-friendly than a new Windows-only API would be. It’s also a lot more existent. Developers like that.

But who can really say what “dead” means?

Jon Udell has an interesting interview[1] (for certain definitions of “interesting”) with Quentin Clark, a manager at Microsoft who used to be the project manager on WinFS.

Take-away points:

– Microsoft rolled the WinFS development into its backend software efforts, ADO.NET and SQL Server. This isn’t news, and it makes sense in its own way. If web apps are going to be where we store our data in the future, then the databases backing them are going to become our filesystems in a sense. Although if you think about it that way, then we’re already a good way towards the WinFS vision, and WinFS as it was originally envisioned is once again undermined by good-enough approaches.

In the interview Clark talks about several features in SQL Server 2008 that keep alive the WinFS dream:

– They’ve added a filestream column type, which references files on the underlying NTFS partition rather than storing the data itself in the database, making for better performance with large binaries.

– They’ve added a hierarchical column type. You know, hierarchical, like a directory structure.

– They’re going to add a Win32 namespacing function, which will expose the database to Windows as another file storage device you can then browse and do all the usual fun stuff. WinFS by complex stealth. There’s more than one project for Linux that does the same thing through FUSE.

So in short, SQL Server 2008 will be able to store large files just as well as NTFS can. It will be able to describe hierarchical data structures. It will be accessible from the Win32 filesystem APIs. It’s pretty much offering all WinFS did, except for the client-side specific schemas (such as contacts, music, etc).
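The hierarchical part is less exotic than it sounds: a plain relational table can already model a directory tree, at the cost of a recursive query to walk it. A sketch using SQLite’s recursive common table expressions (the schema and data are invented for illustration; SQL Server 2008’s native hierarchy support is its own, more specialised mechanism):

```python
import sqlite3

# A directory tree as an adjacency list: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        id     INTEGER PRIMARY KEY,
        parent INTEGER,              -- NULL for the root
        name   TEXT
    )""")
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    [(1, None, "root"),
     (2, 1, "music"),
     (3, 1, "movies"),
     (4, 2, "jazz"),
     (5, 4, "kind_of_blue.flac")])

# Reconstruct full paths by recursively joining children onto their parents.
rows = conn.execute("""
    WITH RECURSIVE tree(id, path) AS (
        SELECT id, name FROM nodes WHERE parent IS NULL
        UNION ALL
        SELECT n.id, tree.path || '/' || n.name
        FROM nodes n JOIN tree ON n.parent = tree.id
    )
    SELECT path FROM tree ORDER BY path""").fetchall()
for (path,) in rows:
    print(path)
```

Which is part of why the “database as filesystem” idea keeps feeling mundane from the server side: the relational model never had trouble representing hierarchy, only making it convenient.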

It’s also still as useless for most users.

I think the interesting part about all this (certainly SQL databases are a subject I struggle to get excited about) is that once you examine the WinFS idea from the server end of things, stripped of the client-side schemas, vapourous UIs and dubious use-cases, it’s pretty mundane. That is, it’s everyday stuff. The critical step’s already tak[en/ing] place: we’re moving our own data off the filesystems and into the cloud, where it’s shaped behind the scenes into whatever schema is best for the application, with an interface on the front-end designed to fit that structure. Files can exist in multiple folders in Google Docs. Social networking sites deliver on much and more of the contacts functionality originally promised by WinFS. iTunes structures your music collection as a simple database and a collection of playlists built from queries into it (dynamically if you wish). The battle WinFS was going to fight has already been won. The next one is one Microsoft was never going to fight anyway: the battle for the structure and open exchange of this data.

[1] Via OS News