What price a byline? (Or: what’s wrong with Knol)

A reader criticised my frequent referencing of Wikipedia in my last post, on the basis that everyone knows what WP is and that indeed some of us have Firefox extensions[1] to make quickly consulting it easy. I admitted he had a point, prompting another reader to protest that it doesn’t matter where the links go to, as long as they’re informative and well-written. The degree to which they were both right was strikingly indicative of how far WP has come. Given that it’s so often the first search result on Google for a huge number of queries, making explicit links to it can seem like adding links to a dictionary for longer semantemes[2]. And the reason I reference it so often is that its collective writing style and usual accuracy are ideal for a quick introduction to what may be unfamiliar ground.

But its status as the #1 go-to place for so many Google queries didn’t go unnoticed in Mountain View. Yesterday Google finally released their long-in-development Knol. A “Knol” is an unnecessary neologism coined by Google to mean a “unit of knowledge”, but it seems the basic idea is to compete with Wikipedia on the authoritative content front, by meeting one of the oft-heard (albeit not so much anymore, if only due to exhaustion) criticisms of WP: that you can’t trust it because you don’t know who wrote it. Knol’s points of difference with WP are then as follows:

  • You can have more than one article on a topic.
  • Articles are signed by their authors.
  • Advertising will be displayed, and the revenue will be split with authors.
  • The level of collaborative editing allowed on each article is controlled by the author, as is the licensing.

I’ve been reading through a few of its articles, and what’s striking me is what they’ve lost by not having universal editing. So often WP was compared to the Encyclopædia Britannica. Knol tries to compromise between the two, but in doing so completely erodes the role of the editor: the person who doesn’t actually write the content, but polishes it to a publishable standard, and makes it consistent with the rest of the corpus of work. Today’s featured Knol is on Migraines and Migraine Management. It’s written by a neurologist, so you know it’s authoritative, and it doesn’t allow public editing, so you know it hasn’t been tampered with.

But compare it with WP’s article on Migraines, and you’ll see just how badly it needs an editor. It’s written as if it’s intended for paper, with non-hyperlinked cross references: “Migraine is defined as a headache that [TABLE 2]:”. “[TABLE 2]” is a JPEG image at reduced size. There’s no reason for that rather than an actual HTML table. (Additionally, Google, there’s no reason for the images to be inline with the content like that. Consider a Tufte-like layout, where the tables and references and footnotes can go out to the side.)

Throughout Knol you’ll find all sorts of bad design practice. I swear I saw image hotlinking in one article before. But in particular, a lot of the seed articles seem to be HTML dumps of articles already written by medical professionals, like this one. It’s closed collaboration, so unlike WP, you can’t just drop in and quickly format that into something presentable (at present there’s no change in style, the intra-page headings are just capitalised, there’s an odd amount of whitespace, and the page title itself isn’t capitalised).

There are two big surprises here, given that this is a Google project, and how long it’s been in development there. And if they don’t fix this, I fear an epic failure.

The first is that they’ve provided such an unstructured writing environment. If you’re trying to create a body of high quality written material, there are ways you can structure your editing environment so that content conforms to certain styles and expectations. It’s particularly in Google’s interest to do so, since as they keep telling the SEO world, well-structured HTML and documents are easier for them to index and search. And yet Knol’s featured Migraines article has swathes of tabular content in the un-indexable, inaccessible JPEG format.

The second is much more subtle and can’t be fixed with a technology patch as readily as the first can. Google have failed to realise that often the most expert authors are going to be simultaneously the least equipped to properly format and polish their own documents (whether it be due to lack of technical skills, or time), and also the least willing to submit their work to editorial changes from the unwashed anonymous masses. The fix for this, I think, will involve a recognition and separation of the two types of editing that happen on Wikipedia: authoring or fixing of content; and editing for quality control (fixing grammar, spelling, style, adding useful metadata to a document). Then build a system to acknowledge the good editors, not just the good authors. Then encourage authors to allow editorial changes from recognised quality editors. In fact, drop the “closed collaboration” option altogether.

This is even harder than getting good quality content in the first place. Writing is glamorous. Editing isn’t, but it’s so very important. Knol’s only got half the problem solved.

[1] Certainly one of my favourite extensions is the perennially useful Googlepedia, which remixes your Google search results to embed the first returned WP article on the right (it’s particularly nice on widescreen monitors).

[2] So it’s not a directly applicable synonym of ‘word’, but it was the best the thesaurus could give me.

Using that second display: 4 news visualisations of questionable utility

For both yours and my ever decreasing attention spans, in the race to distinguish and spice up the daily news product, here’s more news, shallower, and faster.

MSNBC Spectra screenshot

Spectra from MSNBC is a pretty terrible romp into 3D. Pretty, but completely unusable and just rather useless. You select what channels of news you want, and as you do a selection of stories from each channel floats into the display as a rotating ring. It wouldn’t be so bad if you could actually click on the floating news items. But no, that does something completely unexpected: it ejects that entire ring of stories. To get to a story you want, you have to navigate via a ridiculous horizontal scrollbar. I thought we had learnt in the 90s that 3D interfaces like this just don’t work. From Information Aesthetics via Data Mining.


Moving from the realms of insanity to just the slightly overwhelming comes Newsmap, based off Google News.

Digg's "Big Spy" visualization

Digg's "Stack" visualization

From the very epitome of fickle and populist news rivers comes a selection of cool-looking, fast moving and not really that value-additive visualizations at their Labs section.

Mapped Up screenshot

Finally comes a low-key (and the most embeddable of the lot) Flash widget that just rotates geo-coded stories on a world map.

Graceful Degradation, or Progressive Enhancement?

There’s a question of design philosophies in software that describe two diametrically opposite ways of theoretically getting the same results: Top-down or bottom-up? Traditionally we’re supposed to do the former, designing the big picture first and then filling in the details until we’ve built all the way down from abstracted design to concrete reality. We usually do the latter, building little lego bits and then trying to connect them into a structure approximating the original design.

But in a sense, in the world of web application design, where “best practice” isn’t just a moving target but one moving in quite different directions, the opposite is in effect. We’re doing top-down experience design, when we should really be doing bottom-up. The distinguishing issue is that on the web, we’re not just creating one design, we’re creating a suggested design that will then be rendered in a whole multitude of ways.

Normal practice in web design/development is to work out what you want to functionally do, then make the call on what technology (Flash, Shockwave (remember that?), Java, AJAX, ActiveX, PDF, or even Silverlight) would be best for making that happen, evaluating the “best” as a measure of time, expense, longevity, security, and market support. And then, if time allows, you start designing fallbacks for clients without those technologies.

Chris Heilmann has done a good job advocating the opposite philosophy of progressive enhancement. This is the philosophy that involves you starting your site/web-app design with the lowest common denominator, and you produce a functional product at that tech level. If it can’t be done, you need a good reason for it to be so. Then you progressively layer on “richer” technology. It’s the humble and unassuming philosophy: you don’t presume more than you must about your user and their circumstances.

They’re two opposing philosophies that theoretically should give the same results. You start high-tech and work backwards, or you start low-tech and move forwards.

The problem that works against this is Hofstadter’s law: Work has a knack of taking longer than you expect. Unexpected new things to work on arise, and then you start budgeting your time and triaging things you shouldn’t. In the first design model, you would design low-bandwidth HTML versions of your all-Flash site. Unless a new feature request came in and you had to implement that first in the Flash. Eventually you just give up and require that your clients all use Flash. Then you wonder why Google isn’t doing such a hot job of indexing your site anymore. Or you bite the bullet and spend a lot of time doing things properly. As soon as you start prioritizing the high-tech experience as the primary and complete version, you’re just constraining yourself against future flexibility. And then you sometimes end up irrationally justifying that primary experience in places where it shouldn’t really exist.

The positive reasons for progressive enhancement then start flowing out of varied examples. There are increasing numbers of users who, like me, use something like the Flashblock extension (because I’m sick of Flash-based ads, especially the ones that start streaming video, sucking bandwidth without your permission). Similarly, people have taken to using NoScript, an extension that imposes a whitelist on which sites are allowed to run Javascript. And don’t forget the disabled. Screen readers for the visually-impaired do a really bad job of handling Javascript. So does the Google web spider, for that matter. Or take the iPhone, a suddenly popular new platform that completely eschewed Flash. If you had invested in a site that required Flash, you were inaccessible. If you had built a site around progressive enhancement, you were much better equipped to support mobile Safari. So adopting a philosophy of progressive enhancement in these cases improves support for niche users, accessibility, search engine coverage, and an unforeseen new platform.

This means things like coding HTML that’s completely functional without Javascript, or Flash. They’re technology it’s often reasonable to assume the average client will have. But unless you can really justify it, you shouldn’t.

It involves things like not using links with href="javascript:doThis()" or onClick event handlers hard coded into their HTML. Instead just give the links decent ids and then add the event handlers dynamically from Javascript. It’s not hard to do, if you do it right the first time.
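One way to keep yourself honest about this rule is to scan your markup for the two patterns just mentioned. The following is only a rough sketch in Python using the standard library's html.parser; the names InlineScriptFinder and find_inline_scripts are made up for illustration, and it treats any attribute starting with "on" as an inline event handler, which is an assumption rather than a complete rule.

```python
from html.parser import HTMLParser


class InlineScriptFinder(HTMLParser):
    """Collects tags that hard-code Javascript into the markup."""

    def __init__(self):
        super().__init__()
        self.offenders = []  # (tag, attribute name, attribute value)

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        for name, value in attrs:
            value = value or ""
            if name.startswith("on"):
                # Assumed to be an inline event handler (onclick, onmouseover, ...)
                self.offenders.append((tag, name, value))
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                # A link whose target is a javascript: pseudo-URL
                self.offenders.append((tag, name, value))


def find_inline_scripts(html):
    """Return every (tag, attribute, value) that embeds Javascript inline."""
    finder = InlineScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.offenders
```

A link such as `<a id="do-this" href="/do-this">` comes back clean, and its handler can then be attached from a separate script by looking the element up by id.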

There are some surprising offenders in this class. Try accepting a friend request on Facebook with Javascript turned off. You can’t actually click the button, and there’s no reason that should be so. Why did I run into that?[1] Well, if you’re the site owner, does it matter?

I had a Dynalink switch with firmware that broke the rule too. It used Javascript-powered links for its navigation, instead of plain HTML. I wouldn’t have noticed, if it weren’t for the Javascript not actually working on browsers that weren’t Internet Explorer. There was no earthly reason for those links to use Javascript, and every time I had to load up IE (particularly if it involved a reboot to Windows to do so) just to configure my switch, it didn’t do much for my opinion of Dynalink.

If you’re a web developer and you’re not already doing this or haven’t heard of the idea before, I strongly encourage you to read Chris’ full article on progressive enhancement. If you haven’t, but you’re exercising sound development principles (separation of code and content, observing standards, using semantically sensible markup, designing with accessibility in mind etc) you’re probably already most of the way there. But do skim over it all the same. It’s a descriptive philosophy that successfully captures much of what we should already be doing, but for reasons that have previously fallen under different hats.

A more intelligent use of nofollow

Back in February I posted a rather rambling diatribe on the use of rel=’nofollow’ by various websites. I complained that the social news sites like Slashdot were misusing it or being inconsistent, and really it was a wasted resource. Jeff Wang’s noticed that Paul Graham’s Hacker News (it’s a submit-and-vote based news site like Reddit but more specialised towards the tech startup audience) is making a smarter use of it. Simply, stories get nofollowed until they’ve got more than 5 votes, and then they’re let free. It’s a simple heuristic that hopefully gives the best of both worlds: rewarding good links, but still discouraging high volume/low quality/smells like canned ham links.
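As a rough illustration of that heuristic (not Hacker News' actual code, which I haven't seen), the rule boils down to a single comparison. The function name story_link and its parameters are hypothetical; only the threshold of 5 votes comes from the description above.

```python
from html import escape

VOTE_THRESHOLD = 5  # links are let free once they have more than 5 votes


def story_link(url, title, votes, threshold=VOTE_THRESHOLD):
    """Render a story link, keeping rel="nofollow" until it has enough votes."""
    rel = "" if votes > threshold else ' rel="nofollow"'
    return '<a href="{}"{}>{}</a>'.format(escape(url, quote=True), rel, escape(title))
```

A story sitting at exactly 5 votes is still nofollowed; the sixth vote is the one that lets it free.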

Tact and keywords

The rise of the laser-beam-narrow targeting allowed by Google AdWords and AdSense has led to some interesting uses. It’s also led to accusations of insensitivity on Google’s part, who explicitly point out they don’t exercise human editorial control over ad placement. But Cameron showed me one last night that leaves me feeling slightly odd, and this one isn’t Google’s “fault” as much as the advertiser.

Campbell Live ad on Google search results page

For the non-NZers, the biggest news item this week here has been a quite tragic accident where six students and a teacher from a high school, Elim Christian College, were swept to their deaths after a flash flood during an outdoor exercise in a gorge. What you see above is an ad using the school’s name as a keyword for Campbell Live, an evening TV news/interview show, or rather their specific portal page for the subject.

I can see why they did it. They may have done it automatically even, with some system to buy up keywords on common phrases in hot stories, and part of me thinks it’s a good idea. But I still can’t help feeling that this is a somewhat tasteless use.

Google’s new broadside against AWS

Apparently Thomas Watson of IBM never actually said in 1943 that the world market only had room for 5 computers. Still, the misattribution’s been favourite fodder for years on lists of short-sighted predictions, along with Bill G’s equally misattributed “Nobody needs more than 640K of memory”.

The funny thing about history is how we’re now at a point where people are actually regarding that first nonquote with fresh regard. Sun’s John Gage, one of their original employees, once famously said that the network is the computer, and in this regard Watson’s nonquote starts to make some sense. To be more specific, substitute “network” with “distributed computing platform”. The idea is simple. Only a few companies have the resources and expertise to maintain an international-scale computing environment that applications can scale across to meet the gigantic range of demand the internet can provide. It’s also the source of a compelling business model to the potential owners of such “computers”.

Amazon Web Services have been the biggest and most prominent push in this direction for some time. Sun did come up with their Sun Grid product, but it was a dud by most accounts. Why? Because it wasn’t really connected to the internet. AWS (by which I primarily mean the EC2 computing service and the S3 storage service) are oriented all around supporting web applications and rich internet applications. They recognised the value in providing a service that small developers can build on with a reasonable expectation that, should they hit the ball out of the park, they’ll be able to handle any surge in traffic without going into the red ink for three years to come.

It was always strange that the world’s most famous distributed computing platform, Google, should not be a forerunner in this game. But that’s changed now. They’ve arrived with a flash and a bang. Scoble has videos of the launch, but the bare facts seem pretty cool. A Python environment with access to a storage service based on BigTable, and free accounts. The accounts are limited to 500MB of storage, 200 million megacycles/day CPU time, and 10 GB/day bandwidth, with the obvious business plan being to provide scaling resources beyond that for a fee.

Potentially it’s a huge announcement for web developers, and for Python. Google has a very strong brand when it comes to reputed distributed computing power, and fears of platform lock-in are mostly eroded by the open tools architecture. It would require a rewrite to move an app from the GAE to your own servers (unless you thought about everything closely up front), but it wouldn’t be a huge one. It’s WSGI compliant, and it even comes with Django built-in. The only really unique part about the platform is their GQL query language, which is an acceptable change from the norm since BigTable isn’t a row-oriented database. That and some other features like Google Accounts integration (which they should really hurry up and turn into an OpenID service), which are of secondary value to the average developer.
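To give a feel for why WSGI compliance keeps the lock-in fears down: a WSGI application is just a callable with a fixed signature, so the same callable can in principle be hosted by any compliant server. The sketch below is a generic WSGI app written for a current Python, not anything specific to App Engine's SDK, and the greeting it returns is purely illustrative.

```python
def application(environ, start_response):
    """A minimal WSGI application: takes the request environ, returns the body."""
    path = environ.get("PATH_INFO", "/")
    body = "Hello from {}\n".format(path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The server supplies start_response; we call it before returning the body.
    start_response("200 OK", headers)
    return [body]
```

To try it on your own machine, the standard library's wsgiref.simple_server.make_server("localhost", 8000, application) will serve it; moving between hosts is then mostly a question of what the app does for storage, not how it talks HTTP.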

It’s limited availability and the first 10000 accounts have already been snapped up (and I missed out :-(), but there’s an SDK available for playing around with. Expect some nice experimental web apps in the next few months, afforded by the very low barrier of entry on this.

Jamming with Django (and Facebook too)


A few days ago I posted about the experience Cameron and I had with Django in creating the Campus Church website. I only covered the model side of the model-view split that Django strongly encourages, since the sermon database I was using as an example didn’t have any data yet to create a view from. Well that’s changed now.

Refresher: We had decided the sermon database was based on two objects, Sermon and Passage. Sermon described things like a sermon’s title, date, mp3 file, speaker name, and outline. The Passage object described the bible passage[s] associated with the sermon. We split that into its own object since there’s a one-to-many relationship between Sermon and Passage, i.e. we wanted to be able to associate multiple passages with a single sermon.

A fresh view on things

So now we have a look at the source of the views in the Sermondb app directory (again, Django encourages a heavily modular design in your web applications. We subdivided the management of the site into several such applications, and the sermon database made its own).

import urllib
from django.template import Context, loader
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from campuschurch.sermondb.models import Sermon, Passage

def get_passages(sermon):
    return sermon.passage_set.all()

def index(request):
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:10]
    passages = map(get_passages, latest_sermon_list)
    t = loader.get_template('sermondb/index.html')
    c = Context({'latest_sermon_list': zip(latest_sermon_list,passages)})
    return HttpResponse(t.render(c))

Before I go any further, I should mention that this isn’t finished yet. We haven’t tackled pagination of results, but it’s not going to be difficult. Django makes that rather easy. So here’s what’s going on in this snippet.

We import a bunch of modules. One comes from the default Python installation, some come from Django, some come from elsewhere in the Campus Church application.

We define a function get_passages that takes in a Sermon and returns all the passages associated with it. As we’ll see in a moment, this is just a convenience function that really could’ve been replaced by a lambda expression. Take note of the way it actually gets the associated passages. Although we didn’t define anything about passages in the Sermon class (the relationship was only described in the Passage class), Django’s added some very convenient accessor methods to the sermon object.

Then we come to the first real view. A view is a function that takes in a client request (in this case and usually, an HTTP request) and returns a response (here, an HttpResponse wrapping the rendered HTML). This first view, index, is describing the sermon database’s index page. It starts by getting the list of the latest sermons, using Django’s neat database API. Then using the convenience function get_passages and map(), we create a list of all the passages that apply to that page. Then we load in the relevant template and render it. To render the template we pass it a context, created with a dictionary of variable names and values. In this case there’s only one value: the list of sermons to appear on the page, zipped with their corresponding lists of passages.

Here are the relevant snippets of the index page template:

{% if latest_sermon_list %}
{% for s in latest_sermon_list %}

<b><a href="">{{ s.0.title }} ({{ s.0.speaker }})</a></b>
{% if s.1 %}

Passage{% if not s.1|length_is:"1" %}s{% endif %}:
{% for p in s.1 %}
{{ p }}{% if not forloop.last %}, {% endif %}{% endfor %}{% endif %}
{{ s.0.date|date:"F j, Y" }}

{% endfor %}

{% else %}

 No sermons are available (yet!).

{% endif %}</div>

It looks a bit messy, but primarily it’s just two nested for-loops. The outer one is running over the supplied list of sermon objects. The inner one runs over the passages associated with each sermon. I’ve excluded the Javascript included in this template that adds the drop-down play and download links (it’s pretty easy to do with jQuery), but that doesn’t really matter, because now we’re going to look at the other view in this app.

def detail(request, id):
    t = loader.get_template('sermondb/detail.html')
    s = get_object_or_404(Sermon, pk=id)
    passages = get_passages(s)
    for p in passages:
        esvLookupURL = "" + str(p.esvstr()) + "&include-headings=false&include-footnotes=false&include-audio-link=false&include-passage-references=false&include-short-copyright=false"
        p.esv = urllib.urlopen(esvLookupURL).read()
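    # Context keys are assumed here; they mirror the ones passed in the canvas view.
    c = Context({'sermon': s, 'passages': passages})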
    return HttpResponse(t.render(c))

This is the view that gets rendered on sermon detail pages like this one. This looks much the same as the previous view in nature, with some slight differences. The function takes another parameter id that corresponds to the id of the sermon that is to be shown. This is passed in via the project’s URL configuration, which lives in the root of the Django project directory and is basically a list of regular expressions matching different URL-types. The relevant sermon is found with the shortcut function get_object_or_404 and the database API, this time by specifying that we want the sermon object with a primary key equal to id.

The other thing worthy of mention here is the usage of the ESV Online API. This is a seriously neat little web service, provided for free. We’ve written a method on our Passage class called esvstr() that renders the passage’s location into a format consistent with what the web service expects, and just use that to construct a URL. We use Python’s urllib module to make a request to that page, and include the resulting text.

This isn’t a scaling solution, and really we should be caching it, but the terms of service limit us to not storing more than half of any book of the Bible at any one time. Given that we expect consecutive series of sermons to cover entire books at times (the first one will be covering all of Ephesians, for example), we decided it would be rather complicated to work out a caching solution that took that into account. Given our current traffic levels, we’re satisfied we’re not going to be at risk of breaking the 500 lookups/day limit on the service any time soon.
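To make the "list of regular expressions" idea concrete, here is a toy dispatcher in plain Python. It is a sketch of the concept only, not Django's implementation: the URL patterns are hypothetical (our real URL layout isn't shown here) and the two views are stand-ins that just return strings.

```python
import re


def index(request):
    # Stand-in for the real index view.
    return "sermon index"


def detail(request, id):
    # Stand-in for the real detail view: reports which sermon was asked for.
    return "sermon detail for id " + id


# Hypothetical patterns: each regular expression is paired with a view.
URL_PATTERNS = [
    (re.compile(r"^sermons/$"), index),
    (re.compile(r"^sermons/(?P<id>\d+)/$"), detail),
]


def dispatch(path, request=None):
    """Call the first view whose pattern matches the path.

    Named groups captured by the regex are passed to the view as keyword
    arguments, which is how a view ends up with its id parameter.
    """
    for pattern, view in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    raise LookupError("no view matches " + path)
```

Note that the captured id arrives as a string; in the real view that is fine, since it is only handed on to the database lookup.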


As I mentioned last time, the promotion campaign for Campus Church is incorporating a lot of Facebook usage, since the target demographic (university students) is rather well represented on that site. There was a Facebook group from quite early on, but Cameron and I had the thought that it might be a good idea to reuse what we had already done on the site and create a Facebook application that would just embed a flash player for the latest sermon and a link or two back to the site. Our motivation was two-fold:

  • We had created an RSS feed for the sermon database (again, Django makes this pretty simple). But although I can’t really back it up with numbers, I don’t think it would be too risky an assertion to claim that the set of Facebook users is in no way a subset of the set of regular RSS reader users. A Facebook application would be a way to provide the same sort of functionality to users who might not otherwise use it.
  • The “Share application” functionality Facebook provides is an interesting promotion tool that we were interested in taking advantage of.

You can see the resulting application. Sermonbook is another application in the Django project, but it reuses the Sermondb models to query the same database. With the addition of some boilerplate Facebook code (which you can find on their developers’ wiki), even the views look remarkably similar. The basic process is that Facebook refers to a supplied URL on our site to get the required FBML (Facebook’s XML variation on HTML) to render the application, with a few complications I won’t go into. My aim here is to show what the modularity of Django gives you in terms of flexibility.

def canvas(request):
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:1]
    s = get_object_or_404(Sermon, pk=latest_sermon_list[0].id)
    passages = get_passages(s)
    for p in passages:
        esvLookupURL = "" + str(p.esvstr()) + "&include-headings=false&include-footnotes=false&include-audio-link=false&include-passage-references=false&include-short-copyright=false"
        p.esv = urllib.urlopen(esvLookupURL).read()

    return render_to_response('facebook/canvas_details.fbml', {'name': 'stranger','key':'','sermon':s,'passages':passages})


def post_add(request): #The function that gets called when the user first adds the application
    b = FacebookUser(uid=request.facebook.uid)
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:10]
    fbml = '<br><a href="">More about this sermon</a> | \
<br><a href="">Browse other sermons</a>'
    return request.facebook.redirect(request.facebook.get_url('profile', id=request.facebook.uid))
post_add = facebook.require_login()(post_add)

Really that last part should have been done with a template instead of just hardcoding the FBML, and I expect we’ll tweak that as soon as we can be bothered. Looking at it now, we could have saved on code duplication if the ESV lookup was done on the model, rather than in the view. We should probably tweak that too. Overall the process was pretty straightforward, apart from some trickiness with updates. Because of the way the Facebook application model works, we have to keep a list of all the UIDs of Facebook users who add our app. We’d be happier not to have to do that and would be quite content with some sort of iframe arrangement where it just blindly loads in some page. But iframe apps can’t be displayed inline on the user’s profile page for security reasons. So every time there’s an update, we now have the sermondb application go out and ping Facebook with a list of all our registered users and the updated FBML to display. It’s an inelegant solution, but these are the early days still of web service to web service integration I guess.
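For what it's worth, moving the ESV lookup out of the two views might look something like the sketch below. It's written for a current Python rather than what we're running, and everything named here (PassageLookupMixin, esv_text, fetch_url, ESV_OPTIONS) is hypothetical; on the real model it would just be one more method next to esvstr(). The base URL is left as a parameter, and the fetcher is injectable so the method can be exercised without touching the network.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Options copied from the lookup URL used in the views above.
ESV_OPTIONS = {
    "include-headings": "false",
    "include-footnotes": "false",
    "include-audio-link": "false",
    "include-passage-references": "false",
    "include-short-copyright": "false",
}


def fetch_url(url):
    """Default fetcher: a plain HTTP GET returning the body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class PassageLookupMixin:
    """Hypothetical mixin; expects the class using it to define esvstr()."""

    def esv_text(self, base_url, fetch=fetch_url):
        # base_url is assumed to end just before the passage reference,
        # matching the string concatenation done in the views.
        url = base_url + str(self.esvstr()) + "&" + urlencode(ESV_OPTIONS)
        return fetch(url)
```

With that in place, both detail and canvas could shrink their loops to a single call per passage, and any future caching would only need to be added in one spot.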

The finished app, inline on my profile page: