Category Archives: development

What price a byline? (Or: what’s wrong with Knol)

A reader criticised my frequent referencing of Wikipedia in my last post, on the basis that everyone knows what WP is and that indeed some of us have Firefox extensions[1] to make quickly consulting it easy. I admitted he had a point, prompting another reader to protest that it doesn’t matter where the links go to, as long as they’re informative and well-written. The degree to which they were both right was strikingly indicative of how far WP has come. Given that it’s so often the first search result on Google for a huge number of queries, making explicit links to it can seem like adding links to dictionary.com for longer semantemes[2]. And the reason I reference it so often is that its collective writing style and usual accuracy are ideal for a quick introduction to what may be unfamiliar ground.

But its status as the #1 go-to place for so many Google queries didn’t go unnoticed in Mountain View. Yesterday Google finally released their long-in-development About.com-mimic Knol. A “Knol” is an unnecessary neologism coined by Google to mean a “unit of knowledge”, but it seems the basic idea is to compete with Wikipedia on the authoritative content front by meeting one of the oft-heard (albeit not so much anymore, if only due to exhaustion) criticisms of WP: that you can’t trust it because you don’t know who wrote it. Knol’s points of difference with WP are then as follows:

  • You can have more than one article on a topic.
  • Articles are signed by their authors.
  • Advertising will be displayed, and the revenue will be split with authors.
  • The level of collaborative editing allowed on each article is controlled by the author, as is the licensing.

I’ve been reading through a few of its articles, and what strikes me is what they’ve lost by not having universal editing. So often WP was compared to the Encyclopaedia Britannica. Knol tries to compromise between the two, but in doing so completely erodes the role of the editor: the person who doesn’t actually write the content, but polishes it to a publishable standard and makes it consistent with the rest of the corpus of work. Today’s featured Knol is on Migraines and Migraine Management. It’s written by a neurologist, so you know it’s authoritative, and it doesn’t allow public editing, so you know it hasn’t been tampered with.

But compare it with WP’s article on Migraines, and you’ll see just how badly it needs an editor. It’s written as if it’s intended for paper, with non-hyperlinked cross-references like “Migraine is defined as a headache that [TABLE 2]:”. “[TABLE 2]” is a JPEG image at reduced size. There’s no reason for that instead of an actual HTML table. (Additionally, Google, there’s no reason for the images to be inline with the content like that. Consider a Tufte-like layout, where the tables and references and footnotes can go out to the side.)

Throughout Knol you’ll find all sorts of bad design practice. I swear I saw image hotlinking in one article earlier. But in particular, a lot of the seed articles seem to be HTML dumps of articles already written by medical professionals, like this one. It’s closed collaboration, so unlike WP, you can’t just drop in and quickly format that into something presentable (at present there’s barely any styling: the intra-page headings are just capitalised, there’s an odd amount of whitespace, and the page title itself isn’t capitalised).

There are two big surprises here, given that this is a Google project and how long it’s been in development there. And if they don’t fix them, I fear an epic failure.

The first is that they’ve provided such an unstructured writing environment. If you’re trying to create a body of high-quality written material, there are ways you can structure your editing environment so that content conforms to certain styles and expectations. It’s particularly in Google’s interest to do so, since, as they keep telling the SEO world, well-structured HTML and documents are easier for them to index and search. And yet Knol’s featured Migraines article has swathes of tabular content in the unindexable, inaccessible JPEG format.
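
To labour the point with a minimal sketch (the table content below is invented; it’s not Knol’s actual markup): tabular content published as real HTML can be read back out by any crawler or screen reader in a few lines of standard-library Python, whereas a JPEG of the same table gives an indexer nothing.

    from html.parser import HTMLParser

    # An invented fragment standing in for the kind of table the Knol
    # article renders as a JPEG.
    SAMPLE = """
    <table>
      <tr><th>Criterion</th><th>Detail</th></tr>
      <tr><td>Duration</td><td>4 to 72 hours untreated</td></tr>
      <tr><td>Character</td><td>Pulsating, typically one-sided</td></tr>
    </table>
    """

    class TableTextExtractor(HTMLParser):
        """Collect the text content of every table cell."""
        def __init__(self):
            super().__init__()
            self.in_cell = False
            self.cells = []

        def handle_starttag(self, tag, attrs):
            if tag in ("td", "th"):
                self.in_cell = True

        def handle_endtag(self, tag):
            if tag in ("td", "th"):
                self.in_cell = False

        def handle_data(self, data):
            if self.in_cell and data.strip():
                self.cells.append(data.strip())

    parser = TableTextExtractor()
    parser.feed(SAMPLE)
    print(parser.cells)
    # ['Criterion', 'Detail', 'Duration', '4 to 72 hours untreated', ...]

No OCR, no guesswork: the structure and the text are simply there for the taking, which is exactly what Google’s own indexer wants.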

The second is much more subtle and can’t be fixed with a technology patch as easily as the first can. Google have failed to realise that the most expert authors are often going to be simultaneously the least equipped to properly format and polish their own documents (whether due to lack of technical skills, or time), and the least willing to submit their work to editorial changes from the unwashed anonymous masses. The fix for this, I think, will involve recognising and separating the two types of editing that happen on Wikipedia: authoring or fixing of content; and editing for quality control (fixing grammar, spelling and style, and adding useful metadata to a document). Then build a system to acknowledge the good editors, not just the good authors. Then encourage authors to allow editorial changes from recognised quality editors. In fact, drop the “closed collaboration” option altogether.

This is even harder than getting good quality content in the first place. Writing is glamorous. Editing isn’t, but it’s so very important. Knol’s only got half the problem solved.

[1] Certainly one of my favourite extensions is the perennially useful Googlepedia, which remixes your Google search results to embed the first returned WP article on the right (it’s particularly nice on widescreen monitors).

[2] So it’s not a directly applicable synonym of ‘word’, but it was the best the thesaurus could give me.


When bugs really do matter: 22 years after the Therac 25

At the moment my master’s work has me in the middle of what seems like a never-ending sea of runtime errors. It’s the largest Python project I’ve ever worked on, and I’ve been forced to re-learn a lot of practices that I really should have internalized a long time ago but never really did. I’ve also been able to play around with interesting meta-programming techniques (i.e., modifying the language on the fly, so to speak, though not as much as a genuine macro system would allow) to stop certain types of bugs recurring.
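
As a sketch of the kind of trick I mean (this isn’t code from my project, just an illustration): a small __setattr__ hook can turn one whole class of typo-induced runtime errors, assigning to an attribute that was never declared, into an immediate failure at the offending line rather than a mysteriously wrong result much later.

    class Strict(object):
        """Subclasses may only assign to attributes created in __init__."""
        _frozen = False

        def __init__(self):
            # Called last by subclasses, once their attributes exist.
            self._frozen = True

        def __setattr__(self, name, value):
            if self._frozen and not hasattr(self, name):
                raise AttributeError("unknown attribute %r on %s"
                                     % (name, type(self).__name__))
            object.__setattr__(self, name, value)

    class Simulation(Strict):
        def __init__(self):
            self.timestep = 0.01
            self.iterations = 0
            Strict.__init__(self)

    sim = Simulation()
    sim.timestep = 0.02   # fine: the attribute was declared in __init__
    sim.timestpe = 0.02   # AttributeError right here, at the typo

It’s a blunt instrument, but it’s the flavour of thing I mean.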

As things progress, the new code settles down into something that actually runs more often than not, and becomes predictable. One thing I’ve relearned the value of is the sanity check: throwing assert statements around like it’s going out of style, to the point where I’ve found myself almost wishing Python had Eiffel’s native contracts[1]. Almost. I’m willing to make the development-time/code-stability trade-off this approach entails.
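
In the absence of native contracts, a poor man’s version can be bolted on with decorators. This is only an illustrative sketch, with none of the invariants or inheritance rules a real Design by Contract system would give you:

    from functools import wraps

    def require(check, message="precondition failed"):
        """Assert a condition on the arguments before the call."""
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                assert check(*args, **kwargs), message
                return fn(*args, **kwargs)
            return wrapper
        return decorator

    def ensure(check, message="postcondition failed"):
        """Assert a condition on the return value after the call."""
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                result = fn(*args, **kwargs)
                assert check(result), message
                return result
            return wrapper
        return decorator

    @require(lambda xs: len(xs) > 0, "need at least one sample")
    @ensure(lambda result: result >= 0, "variance can never be negative")
    def variance(xs):
        mean = sum(xs) / float(len(xs))
        return sum((x - mean) ** 2 for x in xs) / float(len(xs))

    print(variance([3.1, 2.9, 3.0]))   # passes both checks
    # variance([]) trips the precondition before any work is done

Crude, but it keeps the checks right next to the function signature where a reader will actually see them.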

But it’s a whole other matter when you consider articles like this one from the Baltimore Sun (June 30). It’s about a forensic bug-hunter team at the FDA set up to address the growing dangers as software runs more and more medical systems. It talks up static code analysis as a possible solution, though after passing through the mainstream journalism medium it probably comes across as much more of a panacea than it really is. After all, there is no single magic pill for complex software like this.

One passage stood out though.

FDA officials declined to name the maker of the infusion pumps. (In 2006, Cardinal Health, of Dublin, Ohio, stopped production of its Alaris SE pumps because of a key-bounce error that reportedly killed two patients, including a 16-day-old baby that got 44.8 milliliters of intravenous nutrition, rather than 4.8 milliliters.)

During the investigation into the malfunctioning pumps, nurses complained about frequent keyboard errors, while the manufacturer blamed nurses for entering the wrong drug information and then failing to double-check, said Brian Fitzgerald, who heads the FDA’s software specialists.

A shade over 23 years ago, an equally tragic disregard for reported malfunctions that appeared to have no effect led to radiation overdoses for six people. The Therac 25, a radiation therapy machine with a subtle bug that caused massive overdoses, went down in software engineering history as one of the most infamous cases of a race condition. A comprehensive (and quite readable) report was written on the accidents, and it’s very much worth the time of any developer working on non-trivial software or electronics. It makes for spine-chilling reading.

From the Boston Globe, June 20, 1986 (source):

MAN KILLED BY ACCIDENT WITH MEDICAL RADIATION

by Richard Saltos, Globe Staff

A series of accidental radiation overdoses from identical cancer therapy machines in Texas and Georgia has left one person dead and two others with deep burns and partial paralysis, according to federal investigators.

Evidently caused by a flaw in the computer program controlling the highly automated devices, the overdoses – unreported until now – are believed to be the worst medical radiation accidents to date.

The malfunctions occurred once last year and twice in March and April of this year in two of the Canadian-built linear accelerators, sold under the name Therac 25.

Two patients were injured, one who died three weeks later, at the East Texas Cancer Center in Tyler, Texas, and another at the Kennestone Regional Oncology Center in Marietta, Ga.

The defect in the machines was a “bug” so subtle, say those familiar with the cases, that although the accident occurred in June 1985, the problem remained a mystery until the third, most serious accident occurred on April 11 of this year.

Late that night, technicians at the Tyler facility discovered the cause of that accident and notified users of the device in other cities.

The US Food and Drug Administration, which regulates medical devices, has not yet completed its investigation. However, sources say that discipline or penalty for the manufacturer is unlikely.

Modern cancer radiation treatment is extremely safe, say cancer specialists. “This is the first time I’ve ever heard of a death” from a therapeutic radiation accident, said FDA official Edwin Miller. “There have been overtreatments to various degrees, but nothing quite as serious as this that I’m aware of.”

Physicians did not at first suspect a radiation overdose because the injuries appeared so soon after treatment and were far more serious than an overexposure would ordinarily have produced.

“It was certainly not like anything any of us have ever seen,” said Dr. Kenneth Haile, director of radiation oncology of the Kennestone radiation facility. “We had never seen an overtreatment of that magnitude.”

Estimates are that the patients received 17,000 to 25,000 rads to very small body areas. Doses of 1,000 rads can be fatal if delivered to the whole body.

The software fault has since been corrected by the manufacturer, according to FDA and Texas officials, and some of the machines have been returned to service.

… (description of the accidents)

The Therac 25 is designed so that the operator selects either X-ray or electron-beam treatment, as well as a series of other items, by typing on a keyboard and watching a video display screen for verification of the orders.

It was revealed that if an extremely fast-typing operator inadvertently selected the X-ray mode, then used an editing key to correct the command and select the electron mode instead, it was possible for the computer to lag behind the orders. The result was that the device appeared to have made the correct adjustment but in fact had an improper setting so it focussed electrons at full power to a tiny spot on the body.

David Parnas, a programming specialist at Queens University in Kingston, Ontario, said that from a description of the problem, it appeared there were two types of programming errors.

First, he said, the machine should have been programmed to discard “unreasonable” readings – as the injurious setting presumably would have been. Second, said Parnas, there should have been no way for the computer’s verifications on the video screen to become unsynchronized from the keyboard commands.

As the report makes devastatingly clear, there was far more wrong with the system than just those two issues in the last paragraph. The code-base as a whole was shoddy, there weren’t hardware failsafes to stop unreasonable behaviour, and there was insufficient or non-existent sanity-checking (i.e., checking that the internal state of the program is sensible). From the report:

The operator can later edit the mode and energy separately. If the keyboard handler sets the Data Entry Complete flag before the operator changes the data in MEOS, Datent will not detect the changes because it has already exited and will not be reentered again. The upper collimator (turntable), on the other hand, is set to the position dictated by the low-order byte of MEOS by another concurrently running task (Hand) and can therefore be inconsistent with the parameters set in accordance with the information in the high-order byte. The software appears to contain no checks to detect such an incompatibility.

Take-away point (and again, reading the report will give you several): Sanity checks are good. Make use of assertions. Code defensively.
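
As a toy illustration of that last point (the real system was nothing like this, and a real machine needs hardware interlocks too, but it shows the shape of the check the report says was missing):

    XRAY, ELECTRON = "xray", "electron"

    # Which turntable position is physically compatible with which beam mode.
    # The names and numbers here are invented for illustration.
    SAFE_SETUP = {
        XRAY:     {"turntable": "flattener",   "max_energy_mev": 25},
        ELECTRON: {"turntable": "scan_magnet", "max_energy_mev": 25},
    }

    def fire_beam(mode, turntable_position, energy_mev):
        """Refuse to fire unless the separately-maintained state agrees."""
        expected = SAFE_SETUP[mode]
        assert turntable_position == expected["turntable"], (
            "turntable %r inconsistent with mode %r" % (turntable_position, mode))
        assert 0 < energy_mev <= expected["max_energy_mev"], (
            "unreasonable energy setting: %s MeV" % energy_mev)
        print("firing %s beam at %s MeV" % (mode, energy_mev))

    fire_beam(ELECTRON, "scan_magnet", 10)   # consistent: fires
    # fire_beam(XRAY, "scan_magnet", 25)     # the mismatch is caught up front

In anything safety-critical the check obviously can’t be an assert that a release build strips out, but the principle stands: cross-check state you believe to be equivalent before acting on it.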

But most staggering was the way the manufacturer, AECL, reacted to the news. The blame for the two years it took for the bug to be recognised can be laid at their feet, for not responding adequately to reports of malfunctions. Their attitude can be summed up in this paragraph from the letter they sent to hospitals after the bug was found, describing how to keep the machine safely operational.

Effective immediately, and until further notice, the key used for moving the cursor back through the prescription sequence (i.e, cursor “UP” inscribed with an upward pointing arrow) must not be used for editing or any other purpose.
To avoid accidental use of this key, the key cap must be removed and the switch contacts fixed in the open position with electrical tape or other insulating material. For assistance with the latter you should contact your local AECL service representative.

Take-away point: If problems occur because your software’s allowed to drift into a nonsensical state when users use the keyboard too fast, it’s actually not really a fix to tell them to remove buttons so they can’t use the keyboard too fast. It seems comical in hindsight. But in 2006, the FDA noted in a press release how Alaris responded to their faulty infusion pumps:

In an August 15 recall letter, Alaris informed customers that it will provide a warning label for the pumps and a permanent correction for the key bounce problem once it is available.  In the letter, Alaris also provided recommendations to pump users on steps they can take to minimize key entry errors until the problem can be corrected. The steps are as follows:

Proper Stance
When programming pumps, stand squarely in front of the keypad (ideally with the pump at eye level for best visibility) to facilitate proper depth of depressing each key.

Listen
Focus on listening to the number of beeps while programming IV pumps; each beep will correspond to a single digit entry.  Unexpected double tone could indicate an unintended entry.

Verify Screen Display
When programming the pump or changing settings, always compare the patient’s prescribed therapy or the medication administration record, original order, or bar code device to the displayed pump settings for verification before starting or re-starting the infusion.

Independent Double Check
Request an independent double check of pump settings by another practitioner before starting or changing infusions with hospital-selected high alert drugs.

Look
Before leaving the patient’s room, observe the IV tubing drip chamber to see if the observed rate of infusion looks faster or slower than expected.  Adjust accordingly.

In a way, it’s worse than “take the key off the keyboard and tape it over”. At least that stopped the known error from happening. So here’s a final take-away point: Telling your users to constantly check that the system hasn’t malfunctioned, and saying you’ll send out a warning sticker now and a real fix eventually, doesn’t really count as a fix. Relying on human vigilance isn’t a solution. Not when bugs really do matter.
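
(For what it’s worth, the textbook fix for key bounce is neither a sticker nor a stance: it’s a few lines of debounce logic. Here’s a sketch, assuming a stream of timestamped key events and a made-up bounce window; real firmware would do this much closer to the switch hardware.)

    BOUNCE_WINDOW_S = 0.05   # 50 ms is invented; the real figure is hardware-specific

    def debounce(events):
        """events: iterable of (timestamp_seconds, key) pairs.
        Yields presses with switch-bounce duplicates removed."""
        last_seen = {}
        for t, key in events:
            if key in last_seen and (t - last_seen[key]) < BOUNCE_WINDOW_S:
                last_seen[key] = t
                continue          # a bounce of the switch, not the operator
            last_seen[key] = t
            yield t, key

    raw = [(0.00, "4"), (0.02, "4"), (0.40, "."), (0.85, "8")]
    print([key for _, key in debounce(raw)])   # ['4', '.', '8'] -- not 44.8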

[1] PEP 316 actually proposed that contracts be added to Python. Current status: Deferred.

Using that second display: 4 news visualisations of questionable utility

For your ever-decreasing attention span and mine, in the race to distinguish and spice up the daily news product, here’s more news, shallower and faster.

MSNBC Spectra screenshot

Spectra from MSNBC is a pretty terrible romp into 3D. Pretty, but completely unusable and just rather useless. You select what channels of news you want, and as you do, a selection of stories from each channel floats into the display as a rotating ring. It wouldn’t be so bad if you could actually click on the floating news items. But no, that does something completely unexpected: it ejects that entire ring of stories. To get to a story you want, you have to navigate via a ridiculous horizontal scrollbar. I thought we had learnt in the 90s that 3D interfaces like this just don’t work. From Information Aesthetics via Data Mining.

Newsmap

Moving from the realms of insanity to just the slightly overwhelming comes Newsmap, based on Google News.

Digg\'s \"Big Spy\" visualization

Digg\'s \"Stack\" visualization

From Digg, the very epitome of fickle and populist news rivers, comes a selection of cool-looking, fast-moving and not really that value-additive visualizations at its Labs section.

Mapped Up screenshot

Finally comes a low-key (and the most embeddable of the lot) Flash widget that just rotates geo-coded stories on a world map.

John Resig just released something rather neat

John Resig just released a rather awesome Javascript library that implements the Processing language.

Jaw status: dropped. I empathise with the Reddit commenter who had nothing more to say on this release than “I give up on programming now.”

Make sure you check out the extensive list of demos. The long-predicted competitor to Flash that Javascript + <canvas> could become may soon be upon us. Good. Or at least, some neat games should come out of it.

Graceful Degradation, or Progressive Enhancement?

There’s a question of design philosophy in software with two diametrically opposite answers that should, in theory, get you the same results: top-down or bottom-up? Traditionally we’re supposed to do the former, designing the big picture first and then filling in the details until we’ve built all the way down from abstracted design to concrete reality. We usually do the latter, building little lego bits and then trying to connect them into a structure approximating the original design.

But in a sense, in the world of web application design, where “best practice” isn’t just a moving target but one moving in quite different directions, the opposite holds. We’re doing top-down experience design when we should really be doing bottom-up. The distinguishing issue is that on the web we’re not just creating one design, we’re creating a suggested design that will then be rendered in a whole multitude of ways.

Normal practice in web design/development is to work out what you want to do functionally, then make the call on what technology (Flash, Shockwave (remember that?), Java, AJAX, ActiveX, PDF, or even Silverlight) would be best for making that happen, evaluating “best” as a measure of time, expense, longevity, security, and market support. And then, if time allows, you design fallbacks for clients without those technologies.

Chris Heilmann has done a good job advocating the opposite philosophy of progressive enhancement. Under this philosophy you start your site or web-app design with the lowest common denominator and produce a functional product at that tech level. If it can’t be done, you need a good reason why not. Then you progressively layer on “richer” technology. It’s the humble and unassuming philosophy: you don’t presume more than you must about your user and their circumstances.

They’re two opposing philosophies that theoretically should give the same results. You start high-tech and work backwards, or you start low-tech and move forwards.

The problem that works against this is Hofstadter’s law: work has a knack of taking longer than you expect. Unexpected new things to work on arise, and then you start budgeting your time and triaging things you shouldn’t. In the first design model, you would design low-bandwidth HTML versions of your all-Flash site. Unless a new feature request came in and you had to implement that first in the Flash. Eventually you just give up and require that your clients all use Flash. Then you wonder why Google isn’t doing such a hot job of indexing your site anymore. Or you bite the bullet and spend a lot of time doing things properly. As soon as you start prioritizing the high-tech experience as the primary and complete version, you’re just constraining yourself against future flexibility. And then you sometimes end up irrationally justifying that primary experience in places where it shouldn’t really exist.

The positive reasons for progressive enhancement then start flowing out of varied examples. There are increasing numbers of users who use something like the Flashblock extension (I use it because I’m sick of Flash-based ads, especially the ones that start streaming video, sucking bandwidth without your permission). Similarly, people have taken to using NoScript, an extension that imposes a white-list on allowed Javascript. And don’t forget the disabled: screen readers for the visually-impaired do a really bad job of handling Javascript. So does the Google web spider, for that matter. Or take the iPhone, a suddenly popular new platform that completely eschewed Flash. If you had invested in a site that required Flash, you were inaccessible. If you had built a site around progressive enhancement, you were much better equipped to support mobile Safari. So adopting a philosophy of progressive enhancement in these cases improves support for niche users, accessibility, search engine coverage, and an unforeseen new platform.

This means things like coding HTML that’s completely functional without Javascript or Flash. They’re technologies it’s often reasonable to assume the average client will have. But unless you can really justify that assumption, you shouldn’t make it.

It involves things like not using links with href="javascript:doThis()" or onClick event handlers hard coded into their HTML. Instead just give the links decent ids and then add the event handlers dynamically from Javascript. It’s not hard to do, if you do it right the first time.

There are some surprising offenders in this class. Try accepting a friend request on Facebook with Javascript turned off. You can’t actually click the button, and there’s no reason that should be so. Why did I run into that?[1] Well, if you’re the site owner, does it matter?

I had a Dynalink switch with firmware that broke the rule too. It used Javascript-powered links for its navigation, instead of plain HTML. I wouldn’t have noticed, if it weren’t for the Javascript not actually working on browsers that weren’t Internet Explorer. There was no earthly reason for those links to use Javascript, and every time I had to load up IE (particularly if it involved a reboot to Windows to do so) just to configure my switch, it didn’t do much for my opinion of Dynalink.

If you’re a web developer and you’re not already doing this or haven’t heard of the idea before, I strongly encourage you to read Chris’ full article on progressive enhancement. If you haven’t, but you’re exercising sound development principles (separation of code and content, observing standards, using semantically sensible markup, designing with accessibility in mind, etc.) you’re probably already most of the way there. But do skim over it all the same. It’s a descriptive philosophy that successfully captures much of what we should already be doing, but for reasons that have previously fallen under different hats.

A more intelligent use of nofollow

Back in February I posted a rather rambling diatribe on the use of rel=’nofollow’ by various websites. I complained that social news sites like Slashdot were misusing it or being inconsistent with it, and that really it was a wasted resource. Jeff Wang has noticed that Paul Graham’s Hacker News (a submit-and-vote news site like Reddit, but more specialised towards the tech startup audience) is making smarter use of it. Simply, stories get nofollowed until they’ve got more than 5 votes, and then they’re let free. It’s a simple heuristic that hopefully gives the best of both worlds: rewarding good links, but still discouraging high-volume, low-quality, smells-like-canned-ham links.
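
In code, the heuristic is about as simple as it sounds. This is only my sketch of it (the threshold comes from the figure quoted above, and the markup is invented; it’s not lifted from Hacker News itself):

    NOFOLLOW_THRESHOLD = 5

    def story_link(url, title, votes):
        """Render a submitted link, dropping nofollow once the community vouches for it."""
        rel = '' if votes > NOFOLLOW_THRESHOLD else ' rel="nofollow"'
        return '<a href="%s"%s>%s</a>' % (url, rel, title)

    print(story_link("http://example.com/launch", "Show HN: our launch", 3))
    # <a href="http://example.com/launch" rel="nofollow">Show HN: our launch</a>
    print(story_link("http://example.com/launch", "Show HN: our launch", 12))
    # <a href="http://example.com/launch">Show HN: our launch</a>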

Tact and keywords

The rise of the laser-beam-narrow targeting allowed by Google AdWords and AdSense has led to some interesting uses. It’s also led to accusations of insensitivity against Google, who explicitly point out they don’t exercise human editorial control over ad placement. But Cameron showed me one last night that leaves me feeling slightly odd, and this one isn’t Google’s “fault” as much as the advertiser’s.

Campbell Live ad on Google search results page

For the non-NZers, the biggest news item here this week has been a quite tragic accident in which six students and a teacher from a high school, Elim Christian College, were swept to their deaths by a flash flood during an outdoor exercise in a gorge. What you see above is an ad using the school’s name as a keyword for Campbell Live, an evening TV news/interview show, or rather for their specific portal page on the subject.

I can see why they did it. They may even have done it automatically, with some system to buy up keywords on common phrases in hot stories, and part of me thinks it’s a good idea. But I still can’t help feeling that this is a somewhat tasteless use.