Monthly Archives: February 2008

Returning to the Culture

“Matter” by Iain M Banks, in my possession

I’ve been looking forward to this. It’s the first novel Banks has written in his famous Culture story universe for 8 years. I was first introduced to his writing on my 14th or 15th birthday with The Player of Games, and was immediately hooked by the author’s obvious imagination and talent. This will be the ninth title of his to go on my shelf. Review to follow far too shortly, I’m sure. 🙂

Jamming with Django (and Facebook too)


A few days ago I posted about the experience Cameron and I had with Django in creating the Campus Church website. I only covered the model side of the model-view split that Django strongly encourages, since the sermon database I was using as an example didn’t have any data yet to create a view from. Well that’s changed now.

Refresher: We had decided the sermon database was based on two objects, Sermon and Passage. Sermon described things like a sermon’s title, date, mp3 file, speaker name, and outline. The Passage object described the Bible passage(s) associated with the sermon. We split that into its own object since there’s a one-to-many relationship between Sermon and Passage, i.e. we wanted to be able to associate multiple passages with a single sermon.

A fresh view on things

So now we have a look at the source of views.py in the sermondb app directory. (Again, Django encourages a heavily modular design in your web applications. We subdivided the management of the site into several such applications, and the sermon database made up its own.)

import urllib
from django.template import Context, loader
from django.http import HttpResponse
from django.shortcuts import get_object_or_404
from campuschurch.sermondb.models import Sermon, Passage

def get_passages(sermon):
    return sermon.passage_set.all()

def index(request):
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:10]
    passages = map(get_passages, latest_sermon_list)
    t = loader.get_template('sermondb/index.html')
    c = Context({'latest_sermon_list': zip(latest_sermon_list,passages)})
    return HttpResponse(t.render(c))

Before I go any further, I should mention that this isn’t finished yet. We haven’t tackled pagination of results, but it’s not going to be difficult. Django makes that rather easy. So here’s what’s going on in this snippet.

We import a bunch of modules. One comes from the default Python installation, some come from Django, some come from elsewhere in the Campus Church application.

We define a function get_passages that takes in a Sermon and returns all the passages associated with it. As we’ll see in a moment, this is just a convenience function that really could’ve been replaced by a lambda expression. Take note of the way it actually gets the associated passages. Although we didn’t define anything about passages in the Sermon class (the relationship was only described in the Passage class), Django has added some very convenient accessor methods to the sermon object.

Then we come to the first real view. A view is a function that takes in a client request (in this case and usually, an HTTP request) and returns a response, here an HttpResponse wrapping the rendered HTML. This first view, index, describes the sermon database’s index page. It starts by getting the list of the latest sermons, using Django’s neat database API. Then, using the convenience function get_passages and map(), we create a list of the passages for each sermon on that page. Then we load the relevant template and render it. To render the template we pass it a context, created with a dictionary of variable names and values. In this case there’s only one value: the list of sermons to appear on the page, paired with their corresponding lists of passages.

Here are the relevant snippets of the index page template:

<div>
<h1>sermons</h1>
{% if latest_sermon_list %}
{% for s in latest_sermon_list %}

<b><a href="{{ s.0.id }}/">{{ s.0.title }} ({{ s.0.speaker }})</a></b>
{% if s.1 %}

Passage{% if not s.1|length_is:"1" %}s{% endif %}:
{% for p in s.1 %}
{{ p }}{% if not forloop.last %}, {% endif %}{% endfor %}{% endif %}
{{ s.0.date|date:"F j, Y" }}

{% endfor %}

{% else %}

 No sermons are available (yet!).

{% endif %}</div>
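
The pairing that the view builds with map() and zip(), and that the template unpacks as s.0 and s.1, can be sketched in plain Python. This is only an illustration: FakeSermon is a made-up stand-in for the Django model, the titles and passages are invented, and the lambda takes the place of get_passages. It is written for current Python 3 rather than the Python 2 of the listings above.

```python
# Stand-ins for the Django objects; only the shape of the data matters here.
class FakeSermon:
    def __init__(self, title, passages):
        self.title = title
        self._passages = passages

    def passages(self):
        # Plays the role of sermon.passage_set.all() in the real view.
        return list(self._passages)


sermons = [
    FakeSermon("Chosen", ["Ephesians 1:1-14"]),
    FakeSermon("Alive", ["Ephesians 2:1-10", "Psalm 1:1-6"]),
]

# The lambda does the job of get_passages; zip() pairs each sermon with its list.
passage_lists = map(lambda s: s.passages(), sermons)
latest_sermon_list = list(zip(sermons, passage_lists))

for s in latest_sermon_list:
    # s[0] and s[1] are what the template reaches as s.0 and s.1
    print(s[0].title, "->", ", ".join(s[1]))
```

Each element of the list is a (sermon, passages) tuple, which is why the template has to index into it rather than ask the sermon for its passages directly.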

It looks a bit messy, but primarily it’s just two nested for-loops. The outer one runs over the supplied list of sermon objects. The inner one runs over the passages associated with each sermon. I’ve excluded the Javascript in this template that adds the drop-down play and download links (it’s pretty easy to do with jQuery), but that doesn’t really matter, because now we’re going to look at the other view in this app.

def detail(request, id):
    t = loader.get_template('sermondb/detail.html')
    s = get_object_or_404(Sermon, pk=id)
    passages = get_passages(s)
    for p in passages:
        esvLookupURL = "http://www.esvapi.org/v2/rest/passageQuery?key=SNIPPED&passage=" + str(p.esvstr()) + "&include-headings=false&include-footnotes=false&include-audio-link=false&include-passage-references=false&include-short-copyright=false"
        p.esv = urllib.urlopen(esvLookupURL).read()
    c = Context({
        'sermon':s,
        'passages':passages,
         })
    return HttpResponse(t.render(c))

This is the view that gets rendered on sermon detail pages like this one. It looks much the same as the previous view, with some slight differences. The function takes another parameter, id, that corresponds to the id of the sermon to be shown. This is passed in via urls.py, which lives in the root of the Django project directory and is basically a list of regular expressions matching different URL types. The relevant sermon is found with the shortcut function get_object_or_404 and the database API, this time by specifying that we want the sermon object with a primary key equal to id.

The other thing worthy of mention here is the usage of the ESV Online API. This is a seriously neat little web service, provided for free. We’ve written a method on our Passage class called esvstr() that renders the passage’s location into a format consistent with what the web service expects, and we just use that to construct a URL. We use Python’s urllib module to make a request to that page, and include the resulting text.

This isn’t a scalable solution, and really we should be caching it, but the terms of service limit us to not storing more than half of any book of the Bible at any one time. Given that we expect consecutive series of sermons to cover entire books at times (the first one will be covering all of Ephesians, for example), we decided it would be rather complicated to work out a caching solution that took that into account. Given our current traffic levels, we’re satisfied we’re not going to be at risk of breaking the 500 lookups/day limit on the service any time soon.
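
The idea behind urls.py mentioned above (a list of regular expressions, each pointing at a view, with captured groups handed to the view as arguments) can be sketched without Django at all. This is a rough illustration in current Python 3, not the project’s actual urls.py: the two patterns, the resolve() helper and the string-returning views are all assumptions made up for the example.

```python
import re


def index(request):
    return "sermon index"


def detail(request, id):
    return "detail page for sermon %s" % id


# Hypothetical patterns; the real urls.py is not shown in this post.
URLPATTERNS = [
    (re.compile(r"^sermons/$"), index),
    (re.compile(r"^sermons/(?P<id>\d+)/$"), detail),
]


def resolve(path, request=None):
    """Call the first view whose pattern matches, passing named groups as keyword arguments."""
    for pattern, view in URLPATTERNS:
        match = pattern.match(path)
        if match:
            return view(request, **match.groupdict())
    raise LookupError("no pattern matches %r" % path)


print(resolve("sermons/"))
print(resolve("sermons/42/"))
```

The named group (?P<id>...) is what ends up as the id parameter of the detail view. Note that it arrives as a string, not an integer.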

Sermonbook

As I mentioned last time, the promotion campaign for Campus Church is incorporating a lot of Facebook usage, since the target demographic (university students) is rather well represented on that site. There was a Facebook group from quite early on, but Cameron and I had the thought that it might be a good idea to reuse what we had already done on the site and create a Facebook application that would just embed a flash player for the latest sermon and a link or two back to the site. Our motivation was two-fold:

  • We had created an RSS feed for the sermon database (again, Django makes this pretty simple). But although I can’t really back it up with numbers, I don’t think it would be too risky an assertion to claim that the set of Facebook users is not in any way a subset of the set of regular RSS reader users. A Facebook application would be a way to provide the same sort of functionality to users who might not otherwise use it.
  • The “Share application” functionality Facebook provides is an interesting promotion tool that we were interested in taking advantage of.

You can see the resulting application. Sermonbook is another application in the Django project, but it reuses the Sermondb models to query the same database. With the addition of some boilerplate Facebook code (which you can find on their developers’ wiki), even the views look remarkably similar. The basic process is that Facebook refers to a supplied URL on our site to get the required FBML (Facebook’s XML variation on HTML) to render the application, with a few complications I won’t go into. My aim here is to show what the modularity of Django gives you in terms of flexibility.

def canvas(request):
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:1]
    s = get_object_or_404(Sermon, pk=latest_sermon_list[0].id)
    passages = get_passages(s)
    for p in passages:
        esvLookupURL = "http://www.esvapi.org/v2/rest/passageQuery?key=SNIPPED&passage=" + str(p.esvstr()) + "&include-headings=false&include-footnotes=false&include-audio-link=false&include-passage-references=false&include-short-copyright=false"
        p.esv = urllib.urlopen(esvLookupURL).read()

    return render_to_response('facebook/canvas_details.fbml', {'name': 'stranger','key':'','sermon':s,'passages':passages})

...

def post_add(request): #The function that gets called when the user first adds the application
    b = FacebookUser(uid=request.facebook.uid)
    b.save()
    latest_sermon_list = Sermon.objects.all().order_by('-date','-id')[:10]
    fbml = '<br><a href="http://apps.facebook.com/campuschurch/">More about this sermon</a> | \
<br><a href="http://www.campuschurch.org.nz/sermons/">Browse other sermons</a>'
    request.facebook.profile.setFBML(fbml,request.facebook.uid)
    return request.facebook.redirect(request.facebook.get_url('profile', id=request.facebook.uid))
post_add = facebook.require_login()(post_add)

Really that last part should have been done with a template instead of just hardcoding the FBML, and I expect we’ll tweak that as soon as we can be bothered. Looking at it now, we could have saved on code duplication if the ESV lookup was done on the model, rather than in the view. We should probably tweak that too.

Overall the process was pretty straightforward, apart from some trickiness with updates. Because of the way the Facebook application model works, we have to keep a list of all the UIDs of Facebook users who add our app. We’d be happier not to have to do that and would be quite content with some sort of iframe arrangement where it just blindly loads in some page. But iframe apps can’t be displayed inline on the user’s profile page for security reasons. So every time there’s an update, we now have the sermondb application go out and ping Facebook with a list of all our registered users and the updated FBML to display. It’s an inelegant solution, but these are still the early days of web service to web service integration, I guess.
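
One way the duplicated ESV lookup could be pulled into a single place is sketched below. It is a minimal sketch under assumptions, not the code we ended up with: esv_lookup_url() and the two constants are hypothetical names, the example reference string is only a guess at the kind of thing esvstr() returns, and it uses current Python 3 (urllib.parse) rather than the Python 2 of the views above. The endpoint and query options are the ones already shown in the views.

```python
from urllib.parse import urlencode

ESV_ENDPOINT = "http://www.esvapi.org/v2/rest/passageQuery"

# The same options both views pass, gathered in one place.
ESV_OPTIONS = {
    "include-headings": "false",
    "include-footnotes": "false",
    "include-audio-link": "false",
    "include-passage-references": "false",
    "include-short-copyright": "false",
}


def esv_lookup_url(passage_ref, key="SNIPPED"):
    """Build the lookup URL for one passage reference string."""
    params = {"key": key, "passage": passage_ref}
    params.update(ESV_OPTIONS)
    # urlencode also escapes the spaces and colons in the reference
    return ESV_ENDPOINT + "?" + urlencode(params)


print(esv_lookup_url("Ephesians 1:1-14"))
```

A method on Passage could call a helper like this and fetch the result, so that both detail and canvas only need to loop over the passages.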

The finished app, inline on my profile page:

sermonbook.png

Yahoo has blocked the Pirate Bay from its search results

Try searching for The Pirate Bay on Yahoo!. Try it. As of this moment, you won’t see the actual site anywhere in the index. Not on Yahoo.com, and not on the NZ version either, which suggests an internal policy rather than a censorship request from a particular government of the kind Yahoo and the other engines already honour. I cannot fathom what they’re currently thinking.

It’s not as if Yahoo hasn’t got other things to worry about at the moment. This has “bad idea” written all over it in very large lettering, for multiple reasons:

  • Search engines have often pleaded something akin to a “common carrier” status in court as reason for why they shouldn’t be expected to police their results. The precedent this sets for Yahoo is potentially going to hurt them a lot in such cases in the future.
  • Of the big three search engines (Google, Yahoo, MS), Yahoo is easily the one with the most bad press in the West around censorship, mostly due to its rather dubious policies in China. Implementing a blackout on such a notoriously popular site is only going to pour oil on the fire.
  • Why go out of their way like this to give Google good publicity by comparison? There are so many bad ways this can be spun in the game of Chinese Whispers between the type of people who care about TPB learning about this and the type of people who make up Yahoo’s core audience hearing about it. “Yahoo blocked a site world wide because there was a lawsuit in Sweden against it.”

WikiStat, a Greasemonkey extension for viewing Wikipedia edit distributions

In my previous post I wanted to make a pie graph quickly, and so for the first time used Google’s relatively new charting API. It’s a pretty neat little concept, taking in all the data and parameters for the chart in a single URL and then giving you the resulting image. I thought there had to be a better use for that than a static joke pie chart. Then in a meeting of my quiet interest in information visualization and some previous experience at writing a Greasemonkey script (if you know Javascript but haven’t tried GM, do so. It’s actually quite fun and easy), I decided to have a stab at using the API in a slightly more dynamic and useful way.

It’s called WikiStat and shows the time distribution of up to 250 of the most recent edits on an article, giving you a quick insight into how recently and intensively a page has been edited.
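
The single-URL idea behind the charting API looks roughly like this. WikiStat itself is Javascript; the sketch below is Python only to stay consistent with the other code on this blog, and it is not taken from the script. The endpoint and the cht, chs, chd and chl parameter names are as I understand the API, the pie_chart_url() helper is a hypothetical name, and the figures are invented.

```python
from urllib.parse import urlencode

CHART_ENDPOINT = "http://chart.apis.google.com/chart"


def pie_chart_url(slices, width=250, height=100):
    """slices: list of (label, value) pairs; returns the URL of a chart image."""
    params = {
        "cht": "p",  # chart type: pie
        "chs": "%dx%d" % (width, height),  # image size in pixels
        "chd": "t:" + ",".join(str(v) for _, v in slices),  # data, text encoding
        "chl": "|".join(label for label, _ in slices),  # slice labels
    }
    # Leave ':', ',' and '|' unescaped so the URL stays readable.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,|")


print(pie_chart_url([("edits", 80), ("reversions", 20)]))
```

Everything the chart needs travels in the query string, so the result can be dropped straight into the src of an img tag.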

WikiStat screenshot

You’ll need Firefox and Greasemonkey to use it. If you’ve got those, then click here to install it.

I haven’t tested it with anything other than GM 0.7 and FF3b3. I do recall that Opera can do user scripts these days, so it might be able to do it too. No promises though.

I’m most interested in suggestions for improvements.

Update:

WikiStat pie chart screenshot

Added a pie chart to show approximately how many of the edits were reversions. Makes for somewhat depressing viewing when you then consider there’s an invisible but equally large and ultimately non-productive segment of the pie.

Jamming with Django

Django Reinhardt

Update: Part 2 with views and a Facebook app

As I mentioned in my previous post, Cameron and I recently did the University of Canterbury Campus Church website using the Django application framework. The sum total of our experience with it going in was that I already knew Python and had gone through half the Django tutorials a couple of years ago when Django was first announced. So this was going to be an interesting experience. We sat down one day, SSHed into the shared hosting account at Bluehost, and opened part 1 of the Django tutorial.

I kept notes as we went, so we could work out where our time was spent.

Breakdown of development time

Well actually I didn’t keep notes, but that chart isn’t as inaccurate as it may first seem. As you can see, coding time itself was pretty minimal. The rest of the time was spent on wondering why something wasn’t working, then realising we had been using a feature that wasn’t in our version of Django because we were reading the wrong version of the documentation (that their site makes this an easy mistake to make is one of the few criticisms I have of them. And at least they have comprehensive, well-written documentation…). We also spent quite a bit of time bringing down all the sites on our account because we did something stupid with .htaccess files, trying to get Bluehost to do sane things (there are some very odd and inconsistent configurations over there), and then writing code we later realised we didn’t need, often because we hadn’t read the Django documentation beforehand, where we would have learnt it already had what we wanted. And Django is pretty cool, and not just because of its namesake.

The majority of the site functionality we knocked out in a single day, producing the content management system for all the static pages, and producing the first version of the sermon database. Django is a model-view-controller framework much like Ruby on Rails, but it’s much more suitable for content-oriented sites like ours. It encourages a modular design of several applications. In our case, the sermon database was an application by itself. If you’re familiar with Django you can skip this next section. It’ll be old news to you.

Modelling the application

Inside the sermondb directory we had two Python files, models.py and views.py. views.py is just full of a couple of functions that render out the Sermon model into HTML by passing them to templates. It’s pretty simple, but I won’t go into it yet because there aren’t any sermons in the live Campus Church site for me to screenshot (I know, I know, we need a development server. We’re working on that). In Django a model is just a class used to describe something you store. There’s often no need to write your own SQL code, although you’re quite free to do so. We decided that there would be two types of objects needing storing in the sermon database. One class we called Sermon, to represent each sermon. The other was Passage, to represent each passage referenced by a sermon. We separated this because we didn’t want to impose any limit on how many passages a sermon could have.

I’m stripping out some helper functions and declarations we added later, but here are the contents of models.py for the sermon database.

class Sermon(models.Model):
    speaker = models.CharField(maxlength=50)
    date = models.DateField()
    title = models.CharField(maxlength=150)
    mp3file = models.FileField(upload_to="sermons/",help_text=_("This should be a low quality version (preferably 18kbps), because it will be used for the Flash player."))
    largemp3file = models.FileField(upload_to="sermons/",blank=True,help_text=_("Optional. Preferably a 64kps version."))
    outline = models.TextField(blank=True)
    def __str__(self):
        return self.date.isoformat() + " - '" + self.title + "' - " + self.speaker
    class Admin:
        js = ['js/tiny_mce/tiny_mce.js','js/textareas.js']

class Passage(models.Model):
    book = models.CharField(maxlength=3, choices=BIBLE_BOOKS, core=True)
    startchap = models.PositiveIntegerField()
    startverse = models.PositiveIntegerField()
    endchap = models.PositiveIntegerField()
    endverse = models.PositiveIntegerField()
    sermon = models.ForeignKey(Sermon, edit_inline=True, num_extra_on_change=3)
    def __str__(self):
        if self.startchap == self.endchap:
            return BIBLE_BOOK_DICT[self.book] + " " + str(self.startchap) + ":" + str(self.startverse) + "-" + str(self.endverse)
        else:
            return BIBLE_BOOK_DICT[self.book] + " " + str(self.startchap) + ":" + str(self.startverse) + "-" + str(self.endchap) + ":" + str(self.endverse)

If you’re not familiar with Python, don’t be intimidated by the above. The first thing you need to know is that Python is whitespace sensitive; that is, indentation acts as the scope declaration that languages like Java provide with { and }. The second is that it’s dynamically typed. Combined, that makes for pretty clean-reading code.

So in the above file we have two classes inheriting from a class called Model, provided by Django. The first thing we do in each is declare what properties each one has. For example, we declare that the sermon has a speaker (or rather, their name in a string of maximum length 50), a date, a title, two mp3 files (low and high quality) with some help text describing them, and an outline. Some of them we allow to be empty by also declaring blank=True. Then we define a method called __str__(), which just returns a string describing the object. This is very useful to have when you actually want to look at a list of such objects. Finally we declare another class inside the Sermon class called Admin. This is Django’s way of letting you control the way the sermon appears in the administration pages. In it we tell Django to include the TinyMCE text editor Javascript file (this will enhance our text editing).

On the Passage class we declare that it comes from a certain book of the Bible, a choice from a big list called BIBLE_BOOKS that I omitted from this sample just for length. It has a starting chapter and verse, and an ending chapter and verse. All these values are just numbers. It also declares a many-to-one relationship with the Sermon class through a ForeignKey field. It also has a __str__() to describe itself.

By running python manage.py sql sermondb, we can see the SQL code that’s getting generated for us from this.

BEGIN;
CREATE TABLE `sermondb_sermon` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `speaker` varchar(50) NOT NULL,
    `date` date NOT NULL,
    `title` varchar(150) NOT NULL,
    `mp3file` varchar(100) NOT NULL,
    `largemp3file` varchar(100) NOT NULL,
    `outline` longtext NOT NULL
);
CREATE TABLE `sermondb_passage` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `book` varchar(3) NOT NULL,
    `startchap` integer UNSIGNED NOT NULL,
    `startverse` integer UNSIGNED NOT NULL,
    `endchap` integer UNSIGNED NOT NULL,
    `endverse` integer UNSIGNED NOT NULL,
    `sermon_id` integer NOT NULL REFERENCES `sermondb_sermon` (`id`)
);
COMMIT;

We didn’t have to write that. If you’re a PHP coder used to cranking out your own SQL tables and haven’t used any sort of ORM system before, you may not be convinced that we’ve actually saved time here. After all, there’s not a lot of difference in length between the two code samples. Hopefully a single screenshot will be enough to convince you (and even those who’ve used other MVC frameworks) of the merit of the Django approach.
Django administration screen
Once you reach this point in the Django tutorial, there’s a pithy little line that says something like “Take a moment to marvel at all the code you didn’t have to write”. This is probably the biggest selling point of Django for sites like this. It comes with an administration section already built in, complete with a user system and a nicely fine-grained permissions system for them. Just look at that form. The date field automatically has a Javascript date selector appear beside it, and the input string will be validated as a proper date. The MP3 file uploads are handled automatically.

I’m not sure I can stress enough how much fun Django brought back to web development for us on this project by removing all this tedium. For a site like this where you would otherwise be looking at a content management system, Django gives you the flexibility to custom design your own CMS suited for your own needs. Though I should stress that this doesn’t at all preclude it from non-CMS-like roles. It’s just that this was the role we were asking of it for this site, and it played its part with aplomb.
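
As a footnote to the “code you didn’t have to write” point, here is a toy sketch of how a declarative list of fields can be turned into a CREATE TABLE statement like the one in the SQL listing above. It is emphatically not how Django does it internally: SERMON_FIELDS and create_table_sql() are made-up names, the field list is a shortened version of the Sermon model, and it is written for current Python 3.

```python
# Toy field descriptions: (column name, SQL type). Not Django's real machinery.
SERMON_FIELDS = [
    ("speaker", "varchar(50)"),
    ("date", "date"),
    ("title", "varchar(150)"),
    ("outline", "longtext"),
]


def create_table_sql(app, model, fields):
    """Render a CREATE TABLE statement in the style of the generated SQL."""
    lines = ["    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY"]
    for name, sql_type in fields:
        lines.append("    `%s` %s NOT NULL" % (name, sql_type))
    return "CREATE TABLE `%s_%s` (\n%s\n);" % (app, model, ",\n".join(lines))


print(create_table_sql("sermondb", "sermon", SERMON_FIELDS))
```

The point is only that once the fields are described as data, the schema (and, in Django’s case, the admin forms and validation too) can be derived from that one description.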

Then and nofollow

It’s been a long time since I was last involved in making or promoting a website. I did several, of varying quality, as an occasional and casual job during high school, for businesses that ranged from very small to small. The most dynamic they ever got was one site with a single page that could be edited by the owner, courtesy of a hacked TinyCMS. But it managed to turn into my first ‘proper’ job when I worked over one summer for a local software development firm, which at the time had a headcount of about 200. It was a pretty normal office boy job, except for my primary task which was to work on ways of improving the company’s website ranking in search engines.

My thinking at the time was that all that mattered was Google, but as this was when they weren’t yet a household name I wasn’t able to easily convince my boss of this. So most of my time was spent browsing the forums at Webmaster’s World and the affiliated SearchEngineWatch, and given how little influence I would have on the site itself, learning how inadequate all the measures I could recommend or implement would be. Quickly retreating was the age when meta tags in a page actually meant something. Remember the advertisements for programs that would automatically submit your website to hundreds of different search engines? This was the tail end of that era. Anything beyond those sorts of frivolities was not something I could really change. I remember spending a lot of time after that on tasks like writing Python scripts to check the site’s ranking on various search queries, filing papers in binders, and burning CDs. Everything but improving search presence. And in fact the Python was just an excuse to program, to do something to keep my mind interested.

That was 5 or 6 years ago. So when Cameron told me that he was going to be working on the website for the new Campus Church at the University of Canterbury, I was more than happy to offer my support. Collaboration on a project’s easy when your bedrooms share a wall. And then in a completely objective manner, I pushed him towards using a web framework I had been wanting to try out on a real project for some time. As it turned out, Django’s a rather fantastic little framework, so that turned out well and we’re very happy with the results. I might have to dedicate another post to singing its praises. jQuery too.

It’s five or six years since that summer job and the trends that were emerging then have solidified. And by trends I really mean just one trend, and that was that Google is all that matters in search, particularly since the others have all since adopted variations of the PageRank algorithm into their own engines. Before I continue it’s probably worth noting that search exposure is only one prong of a publicity campaign. Our target demographic (university students at a particular university) is extraordinarily suitable for a Facebook campaign, especially with how cheap Facebook ads are at the moment, and of course we’re planning for more traditional advertising (read, talking to people) come the rush of Orientation Week. At the moment we’ve had more visits from Facebook than from search engines. But onwards with the search angle.

So we designed the site all the right ways. Clean URLs, semantic HTML with a CSS design, Javascript that degrades gracefully, an RSS feed (well, that’s more of a nod to other trends, I guess), everything that took far too long to come into vogue in web design. The first two particularly we knew from the start would be good to have if we wanted to maximize our Google exposure. We had picked our set of keywords that we wanted to focus on. And then we did the other Google things you do these days, like submit a sitemap.xml to Google, add a Google Local entry (pending), add Google Analytics (because the shared host is $7/month and you get the traffic analysis you pay for there), and today I thought I should probably go see if Yahoo and MSN search have some sort of submission feature. Not that we care about them particularly.

But of course, as the recent Scientology Google-bombing shows (FYI, Scientology is a dangerous cult), by far the most important factor in your Google PageRank is still inbound links. This I feel is still the case despite the mass PR drop late last year. I remember from my summer job suggesting ways to get links to the site on more popular sites. It didn’t happen. But something has slightly changed since then which can hamper such efforts, and that’s the emergence of nofollow.

If you’re not familiar with it, the short of it is that if you add rel="nofollow" to an HTML link tag, it acts as a flag to Google that the link shouldn’t count as any sort of recommendation of quality/vote of confidence. Google of course uses links as its primary measure of a site’s value. The more links to a site, the more importance it has. Links from important sites are more valuable than links from non-important sites. The primary rationale at the time was to take the reward out of blog comment spamming, which was rather big back then.

It’s also made for a hidden world of trust and distrust on webpages. As I was considering the ways to get links pointing to the new site, like the rather blatant and keyword-laden one earlier in this post, I decided to spend a few minutes writing a bookmarklet to highlight links tagged with nofollow. You’ll have to go to this page first to install it, because WordPress is paranoid about Javascript and strips it out of posts.

There are already some bookmarklets like this available when you search Google for “nofollow bookmarklet”, but mine’s slightly more flexible because it allows for the rel attribute to have multiple values as well as nofollow (as it increasingly will have as more semantic linking is adopted).
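
The multiple-values point is the only subtle part, and it is easy to show outside the browser. The bookmarklet itself is Javascript working on the live page; the sketch below is a rough Python 3 equivalent of the same check using the standard library’s HTML parser. NofollowFinder and the sample links are made up for the example.

```python
from html.parser import HTMLParser


class NofollowFinder(HTMLParser):
    """Collect the href of every <a> whose rel attribute includes the nofollow token."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, so split it
        # instead of comparing the whole attribute to "nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow_links.append(attributes.get("href"))


page = """
<a href="http://example.com/a" rel="nofollow">plain</a>
<a href="http://example.com/b" rel="external nofollow">two values</a>
<a href="http://example.com/c" rel="external">followed</a>
<a href="http://example.com/d">no rel at all</a>
"""

finder = NofollowFinder()
finder.feed(page)
print(finder.nofollow_links)
```

A check that simply compared rel to the string "nofollow" would miss the second link, which is the case the more naive bookmarklets get wrong.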

So install it, and see the hidden web of distrust in front of you. The user-content driven sites are most interesting.

YouTube doesn’t trust its users. I can’t really blame them:

Youtube comments

Neither does Wikipedia, which makes a lot of sense for them:

But legendary geek news site Slashdot’s a bit more ambiguous. I can understand the nofollow on links in comments (although maybe it’d be a good motivation for users if they were to remove it on comments moderated above a certain level), but if they allow it on the standard user profile Homepage link, why not the front page link in the user’s name?:

Slashdot nofollow examples

Slashdot has a PR of 9/10. That’s massive, and it seems somewhat of a wasted resource that they don’t use it better. Its younger competitors Reddit and Digg (both PR 8/10) don’t use it, and they’re even automated, unlike Slashdot, which still uses human editors to approve all front page items.

Maybe the web’s ready for a finer-grained method of trust annotation?

And how do I bring this ramble full circle? Well we go back to my summer job and one thing I tried to suggest several times. Well there were several things I tried to suggest several times, including maybe not storing the list of all the client website usernames and passwords in plaintext for all – myself included – to see on the internal network. But another thing I did suggest was a press release designed to appeal specifically to the Slashdot crowd, and thus earn a juicy front-page link. Announce their then-secret work on making a Linux port. Get that burst in traffic and the link. It would’ve worked. But it didn’t happen.