django

Automatically creating MySQL FULLTEXT indexes with django syncdb

Solr's great, and I depend upon it for a lot, but sometimes it's just an extra moving piece that's not worth the deployment overhead. I'm working on a project where a simple MySQL FULLTEXT index will probably suffice instead, since we're already using MySQL.

The trick, though, is that Django doesn't create FULLTEXT indexes for you. Which means you have to drop down into the mysql shell and create your index by hand, or script it and remember to run the script, etc., etc. Or you could write a custom manage.py command, but you still have to remember to run that separately. But who remembers anything anymore?

I wanted this to be automatic, so that with any manage.py syncdb, it would ensure that the fulltext index is present without any extra step. To do this, the best option I've come up with is to use django's post-syncdb signal. Unfortunately that doc is in a state of flux, so I had to poke around at it all to get this straight, and I'm not sure I have it right yet, but I have it working, at least. Here's the script, which lives in the "management" module of the app's directory next to the models.py with the model I want the fulltext index on.

This is using django's svn trunk, updated today.

from django.db import connection
from django.db.models import signals
 
from myproj.myapp import models as myapp_models
 
 
def create_fulltext_indexes(**kwargs):
    """ 
    Check whether myapp_searchindex has a fulltext index,
    and create it if it doesn't yet exist.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("""
            SELECT * FROM information_schema.statistics 
            WHERE table_name='myapp_searchindex' 
            AND index_name='str_value_fulltext'
            """) 
        rows = cursor.fetchall()
        if len(rows) == 0:
            print 'Creating fulltext index on myapp_searchindex.str_value'
            cursor.execute("""
                CREATE FULLTEXT INDEX str_value_fulltext 
                ON myapp_searchindex (str_value)
                """)
    except:
        import traceback        
        print traceback.print_exc()
 
 
signals.post_syncdb.connect(create_fulltext_indexes, sender=myapp_models)

The key thing here is that the post_syncdb signal seems to get sent to this function more than once. I read some comment threads on this and used that advice to get the number of signal calls down from several to two, but I didn't want the index to get built and rebuilt over and over, so this checks MySQL 5.0's information_schema database to see if the index already exists or not before building it.

It seems likely that there's a better way to do this, but I couldn't find an easier solution that didn't require any kind of separate step. If you know one, please let me know! If not, hopefully this will help somebody else.

Update (8/15): This is the same strategy this code uses to swap out MySQL storage engines.

Looking for a python/django consulting gig?

This page is a good one to review in general. Find "Library of Congress" in that page in particular.

We'd need you here, so, willing relocators only, please.

djangopac - the 20-minute library OPAC in django

Just for fun, here are screenshots of what a 20-minutes-to-develop OPAC might look like. It just uses the Django project's built-in admin screens over a simple mysql db and offers simple access to about 250,000 records from the medical and historical medical libraries I work for.

This was literally a 20-minute project. You can search for records by title, author, or imprint information, and filter by local location codes. This screenshot is the first result page from a search of all locations for "darwin":

djangopac-darwin-all

And this is the same search, restricted to the location code for "historical medical, locked stacks" (i.e. the really cool but complicated stuff):

djangopac-darwin-hist-locked

The data is from an extract from voyager generated using a tool written by staff at the Yale Libraries' Integrated Library Technology Services group.

suki, the repository into which I'm putting the full bib records in multiple formats for all this data, will soon actually own a copy of all this data and I can hook a more-than-20-minute search interface up to it for real. Still, cool, eh?

Syndicate content

This site is Copyright (c) 2005-2008 by Daniel Chudnov. All rights reserved.

All opinions stated here are my own, and do not reflect those of my employer.