Matthew Smith Library Systems Programmer

Announcing the CSS Zen OPAC

Ah, if only libraries were interested in standards-compliant web design. Or even in getting real designers…

Announcing the CSS Zen OPAC | One Big Library.

One thing we could do to push the boundaries further is a CSS Zen OPAC (Garden) – think about if the HTML sitting behind the CSS Zen Garden were an OPAC screen? Think about all the crazy ideas that might bubble up if we could throw a cleanly-designed, thoughtfully semantic-html OPAC screen up and let the world’s best graphic designers and CSS gurus explore new directions and designs.

The Fez XSD Problem

I’ve been thinking about a little problem with our XSD mappings in Fez. In a nutshell, we need a way of copying XSD mappings between different Fez instances.

The first iteration of this was to make an XSD exporter which reads the XSD mapping database and exports the structure to an XML file.

However, the complication is how to resolve conflicts when importing XSDs, and how to know what is in the new XSD XML files.

Part of the complication is that each XSD Display Mapping has an xdis_id which is stored in the record to tell Fez which content model to use to handle the record. But what if a Fez user ‘out there’ creates a new XSD Display and uses an xdis_id that we have used here at UQ? When they get our Fez upgrade, they will no doubt want to upgrade to the new UQ XSDs but not want to lose the ones they’ve made. So I’ve made the Fez XSD importer try to detect when the new XSD doesn’t match the existing one and save the new one to a new slot.
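A minimal sketch of that collision handling, in Python rather than Fez's own code. The `XsdDisplay` class, its fields and the in-memory `existing` dictionary are assumptions made for illustration; the real importer works against the Fez mapping database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class XsdDisplay:
    """Hypothetical stand-in for one exported XSD Display mapping."""
    xdis_id: int
    title: str
    mapping: str  # serialised mapping definition (format assumed)


def import_display(existing: Dict[int, XsdDisplay], incoming: XsdDisplay) -> int:
    """Import one display and return the xdis_id it ended up in."""
    current = existing.get(incoming.xdis_id)
    if current is None or current == incoming:
        # Free slot, or identical content: safe to write in place.
        existing[incoming.xdis_id] = incoming
        return incoming.xdis_id
    # Content differs: keep the local display and park the incoming one in a new slot.
    new_id = max(existing) + 1
    existing[new_id] = XsdDisplay(new_id, incoming.title, incoming.mapping)
    return new_id
```

With a local display already in slot 1, importing a different display that also claims `xdis_id` 1 leaves the local one untouched and stores the new one in slot 2. Note that the comparison is all-or-nothing: any difference at all, including a changed title, counts as a collision.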

However, this collision detection is unhelpful in some ways. For example, what if the user has just changed the name of the XSD from English to Spanish? The XSD importer will not upgrade their XSD because it will see a difference. How is the user supposed to get the updates? How are they to know what is in the updates? In many cases, XSD mappings need to be changed due to changes in the mapping API.

I’ve also had users send me exported XSDs that use different xdis_ids for some of the core Displays, like Generic MODS, resulting in a mess of unresolved XSD Relationship links.

I had thought about reserving a block of xdis_ids for core Fez stuff and marking some XSDs as core Fez so that they will have a kind of ‘brute force’ power that makes them just overwrite whatever’s there and always work. However, this still excludes the possibility of users being able to send each other XSDs that are safe from xdis_id conflicts.

I think the solution to this problem is to have namespaces for the xdis_ids. We could prefix ours with fez_core. Perhaps for our special UQ eSpace, we could use UQeSpace as the prefix. In each Fedora record’s FezMD, we would store the xdis_id_prefix as well as the xdis_id. For records in the wild that lack the prefix, we could default to fez_core until the records are updated. I guess there might be performance issues in having to query the xdis_id and the prefix every time, but something along these lines is what I’m thinking.
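As a rough illustration of the namespacing idea (a sketch only, in Python, not how Fez stores anything today): the lookup key becomes the pair of prefix and id, and records without a prefix fall back to fez_core. The `DisplayKey` type, the `display_key` helper, the id 16 and the display titles are all made up for the example.

```python
from typing import NamedTuple, Optional

DEFAULT_PREFIX = "fez_core"  # prefix proposed above for the core displays


class DisplayKey(NamedTuple):
    prefix: str
    xdis_id: int


def display_key(xdis_id: int, xdis_id_prefix: Optional[str] = None) -> DisplayKey:
    """Build a lookup key; legacy records without a prefix default to fez_core."""
    return DisplayKey(xdis_id_prefix or DEFAULT_PREFIX, xdis_id)


# Hypothetical registry: the same numeric id in two namespaces no longer collides.
displays = {
    DisplayKey("fez_core", 16): "Generic MODS",
    DisplayKey("UQeSpace", 16): "UQ thesis display",
}
```

Here `displays[display_key(16)]` resolves to the core display, while `displays[display_key(16, "UQeSpace")]` resolves to the local one, so two sites can exchange XSDs without fighting over numeric ids.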

This doesn’t solve the problem of being able to compare old and new XSDs but takes us some of the way towards a sustainable solution.

Google Scholar

We have just been having a light-hearted Google Scholar vs “real” information sources (including use of librarians) debate in our development team. In the red corner, our digital repository manager was linking to articles like this one about the deficiencies in Google Scholar:

CSA – Discovery Guides, Publish or Perish: Afterlife of a Published Article

It is a common experience. A University professor recalls there was an important paper that was published a few years back that applies directly to the proposal he is writing. He remembers what the paper was about but is utterly clueless as to the title, author, and publication title. It’s late and the proposal deadline is only hours away but this article really must be cited in the literature survey. What to do? A quick literature search based on what the professor remembers about the article should do the job. Among the available options are the tried-and-true abstract databases that he has been using since undergraduate days or the all new, highly touted Google Scholar. Does it matter which choice he makes?

In the blue corner, I was arguing that you have to take articles like this with a grain of salt, as there are lots of people who are threatened by Google’s growing accessibility and the public perception of it as the ultimate information source (or more importantly, the perceptions of people who are not information professionals but might be making decisions affecting the employment of information specialists).

I came around in the end though. No matter what their motives (and I’m sure that there are many different responses amongst information professionals), calling out Google on missing information and lack of transparency is a good thing. Google Scholar needs to open up more about who it is indexing and how often.

Deduping in Fez

For those curious about the new record duplicates workflow I’ve been developing, here is a screencast of what it vaguely looks like at this stage. It’s still a proof of concepty thing but will be tested with fire in the near future.

I used Wink to make this as I find it pretty easy to use and it is open source. We also have a licence for Adobe Captivate so maybe I will pitch them head to head and put a review here in case anyone is interested.

Patterns I Hate #2: Template Method via Pure Danger Tech

A friend emailed me this blog post about the template design pattern. Some good points made here and I wish I’d read this before I implemented class BackgroundProcess in Fez.

Usually, the best way to address the addition of functionality in orthogonal domains under a template class is to define an interface for each kind of functionality and inject an instance for each.

Oh well, I’ll have to remember this for Fez 2.0.
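To make the quoted advice concrete, here is a small Python sketch of the injection approach. It is not the Fez BackgroundProcess class; `BackgroundJob`, `ProgressReporter` and `ListReporter` are hypothetical names. Instead of subclassing a template class and overriding hooks, the job is handed an object for each orthogonal concern (here, just progress reporting).

```python
from typing import Callable, List, Protocol


class ProgressReporter(Protocol):
    """Interface for one orthogonal concern: telling someone how the job is going."""
    def report(self, message: str) -> None: ...


class ListReporter:
    """Collects progress messages in memory (handy for tests)."""
    def __init__(self) -> None:
        self.messages: List[str] = []

    def report(self, message: str) -> None:
        self.messages.append(message)


class BackgroundJob:
    """Runs a task; reporting is injected rather than overridden in a subclass."""
    def __init__(self, task: Callable[[], str], reporter: ProgressReporter) -> None:
        self.task = task
        self.reporter = reporter

    def run(self) -> str:
        self.reporter.report("started")
        result = self.task()
        self.reporter.report("finished")
        return result
```

Swapping `ListReporter` for something that writes to a log or a database needs no change to `BackgroundJob`, and the task itself never has to know which reporter it got.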

Warning of data ticking time bomb via BBC

Is digital preservation the next Y2K? It is certainly a problem for more and more ‘normal’ people, not just libraries. The other half of the problem is for John and Joan User to be able to deal with the gigabytes of digital images and videos that they are now able to produce.

BBC News Online | Technology | Warning of data ticking time bomb

“If you put paper on shelves, it’s pretty certain it is going to be there in a hundred years.
“If you stored something on a floppy disc just three or four years ago, you’d have a hard time finding a modern computer capable of opening it.”
“Digital information is in fact inherently far more ephemeral than paper,” warned Ms Ceeney.
She added: “The pace of software and hardware developments means we are living in the world of a ticking time bomb when it comes to digital preservation.
“We cannot afford to let digital assets being created today disappear. We need to make information created in the digital age to be as resilient as paper.”

Digital repositories to the rescue…

Chickens in the Library via BiblioTech Web

Chickens in the Library » BiblioTech Web
A pair of chickens walks up to the circulation desk at a public library

Using SVN for Websites

Coming from a compiled C++ background, I’ve always used revision control software to track changes to source code. Arriving at UQ I assumed that I could set up a subversion server and do the same for web application development. But there are some significant differences when working with svn on websites and here are some of the things I’ve learned.

  • HTML and PHP files can be easily stored in subversion but how do you track changes to your database schema? A way around this is to use a migrations scheme similar to Ruby on Rails, which stores your DB changes in SQL scripts that can be put under revision control. It requires a lot of discipline but means that in a small team, others can do an svn update and then run a ‘db upgrade’ script to get the latest database schema and even changes to default config data. Obviously you have to be a bit careful as it’s not simple to do a rollback of your database, so it is still suboptimal.
  • File permissions on webservers are often important. It’s useful to have a script that you can run after a svn update or checkout to make sure that files are correctly writable by the webserver when they need to be (It’s also good to reduce the need for this in the first place). If the script is under version control then new server writable files can be added to it as needed.
  • Using subversion to roll out a live site helps you track exactly which version of the site is running. We use a separate branch in subversion for the live site as it often has a few little customisations that aren’t in the main development branch. We can tag in subversion each time we do a rollout.
  • On a live server, different users might need to update parts of the site at various times but the svn client always wants to set the .svn/entries file to be read only and owned by the current user. This is a bit annoying, especially if you don’t want your svn users to have a root ssh account or the ability to change ownership of other users’ files on the server. One way to get around this is to have a ‘svnuser’ that people can log in as to do the updates on the live site. However, when committing from the live site, the real username of the perpetrator is then lost (or you can restrict the svnuser on the live site to svn updates only). So there is extra complexity at this point, making use of subversion that little bit harder.
  • Which brings me to my last point which is that subversion has a learning curve for new web developers coming to the team – they need to be trained on how subversion is used in your organisation and get used to the habits of remembering to commit schema changes and comment commits appropriately etc… If your programmers have never come across revision control before, at least subversion has an excellent online book which walks through all the concepts.
  • WebSVN has proven to be a helpful addon as we can subscribe to the commit messages via RSS and keep in touch with what’s going on. It can also link in with our ticket system if we reference a ticket number in the commit message which in turn puts a link to the WebSVN commit record in the ticket.
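The ‘db upgrade’ script from the first point could be sketched roughly like this. This is a Python and SQLite illustration under stated assumptions, not the script we actually run: the `MIGRATIONS` list, the `schema_version` table and the `record` table are invented for the example, and in practice each migration would live in its own SQL file under revision control.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical migrations, numbered in the order they must be applied.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT)"),
    (2, "ALTER TABLE record ADD COLUMN created TEXT"),
]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply migrations newer than the stored schema version; return the new version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() over an empty table yields NULL
    for version, sql in sorted(MIGRATIONS):
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            current = version
    conn.commit()
    return current
```

After an svn update, each developer runs the script against their own database; running it a second time is a no-op because the recorded version is already current. There is deliberately no downgrade path here, which is the rollback weakness mentioned above.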

Even though we only have a small development team, subversion has allowed us to work in parallel on things, using our own staging areas and databases, and still share our progress. We still have to think about how things are going to play for the other programmers when we do a commit but most of the time, the benefits of revision control far outweigh the overheads of writing commit messages, writing schema upgrade scripts and mucking about with subversion’s quirky permissions.

Tech Explorer Generation X in Libraries

What Corey said:

Tech Explorer Generation X in Libraries

Now that the Library community is taking notice of such things as tagging, reviews, and collaborating with users, library vendors are starting to add this type of functionality into their offerings. Oh, but did I mention that these features are still in development, for example Rome and Encore, and no doubt come with a hefty price tag? Are we, as a library community prepared to wait for what is provided? Or are we going to start exploring alternatives such as Scriblio, Koha, or Evergreen?

Digital Preservation of Analog Sound Recordings

Very cool.

Library of Congress Blog » Good Night, IRENE: Technology of Dreams (Library of Congress)

IRENE makes a high-resolution digital image of a disc record. The key is found in creating a digital audio file from the analog information in the disc’s grooves. IRENE can efficiently extract sound from an image of a fragile or damaged disc, “heal” scratches or digitally “reassemble” a broken phonograph record. The extracted sound is converted to standard digital files and stored for purposes of digital access and preservation.