A couple of weeks ago I finished moving a site of mine from a Drupal 5.1 codebase to a new one written in Rails, so I figured I’d throw up a quick rundown of the issues encountered and the solutions I employed to deal with them. This post covers the basic mechanics involved in the migration and contains very little Drupal-specific stuff. I’ll follow it up shortly with some other bits that will be handy for anyone wanting to get their data out of a Drupal schema.
All in all though, it was remarkably straightforward and painless. Having had it in the back of my mind for the previous month or two, on the morning I sat down to start working on it I had a pretty good idea of how I planned to tackle the problem, and by the evening I had a group of 7 rake tasks that did the whole job in about 15 minutes.
What’s involved
I’m not sure how many different ways one might consider attacking this, but I figured that since it’s a one-off task in a controlled environment I’d sacrifice any concern for speed and efficiency in favour of simplicity. I therefore chose to make use of the similarities between the way Drupal models its data and the way ActiveRecord does, so I could iterate through the data I wanted in a predictable manner.
Drupal’s node system works in such a way that each node object relates to a row in a table. Having this basic property in common with AR is enough to know that you’ll most likely be able to just select a whole load of rows from your Drupal site, loop through them and create new objects in your new app as you go.
Once you know this, and assuming your old app’s fundamental modelling isn’t vastly different to your new one’s, all you then have to work out is how you’re going to connect to both databases. After that you just iterate through the old data and create the new data as you go.
The basic rake task
After a bit of research into different methods I found out how to open up a connection to my legacy database by creating an AR class which I could use as my way to get the data out of it, while the rest of my app remained the same around it.
The rake task shown below is the one I used to transfer all my user accounts, albeit with a bit of extra stuff removed which dealt with all my custom user profile data. All the other tasks were based on the same code, with the only changes being to the SQL statement and the contents of the for loop which iterates through what that pulls back.
BTW, if you’re not familiar with rake tasks, check out the Railscast linked to at the bottom of the article.
namespace :db do
  namespace :legacy do
    desc "Transfer user data"
    task :migrate_users => :environment do
      # Throwaway AR class whose only job is to talk to the legacy database
      class Drupal < ActiveRecord::Base
        establish_connection "legacy"
        set_table_name "users"
      end

      # Don't send registration mails or overwrite the old timestamps
      ActionMailer::Base.perform_deliveries = false
      User.record_timestamps = false

      # uid 0 is left out of the transfer
      @old_users = Drupal.find_by_sql "SELECT * FROM users WHERE uid > 0"
      for old_user in @old_users
        new_user = User.new(:login => old_user.name,
                            :email => old_user.mail,
                            :sig   => old_user.signature)
        # Convert the old integer timestamps into Time objects
        new_user.created_at = Time.at(old_user.created)
        new_user.updated_at = Time.at(old_user.changed)
        new_user.save
      end
    end
  end
end
So, if you’ve been working with Rails for a little while then you’ll probably get all you need just from reading that snippet of code, but it’s still worth reading on to see why I did things a certain way. If you don’t understand what’s going on, here’s an in-depth account of what’s happening.
What we’re doing is creating a new class descended from ActiveRecord::Base which, if we left it like that, would be no different from if we’d created a model called Drupal. However, that’s not what we want as we’re using this as our way to connect to the legacy database, so we have to change where it plans to connect to. For this we have the following lines:
establish_connection "legacy"
set_table_name "users"
The first line here overrides the connection it inherited from ActiveRecord::Base, which will already be set to connect to your production or development database, and tells it to connect to ‘legacy’, which I have defined in my database.yml just like the others. In addition to this though, because it’s not a typical AR model, when it looks for a table called drupals it’s not going to find one, which is the reason for the second line.
This tells AR that the corresponding table for this class is users, which I’ve set merely to satisfy AR. In practice I only ever plan to use find_by_sql so this has no bearing on how the migration works; however, if you now did a Drupal.find(:all), you’d get the contents of the users table returned.
I did think there was a better way, though. From what I’ve gathered, setting abstract_class to true (i.e. using self.abstract_class = true instead of setting the table name) should tell AR that your class doesn’t have a corresponding table. However, when I tried this it still looked for a table named drupals, so I just went with setting the table name.
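For anyone who hasn’t added an extra connection before, here’s a rough sketch of what a ‘legacy’ entry could look like. The adapter, database names and credentials are all placeholder values, and the YAML is only embedded in a Ruby string so the snippet can be run on its own; in a real app it would simply sit in config/database.yml alongside the other entries.

```ruby
require "yaml"

# Hypothetical contents of config/database.yml. Every value here is a
# placeholder; only the "legacy" key matches the name used in the rake task.
DATABASE_YML = <<~YAML
  development:
    adapter: mysql
    database: myapp_development
    username: root
    password:
    host: localhost

  legacy:
    adapter: mysql
    database: drupal_old
    username: root
    password:
    host: localhost
YAML

config = YAML.load(DATABASE_YML)

# establish_connection "legacy" refers to this top-level key.
legacy = config["legacy"]
puts "legacy connects to #{legacy["database"]} via #{legacy["adapter"]}"
```

The only part that has to match the rake task is the key name, since that’s the string passed to establish_connection.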
Anyhoo…. after that there are these two lines:
ActionMailer::Base.perform_deliveries = false
User.record_timestamps = false
The first one turns off mail delivery so that things like registration mails working off observers don’t get sent (as will be the case if you’re using AAA or Restful Authentication).
The second one turns off timestamps on the new users I’m creating so that I can preserve these from the old records. Incidentally, I experienced some strange behaviour here: I had initially set this on ActiveRecord::Base, which seemed to work for created_at but not for the updated_at field (and I only observed this with the User model). Setting it directly on the model in question had the desired effect.
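The conversion itself is needed because Drupal’s users table keeps created and changed as integer Unix timestamps, which is why the task runs them through Time.at. Here’s a minimal sketch of just that step, using a Struct as a stand-in for a row from find_by_sql and made-up values. The .utc call isn’t in the rake task; I’ve only added it here so the printed output doesn’t depend on your local time zone.

```ruby
# Stand-in for a row returned by find_by_sql; the values are made up.
LegacyUser = Struct.new(:name, :created, :changed)

old_user = LegacyUser.new("alice", 1199145600, 1199232000)

# Time.at takes seconds since the Unix epoch and returns a Time.
created_at = Time.at(old_user.created).utc
updated_at = Time.at(old_user.changed).utc

puts created_at  # 2008-01-01 00:00:00 UTC
puts updated_at  # 2008-01-02 00:00:00 UTC
```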
So, once you’ve done this, you can use the Drupal class to do a find_by_sql against your legacy database and pull back any info you like.
It is possible to go into more depth to connect to a legacy schema through AR, defining several classes and defining has_one and has_many relationships by overriding table and foreign key names. Like I said before though, for simplicity’s sake I decided I’d rather iterate through tens or even hundreds of individual statements than pull my hair out making it more complicated just for the sake of a few minutes gained by the joins.
And so, once you’ve done that find, you have all your legacy data in a nice orderly collection and you can loop through it and create the entries in your new database as you would do normally.
And that, pretty much, is that.
Based on this, you have full access to all your legacy data and can perform whatever SQL is needed to pull back other data, nest multiple loops / statements within each other to create your models with has_many relationships, etc. In my next post I’ll cover a few specifics which you’re likely to run into if you’re migrating from a typical Drupal install.
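One thing worth bearing in mind once you get on to related models is that the new records will usually get new ids, so the old foreign keys won’t line up any more. A minimal sketch of one way round that, keeping a hash of old id to new id as you go, is below. It’s an illustration rather than the code from my tasks: plain arrays of hashes stand in for the find_by_sql results and for the new models, and the new ids are faked.

```ruby
# Stand-ins for the rows find_by_sql would return; all values are made up.
old_users = [
  { "uid" => 3, "name" => "alice" },
  { "uid" => 7, "name" => "bob" }
]
old_nodes = [
  { "nid" => 10, "uid" => 7, "title" => "Hello" },
  { "nid" => 11, "uid" => 3, "title" => "World" },
  { "nid" => 12, "uid" => 7, "title" => "Again" }
]

new_users = []
new_posts = []
uid_to_new_id = {} # old Drupal uid => id of the record in the new app

old_users.each do |old_user|
  new_id = new_users.size + 1 # pretend the new database assigned this id
  new_users << { :id => new_id, :login => old_user["name"] }
  uid_to_new_id[old_user["uid"]] = new_id
end

old_nodes.each do |old_node|
  # Look up the owner through the map rather than reusing the old uid.
  new_posts << { :title   => old_node["title"],
                 :user_id => uid_to_new_id[old_node["uid"]] }
end

new_posts.each { |post| puts "#{post[:title]} belongs to user #{post[:user_id]}" }
```

In a real task the hash would be filled from new_user.id after each save, and the second loop would sit in whichever task migrates the dependent records.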

I was looking for something like this to pull info from a legacy Oscommerce database. Good thing I found your blog! Nice post.
Hey Chris, glad to have helped. Should really get off my arse and put together the follow up article for this….