A month or two ago I threw up some notes on how I migrated all my data from a legacy Drupal 5 PHP codebase to my nice new Rails-based site.
The first post focused on the fundamental logistics of connecting to both databases at the same time and moving the data from one to the other. In this post I’ll go over some more specific issues I encountered, which might be of use to others moving from Drupal (or PHP in general, for that matter).
General Strategy
Primary keys in Drupal are of the form nid, vid, uid, cid (node id, version id, user id, comment id) so you can generally use these to keep track of everything.
At first I toyed with the idea of directly reusing the primary keys from the old database, but I soon found this difficult to keep track of once the data structure started to change. It was easy enough with simple data structures that mapped one to one, but as soon as I hit something like threaded (tree-structured) comments, which had all sorts of extra fields to populate, it got nasty: because I was populating all the data manually, I would have had to work out how to fill those fields in myself.
So the approach I chose was simply to add columns to my new tables to hold the legacy primary keys, making it very straightforward to link new records to their legacy counterparts. A simple find_by_uid or find_by_nid and I knew exactly where I was.
General Tips
First, it’s probably worth going over some general things that are good to know before looking at specific areas of your site.
Times / Dates
For timestamps, the Drupal convention is to store all its dates as Unix epoch timestamps (seconds since 1 January 1970 UTC) in an integer field. Once you have your epoch value, it’s as straightforward as:
timestamp = Time.at(my_epoch_date)
And you now have a Time object in timestamp.
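One thing to watch: Time.at gives you a Time in the migration machine’s local zone. The sketch below wraps the conversion in a hypothetical helper, drupal_time, that makes the zone explicit with .utc, copes with the value arriving as a string, and treats 0 as “not set” (an assumption: some Drupal columns, such as a user’s last login, use 0 for never).

```ruby
# Convert a Drupal integer timestamp (seconds since the Unix epoch) to a Time.
# Assumption: 0 or nil means "not set", so we return nil for those.
def drupal_time(epoch_seconds)
  return nil if epoch_seconds.nil? || epoch_seconds.to_i.zero?

  # Time.at reads the integer as seconds since 1970-01-01 00:00:00 UTC;
  # .utc makes the zone explicit instead of using the machine's local zone.
  Time.at(epoch_seconds.to_i).utc
end

# Values fetched with find_by_sql may arrive as strings, hence the to_i above.
created = drupal_time("1199145600") # => 2008-01-01 00:00:00 UTC
```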
PHP serialized fields
Quite a few fields in a Drupal DB use PHP’s serialization, so for those you just need a copy of Thomas Hurst’s Ruby implementation. Download the php_serialize.rb file, plonk it in lib/ and add require 'php_serialize' to your rake file; then it’s as easy as:
date = PHP.unserialize(serialized_date_string)
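And there you have the unserialized value as native Ruby objects. Bear in mind that PHP arrays come back as Ruby Hashes (or Arrays), so a date from Drupal’s profile module, which is stored as a serialized array of month, day and year, gives you a Hash rather than a Date.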
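If your serialized value is a profile date, here’s a minimal sketch of the last step, assuming the Hash has the string keys "month", "day" and "year" (check a real row first). profile_date_to_date is a hypothetical helper; it returns nil for anything that isn’t a valid date, so bad legacy data doesn’t crash the task.

```ruby
require 'date'

# Hypothetical helper: turn the Hash an unserialized Drupal profile date is
# assumed to produce ({"month" => "5", "day" => "12", "year" => "1980"})
# into a Ruby Date. Returns nil if the value doesn't look like a valid date.
def profile_date_to_date(value)
  return nil unless value.is_a?(Hash)

  parts = value.values_at("year", "month", "day")
  # Every part must be a plain run of digits (PHP may store them as strings).
  return nil unless parts.all? { |v| v.to_s.match?(/\A\d+\z/) }

  year, month, day = parts.map { |v| v.to_s.to_i }
  return nil unless Date.valid_date?(year, month, day)

  Date.new(year, month, day)
end
```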
Boolean fields
I had a bit of confusion with boolean fields. All the boolean fields in my old DB were tinyint, yet I found myself having to treat them as strings when doing any comparisons. My memory is a bit hazy as it was a while ago, so some experimentation might be needed on your part. I wouldn’t expect find_by_sql to hand back a Ruby boolean for a tinyint field, but I would have expected integers; instead I distinctly recall getting strings, and the old DB fields are definitely tinyint. Quite possibly this is because ActiveRecord only type casts attributes that match columns on the model’s own table, and anything else comes back as the raw string the MySQL driver returns.
Anyway, just a warning really, best test this one for yourself rather than just going on my advice.
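Whatever the cause, a small normalising helper saves you from scattering string comparisons around the rake task. This is a sketch (legacy_bool is a hypothetical name) that accepts 1/0 as integers or strings, treats NULL as false (an assumption you may not want), and raises on anything else so odd data surfaces during the migration.

```ruby
# Normalise a legacy tinyint "boolean" to true/false, whether the driver hands
# back an Integer (1/0), a String ("1"/"0") or nil. Any other value raises, so
# unexpected data shows up during the migration instead of silently becoming false.
def legacy_bool(value)
  case value
  when true, 1, "1" then true
  when false, 0, "0", nil, "" then false
  else
    raise ArgumentError, "unexpected tinyint value: #{value.inspect}"
  end
end
```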
Use save!
Since a rake task gives you little feedback, using save! is an obvious precaution: it raises an error when a save fails, which makes sure your data meets your new site’s validation rules and highlights any areas where you’ve forgotten to modify the data to fit the new site. Occasionally, though, you might find save_without_validation handy, for example when you’ve got validation rules that aren’t there for data integrity. My example was private messaging: I’d toyed with a size limit on messages, so rather than truncate the old ones I migrated them without validation and kept the rule in place for new ones.
I’m sure it goes without saying, though, that if you’ve got other important validation rules on your model, it’s a bit reckless to skip validation entirely, and a better idea to look into how to override just the validation in question.
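If you’d rather see every failing record in one run instead of stopping at the first, one option is to wrap each record’s migration in a rescue and collect the legacy ids. A minimal sketch; migrate_each is a hypothetical helper, and in Rails the exception you’d expect from save! on a validation failure is ActiveRecord::RecordInvalid.

```ruby
# Hypothetical helper for a migration rake task: run the block for each legacy
# row, and if it raises (e.g. from save!), record the row's legacy id and the
# error message instead of aborting the whole run.
def migrate_each(rows, id_key)
  failures = []
  rows.each do |row|
    begin
      yield row
    rescue StandardError => e
      failures << [row[id_key], e.message]
    end
  end
  failures
end
```

Rescuing StandardError keeps the sketch free of Rails; in a real task you’d probably narrow it to ActiveRecord::RecordInvalid so genuine bugs still stop the run, then print the collected failures at the end.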
Converting from BBCode to Textile
In my Drupal days I used BBCode for my user-friendly markup, but since joining the Rails camp I’ve naturally adopted Textile. I simply have a helper m() that I use much like h(): it uses RedCloth to create HTML from the Textile source, then runs the result through white_list.
For the conversion I found the BBCodeizer plugin. I simply ran my fields through it and saved them in the new DB with basic HTML in them (Textile lets inline HTML through), like so:
string_with_html = BBCodeizer::bbcodeize(string_with_bbcode)
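If you can’t use the plugin, or just want to see roughly what the conversion involves, here’s a deliberately minimal hand-rolled converter (not BBCodeizer’s code) covering [b], [i], [u] and [url]. It doesn’t check nesting or URLs, so its output still needs to go through white_list like everything else.

```ruby
# Minimal BBCode-to-HTML rules: [pattern, replacement] pairs applied in order.
# Case-insensitive (i) and dot-matches-newline (m) so tags can span lines.
BBCODE_RULES = [
  [%r{\[b\](.*?)\[/b\]}im,              '<strong>\1</strong>'],
  [%r{\[i\](.*?)\[/i\]}im,              '<em>\1</em>'],
  [%r{\[u\](.*?)\[/u\]}im,              '<u>\1</u>'],
  [%r{\[url=([^\]]+)\](.*?)\[/url\]}im, '<a href="\1">\2</a>'],
  [%r{\[url\](.*?)\[/url\]}im,          '<a href="\1">\1</a>']
].freeze

# Apply each rule in turn; anything not covered is left as-is.
def simple_bbcode_to_html(text)
  BBCODE_RULES.reduce(text.to_s) { |html, (pattern, replacement)| html.gsub(pattern, replacement) }
end
```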
RedCloth and Hard Breaks
Another markup issue you may well run into is hard breaks: keeping single line breaks as <br /> tags.
Drupal’s most basic input filter adds little markup beyond line breaks and paragraphs, so most Drupal users will be used to this behaviour.
RedCloth does the same for paragraphs by default, and passing the :hard_breaks option should take care of line breaks too, but it appears to be broken at the moment. Adding the following to my environment.rb (after the Initializer.run block) fixed it (source: Rails Wiki).
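class RedCloth
  # Redefine hard_break so the :hard_breaks option works again: a single newline
  # becomes <br />, but blank lines, the end of the text and lines starting with
  # #, * or = markers, { or | are left alone.
  def hard_break( text )
    text.gsub!( /(.)\n(?!\n|\Z| *([#*=]+(\s|$)|[{|]))/, "\\1<br />" ) if hard_breaks
  end
end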
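To check what that regex actually does without loading RedCloth, here’s the same pattern in a standalone method (hard_breaks is just an illustrative name, and it uses non-destructive gsub rather than the patch’s gsub!). A single newline becomes <br />, while blank lines, the end of the text, and lines starting with runs of #, * or = followed by whitespace, or with { or |, are left alone.

```ruby
# Same pattern as the RedCloth patch: a newline preceded by any character,
# unless it is followed by another newline, the end of the text, or a line
# that opens with #/*/= markers plus whitespace, or with { or |.
HARD_BREAK = /(.)\n(?!\n|\Z| *([#*=]+(\s|$)|[{|]))/

def hard_breaks(text)
  # "\\1" puts back the character before the newline.
  text.gsub(HARD_BREAK, "\\1<br />")
end

hard_breaks("line one\nline two")   # => "line one<br />line two"
hard_breaks("para one\n\npara two") # => "para one\n\npara two"
```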
Passing local files into attachment_fu
I found a helpful blog post from Ben Reubenstein, attachment_fu Now With Local File Fu, which told me all I needed to know.
His guide takes you through creating a model called LocalFile which you can then use like so:
avatar = Avatar.new
avatar.uploaded_data = LocalFile.new(FULL_PATH_TO_FILE)
avatar.save
Pretty straightforward.
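For reference, here’s a rough sketch of the kind of wrapper involved. It is not the class from Ben’s post: LocalFileSketch and its MIME table are hypothetical, and it assumes attachment_fu’s uploaded_data= is happy with an object that, like a browser upload, responds to original_filename, content_type and size and has a readable path. Check his post or the attachment_fu source before relying on that.

```ruby
require 'tempfile'
require 'fileutils'

# Hypothetical stand-in for a "local file as upload" wrapper (not Ben's code).
class LocalFileSketch
  # Assumed extension-to-MIME mapping; extend it for the types you migrate.
  MIME_TYPES = {
    ".jpg" => "image/jpeg", ".jpeg" => "image/jpeg",
    ".png" => "image/png",  ".gif"  => "image/gif"
  }.freeze

  attr_reader :original_filename, :content_type

  def initialize(path)
    raise ArgumentError, "#{path} does not exist" unless File.exist?(path)

    @original_filename = File.basename(path)
    @content_type = MIME_TYPES[File.extname(path).downcase]
    raise ArgumentError, "unknown MIME type for #{path}" unless @content_type

    # Copy to a tempfile so the uploader can move or delete it without
    # touching the original legacy file.
    @tempfile = Tempfile.new(["local_file", File.extname(path)])
    FileUtils.copy_file(path, @tempfile.path)
  end

  def path
    @tempfile.path
  end

  def size
    File.size(path)
  end

  def read
    File.binread(path)
  end
end
```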
Tips for specific parts of Drupal / modules
User Accounts
I used restful_authentication for my main User model, and with the general pointers described so far it’s a very standard transfer of data. The only thing that gets in your way is the passwords, which are hashed differently, so you have no choice but to force a reset.
One thing to bear in mind here is that you need to create the salts yourself. restful_authentication / aaa only generate one inside encrypt_password, which does nothing when no password is set and only sets the salt on new records, so unless you do it yourself none of your migrated users will have a salt. They’ll still work (an empty salt is allowed); it’s just a bit pointless from a security point of view.
You can find the code to do this towards the bottom of the user model file, in the encrypt_password method:
self.salt = Digest::SHA1.hexdigest("--#{Time.now.to_s}--#{login}--") if new_record?
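Because of that if new_record? guard, a migrated user who later resets their password still won’t get a salt, so it’s worth backfilling during the migration. A minimal sketch; ensure_salt is a hypothetical helper that reuses the same digest recipe and leaves any existing salt alone. The user just needs login, salt and salt=.

```ruby
require 'digest/sha1'

# Hypothetical helper for the migration task: give a migrated user a salt in
# the same format restful_authentication generates, but only if it has none.
# Returns the user's salt either way.
def ensure_salt(user)
  return user.salt unless user.salt.nil? || user.salt.empty?

  user.salt = Digest::SHA1.hexdigest("--#{Time.now}--#{user.login}--")
end
```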
Custom user profile fields
A standard module in Drupal offers the ability to add extra fields to the user accounts. Administrators can define them with any name and select various formats (text field, text area, check box, etc) for the user input.
Behind the scenes this is made up of two tables with fairly self-explanatory names: profile_fields and profile_values. So, say you’re creating a new user and have already populated it with the data from Drupal’s users table; you’d then look up a profile value like so:
# fid 20 is the sex field in my profile_fields table; look up your own fids there
data = Drupal.find_by_sql ["SELECT * FROM profile_values WHERE uid = ? AND fid = ?", old_user.uid, 20]
user.sex = data.first.value unless data.first.nil?
This assumes you’re going through the fields one by one and writing the code for each profile item by hand. You could instead loop through the profile_fields table dynamically, but the manual approach was much quicker thanks to its simplicity, and a dynamic loop only works if your new attribute names match the Drupal field names exactly.
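If you do go the dynamic route, a middle ground is a single hash mapping Drupal field ids to your new attribute names, so the names don’t have to match. A sketch: PROFILE_FIELD_MAP’s entries are illustrative (only fid 20 to sex comes from the example above), and the rows would come from one query per user such as Drupal.find_by_sql ["SELECT fid, value FROM profile_values WHERE uid = ?", old_user.uid].

```ruby
# Hypothetical map from Drupal profile field ids (fid) to new User attributes.
# Only 20 => :sex comes from the example above; 21 => :location is made up.
PROFILE_FIELD_MAP = {
  20 => :sex,
  21 => :location
}.freeze

# rows: the profile_values rows for one user (anything responding to fid and
# value). Returns a Hash of new attributes, skipping fields you haven't mapped.
def profile_attributes(rows)
  rows.each_with_object({}) do |row, attrs|
    attribute = PROFILE_FIELD_MAP[row.fid.to_i]
    attrs[attribute] = row.value if attribute
  end
end
```

You’d then assign each pair to the new user (for example with user.send("#{name}=", value)) before saving.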
Comments on nodes
I was a little apprehensive about this one; what with the tree structure, I could see it getting a tad confusing. No need to worry in the end, though.
I’m a big fan of better_nested_set, which improves on the acts_as_nested_set code that used to ship with Rails (it became a plugin in Rails 2.0). I won’t go into detail on how it works; all you need to know is that once you’ve added the better_nested_set declaration to your model, all you have to do when you create an object is save it first, then, if it’s a child of another record, move it with move_to_child_of().
So, because I’d kept the legacy primary keys in the new database, all I had to do was loop through the legacy comments creating them as I went, then check each one for a legacy parent (Drupal’s pid column) and, if it had one, look the parent up by its legacy cid and call move_to_child_of(), and the comment is in its right place.
Couldn’t have turned out easier.
Oh, one more little thing to mention: a field named comment clashes with a reserved word, so you’ll need to alias it in your query (SELECT comment AS ...).
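Here’s the shape of that loop in plain Ruby, with Structs standing in for the ActiveRecord models and a simple parent/children link standing in for better_nested_set’s move_to_child_of(). It assumes Drupal 5’s comments table keeps the comment id in cid and the parent’s cid in pid (0 for top-level comments). Processing in cid order means a parent always exists before its replies, and fetch raises if a parent is missing, which flags orphaned comments.

```ruby
LegacyComment = Struct.new(:cid, :pid, :body)
NewComment    = Struct.new(:legacy_cid, :body, :parent, :children)

def migrate_comments(legacy_rows)
  by_legacy_cid = {} # stands in for find_by_cid on the new comments table

  # A reply always gets a higher cid than the comment it answers,
  # so cid order guarantees parents are created first.
  sorted = legacy_rows.sort_by { |row| row.cid.to_i }
  sorted.map do |row|
    comment = NewComment.new(row.cid.to_i, row.body, nil, []) # "create and save"
    by_legacy_cid[comment.legacy_cid] = comment

    parent_cid = row.pid.to_i
    unless parent_cid.zero?
      parent = by_legacy_cid.fetch(parent_cid) # look the parent up by legacy cid
      comment.parent = parent                  # ~ comment.move_to_child_of(parent)
      parent.children << comment
    end

    comment
  end
end
```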
Moving from taxonomy to tags
Drupal’s beloved taxonomy system is quite comprehensive, so depending on how you use it, plain tags may or may not be enough to get you by.
I never used anything more than a few vocabularies with a few terms in each, so I decided to switch to tagging with acts_as_taggable_on_steroids. All I cared about was keeping the appropriate tags linked to the new models that had taken the place of various nodes (all of one type, though).
There are a good few tables, but the main ones I was concerned with were term_data and term_node, and this was the basic idea:
- Loop through my (to be) tagged model
- Select all the appropriate rows in term_node (which funnily enough, links terms to nodes), with a join to term_data so I can get the term names
- Loop through all the term_node rows and on each one use
tag_list.add(old_term_name)
- Save the record, since the tags are only written to the database when it’s saved
Again, pretty straightforward.
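The steps above run a query per tagged record. A sketch of an alternative: fetch every node’s term names in one join and group them by nid in Ruby. It assumes Drupal 5’s term_node (nid, tid) and term_data (tid, name) columns; term_names_by_nid is a hypothetical helper.

```ruby
# One query for all nodes' term names (assumed Drupal 5 column names).
TERMS_SQL = <<~SQL.freeze
  SELECT term_node.nid, term_data.name
  FROM term_node
  INNER JOIN term_data ON term_data.tid = term_node.tid
SQL

# rows: results of TERMS_SQL (anything responding to nid and name).
# Returns { nid => ["tag one", "tag two"], ... } with duplicate names removed;
# looking up a nid with no terms gives an empty list.
def term_names_by_nid(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, index|
    names = index[row.nid.to_i]
    names << row.name unless names.include?(row.name)
  end
end
```

Build the index once, then for each migrated record do something like index[record.nid].each { |name| record.tag_list.add(name) } and save it, where record.nid is the legacy key column described earlier.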
Buddy Lists
My Drupal site used the BuddyList module (5.x-1.x-dev – 2007/02/25), and my Rails one uses has_many_friends.
To simplify things I warned people that any outstanding invites would be wiped (it wasn’t hugely used anyway). In addition, if you used the “Buddy Groups” feature of BuddyList you’ll have to do some more coding as has_many_friends doesn’t have an equivalent feature.
So, assuming you just want to move the friendships, it’s nice and easy: one table in Drupal, one in has_many_friends. “uid” becomes “user_id” and “buddy” becomes “friend_id” (both translated to the new user ids via the legacy uid column, since the new users have fresh primary keys), and then it’s just a few timestamps.
Like everything before – loop….. create…. loop.
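A sketch of the per-row mapping. Since the new users have fresh primary keys, both ids go through the legacy uid column; here a plain hash (new_id_for_uid) stands in for User.find_by_uid(uid).id. The column names (timestamp on the Drupal side, created_at on the new side) are assumptions, so check both schemas.

```ruby
# Turn one buddylist row into attributes for a new friendship record.
# new_id_for_uid: legacy Drupal uid => new users.id (stand-in for find_by_uid).
# fetch raises KeyError if a buddy points at a user you didn't migrate.
def friendship_attributes(row, new_id_for_uid)
  {
    user_id:    new_id_for_uid.fetch(row[:uid].to_i),
    friend_id:  new_id_for_uid.fetch(row[:buddy].to_i),
    created_at: Time.at(row[:timestamp].to_i).utc # assumed epoch column
  }
end
```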
Guestbooks
In Drupal I used the Guestbook (5.x-1.0) module but for my new site I just rolled my own – a simple threaded tree using better_nested_set in pretty much the same way as I’d done for comments.
It has to be said, the module’s storage really was pretty crude: one table, with each row containing not just the guestbook post but also the data for the one and only possible reply.
To migrate it I simply went through each user, selected all the rows in the guestbooks table related to them, and created a new guestbook post for each. I then looked for a reply and, if I found one, did the same and used move_to_child_of() to make it a child of the post I’d just created.
I also took the chance to make every user my friend, what with me being the equivalent of MySpace’s ‘Tom’ for my site.
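A sketch of splitting one of those combined rows into a post and an optional reply. The column names (recipient, author, message, comment, commentauthor, created) are assumptions about the Guestbook module’s table, and the output keys are hypothetical; once both are created you’d call move_to_child_of() on the reply as described above.

```ruby
# Split one legacy guestbook row into [post_attributes, reply_attributes_or_nil].
# Column names on the input row are assumptions; check your own table.
def split_guestbook_row(row)
  post = {
    owner_uid:  row[:recipient].to_i,
    author_uid: row[:author].to_i,
    body:       row[:message],
    created_at: Time.at(row[:created].to_i).utc
  }

  reply_text = row[:comment].to_s.strip
  reply =
    if reply_text.empty?
      nil # no reply stored on this row
    else
      { owner_uid: post[:owner_uid], author_uid: row[:commentauthor].to_i, body: reply_text }
    end

  [post, reply]
end
```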
Private Messages
I’ve left this one to the end because I’m not going to go into much detail on what I did: firstly, there’s a reasonable chance you’ll want a different plugin with more features (such as folders), and secondly, even if you don’t, I can’t honestly recommend the plugin I used.
My Drupal site used the Privatemsg module (5.x-1.7), and for my new Rails site I chose restful easy messages.
If you’re only using basic functionality (no sent or trash folders, no user folders, only single recipients) there are a few plugins to choose from, and it’s simply a case of using the techniques above to transfer the data; it’s all the same, just in a slightly different format. Loop…. create…. loop.
So, why don’t I recommend restful easy messages? I picked it because I wasn’t keen on the non-RESTful design of acts_as_emailable and easy_messages, so at first sight it looked like a nice option. But I soon ran into problems while migrating my data, populating fields such as receiver_deleted and receiver_purged. I naturally set them to true or false, and after a whole load of head scratching realised that both values were effectively being treated as true. Looking in the plugin, the select conditions used “IS NULL” or “IS NOT NULL” rather than letting AR pass true or false in, so a false (stored as 0) is still NOT NULL and counts as set.
:-s
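In plain Ruby, the difference looks like this: hashes stand in for message rows, and the two hypothetical methods mimic an “IS NULL” condition and a proper boolean check. A message migrated with false (0 in MySQL) disappears under the first.

```ruby
# Stand-ins for message rows as the migration left them.
SAMPLE_MESSAGES = [
  { id: 1, receiver_deleted: nil },   # never set
  { id: 2, receiver_deleted: false }, # migrated as false (0 in MySQL)
  { id: 3, receiver_deleted: true }   # genuinely deleted
].freeze

# What a "receiver_deleted IS NULL" condition keeps: only rows with no value at all.
def visible_with_is_null(messages)
  messages.select { |m| m[:receiver_deleted].nil? }.map { |m| m[:id] }
end

# What a boolean check keeps: anything not explicitly true.
def visible_with_boolean_check(messages)
  messages.reject { |m| m[:receiver_deleted] }.map { |m| m[:id] }
end

visible_with_is_null(SAMPLE_MESSAGES)       # => [1]  (message 2 has vanished)
visible_with_boolean_check(SAMPLE_MESSAGES) # => [1, 2]

# One workaround when migrating into a plugin like this: write nil, not false,
# for "not deleted" so the IS NULL check treats it as unset.
def null_style_flag(deleted)
  deleted ? true : nil
end
```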
I then also had problems with the helpers in the generated views. Personally I feel wrapping a simple link_to in a helper isn’t really necessary unless it’s used five or ten times, but since it was in the generated view I figured I’d leave it. Even so, the helpers didn’t work: the anchor text wasn’t sanitized and the user instance variable isn’t passed, so the generated path is wrong.
Anyway, I don’t want to go on, and I don’t mean this as some sort of flame, but it has to be said that the code is pretty funky.
If you’re only using basic messaging, it’s so bloody simple that it’s hardly worth using a plugin at all, which is why I just rolled my own next time around. If you want something more capable, the one I thought looked very complete and nicely done was Phil Sergi’s acts_as_messageable. I almost used it for another project but eventually ditched it for my own creation, as it just seemed like overkill. has_messages sounded quite nice, but I didn’t like having to install half a dozen other plugins just to get it working.
But what about nodes?
Oh yeah, nearly forgot them :-p
I didn’t actually migrate them, as my site’s focus was entirely on the functionality provided by my custom module and there was very little else (a good indication of why Drupal wasn’t suitable for me). Still, if you’ve followed everything so far, nodes are no different from everything else. If you need to preserve versioning, I’d imagine Rick Olson’s acts_as_versioned would be of use (I can’t say from experience, but you can’t go wrong with the ‘weenie).
Wrapping up
Long post, no wonder I put it off for ages.
Hopefully, if you’re trying to migrate from Drupal, it’ll be of some help. Even if I’ve not covered your specific modules, it should give you enough info to tackle the rest.
Oh, and if you want to see the finished product, the site’s pearsontowers.com.
