Using Ruby to Migrate Databases

If you deal with databases for a living, eventually you’ll come across cases where you need to migrate a lot of data from one schema to another. I am not just talking about migrating from one type of database to another, like from Oracle to MySQL, but also from, for instance, a badly designed schema to one more expertly crafted.

If there are only minor differences between the source and target schemas, this is a trivial affair. On the other hand, if the schemas are completely different, this can be quite a challenge. Moreover, the database being migrated might sit behind a high-demand website, so the migration will need to be done with little or no downtime, with lots of planning and preparation to boot. You may be interacting with the application developers and the systems crew, and juggling tight deadlines as well.

Well, as you may have guessed, I have described some of the roles I now play at a leading social networking company. We are indeed in the midst of creating the “NextGen” product, a complete rewrite and redesign. The new system is designed with modularity and scalability in mind. The old system we are transitioning from was created when the company was much smaller and demand was at least two orders of magnitude lower. Suffice it to say, it has all the appearances of being crafted by a bunch of “juniors” who just quickly browsed through “PHP for Dummies”, “Database Design for Idiots”, and the like the night prior. That the aging application still works at all is seen as the “8th Wonder of the World”, but to its credit, it brings in millions in revenue despite all of its faults.

I am an “old veteran” when it comes to software development. In my “advanced age”, I’ve decided to work on databases, something I had not done before in my 30-year career as a software developer. The nice thing is that much of what I’ve learned about algorithms and data structures can also be applied to schema design. It also helps when interacting with the application development team, as I can relate to what their needs are and “bridge the gap”, as it were, between the code and the database.

I have chosen Ruby out of all the languages I know (Python, Perl, PHP, C++, Java, etc.) because of its expressive power and metaprogramming capabilities, which most of the other languages either don’t do well or lack a clean syntax for.

First, let me speak of my general approach to data migration. You have your source and destination databases. Of your source databases, you will obviously have the main database containing the enterprise’s lifeblood information. Some of that data will relate directly to customer/account activity; some may relate to configuration of how that data is handled; other data may serve as a reference, such as a zip-code database.

Similarly, you will also have target databases, with the same type of data, but organized differently, hopefully more efficiently. Also, what was denormalized in the source database you might choose to normalize in the target, or vice versa. Perhaps passwords for user accounts were in plaintext in the source and now you need to MD5-hash them in the target. Perhaps there was a fixed number of columns in the source tables representing some resource that you wish to store as separate rows in the target for added flexibility and expandability. Again, if you are only dealing with a couple of tables, it’s trivial to do the migration. If, on the other hand, you are dealing with dozens of tables, the problem explodes in complexity.

Since I want to illustrate doing a migration, I don’t want to bog you down with a complex schema; instead, I will take a simple example. Suppose you have a picture display site where each picture is represented by a column in the users table, and you need to migrate this to a more flexible system that will allow any number of pictures per user. If you have 10 million users in this table, doing an ALTER TABLE every time you needed to raise the number of pictures would be just plain silly.

Here is the old schema:

CREATE TABLE old_accounts (
  id INT auto_increment primary key,
  name varchar(100) not null,
  email varchar(100),
  picture1 varchar(100),
  picture2  varchar(100),
  picture3  varchar(100),
  picture4  varchar(100),
  picture5  varchar(100)
) ENGINE=MyISAM;

And here is the new schema we wish to migrate this to:

CREATE TABLE new_account (
  userID INT auto_increment primary key,
  given varchar(50) not null,
  sur varchar(50) not null,
  email varchar(100)
) ENGINE=InnoDB;

CREATE TABLE pictures (
  pictureID INT auto_increment primary key,
  userID INT not null,
  url varchar(100) not null,
  unique index(userID, url)
) ENGINE=InnoDB;

I have deliberately left out the foreign key specifications for clarity — and some would argue it would be a nasty performance hit under some circumstances, though I’ve not run into that problem personally.
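
To make the column-to-row change concrete, here is a minimal sketch in plain Ruby. It is not taken from the migration framework and does not touch a database: it assumes each source row arrives as a Hash keyed by column name, and the helper name `picture_rows` and the sample data are invented purely for illustration.

```ruby
# Hypothetical helper, not part of any real framework.
# Turns one wide old_accounts row into zero or more rows for the pictures table.
PICTURE_COLUMNS = [:picture1, :picture2, :picture3, :picture4, :picture5].freeze

def picture_rows(old_row)
  PICTURE_COLUMNS
    .map { |col| old_row[col] }                     # pull each pictureN value
    .reject { |url| url.nil? || url.strip.empty? }  # skip unused slots
    .uniq                                           # respect unique index(userID, url)
    .map { |url| { userID: old_row[:id], url: url } }
end

# Sample row (made-up data) shaped like old_accounts.
old_row = {
  id: 42, name: "Pat Example", email: "pat@example.com",
  picture1: "a.jpg", picture2: nil, picture3: "b.jpg",
  picture4: "", picture5: "a.jpg"
}

picture_rows(old_row).each { |row| puts row.inspect }
```

Each non-empty picture column becomes its own row, so adding a sixth or a sixtieth picture later needs no schema change. A real migration would also have to carry the new userID across, which this sketch glosses over by reusing the old id.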

I have written a complete Ruby framework specifically for migration, but as of this writing that code is proprietary and not yet released as open source, though I may eventually do that if I get clearance. Basically, I use Ruby classes to represent a “unit” of migration, normally a single source table to one or more target tables. So, using my Migration framework, here’s what this migration would look like in Ruby:

class UserMigration < Migration
    def migrate_map
        @src_table = {
            :old_accounts => {:PK => :id}
        }
        @dest_table = {
            :new_account => {
                :PK => :userID,
                :id => :userID,
                :name => :given,
                :email => :email,
            },

            :pictures => {
                :PK => :pictureID,
                :FK => {:new_account => {:userID => :userID}},
                :picture1 => :url,
            },

            :pictures => {
                :PK => :pictureID,
                :FK => {:new_account => {:userID => :userID}},
                :picture2 => :url,
            }, ...
        }
    end
end

Well, that’s it, almost. There’s a problem in the Ruby code that you will catch right off the bat if you know Ruby, and I think that if you look at it for a bit, you can figure out what’s going on here. So I’ll leave that as an exercise for you to mull over. You don’t really need to know Ruby at all to understand what the mapping is doing, and that’s the bit I like about Ruby. You can use it as a type of “meta-language” if you know what you’re doing.
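
Since the framework itself is not public, the following is only a guess at the general idea: a sketch of how a declarative column map could be applied to a single row. The `apply_map` method and the treatment of `:PK` and `:FK` as reserved metadata keys are assumptions made for illustration, not the actual Migration API.

```ruby
# Hypothetical sketch: apply a { source_column => target_column } map to one row.
# Treating :PK and :FK as metadata to skip is an assumption, not the behavior
# of the real (unreleased) Migration framework.
RESERVED_KEYS = [:PK, :FK].freeze

def apply_map(source_row, column_map)
  column_map.each_with_object({}) do |(src_col, dest_col), target_row|
    next if RESERVED_KEYS.include?(src_col)   # metadata, not a column to copy
    target_row[dest_col] = source_row[src_col]
  end
end

# Map for a single target table, and a made-up source row.
account_map = { :PK => :userID, :id => :userID, :name => :given, :email => :email }
source_row  = { id: 7, name: "Pat", email: "pat@example.com" }

target_row = apply_map(source_row, account_map)
puts target_row.inspect
```

The appeal of this style is that the map is plain data: the code that walks it stays the same no matter how many tables you describe.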


KDE Konsole Backgrounds and ssh

If you are a GUI-oriented person, you need not read this. But if you are like me, you make heavy use of the console. If you are managing many machines as well as your own Linux workstation, it’s VERY important to know where your console session is.

Too many times in the past I wanted to bring down my workstation and typed “shutdown” or “reboot” in the console window, only to find out to my horror that the console was really a remote session to one of my web servers serving up hundreds of web sites.

Whoops!

Well, that prompted me to develop a solution where I can tell at a glance where I happen to be logged in. This way, I won’t be in danger of issuing dangerous commands on the wrong server. And if you are working for someone else, it also keeps you from being FIRED!

I use KDE to do my development and administration, and I have fallen in love with Konsole. Konsole, despite its quirks, has a lot of nice features that make it a shoo-in for what I am talking about.

My approach now is to create login bash scripts to begin a session with whatever machine I need to ssh into, and have that script also do something nice to the Konsole background in the process.

I also develop a lot of websites, as well as other things. Sometimes, it’s helpful to change the background when I go into emacs so that I have the proper contrast for syntax coloring, etc. Same approach works there as well.

The secret to my machinations? A little script I wrote called kschemaset. It does all the “magic” in resetting the Konsole background, and is actually derived from a similar script that didn’t do everything I needed. But in the fine tradition of open source, I grabbed it and enhanced it over time. It currently requires you to either create or acquire images for your backgrounds, but eventually I wish to create a more comprehensive open source package that will do all that magic for you automatically. But for now, it’s a lot of fun to come up with a cute background that represents the server you are working on! I recommend choosing either very light pastel colors or very dark colors with low contrast, because you want to be able to read the text without killing your eyes.

You don’t have to create any artwork at all, actually, and many may elect to do it that way. It’s all up to how you want to proceed.

The first step is to install the kschemaset script somewhere where it can execute. There are actually other related scripts involved, and they are all available from here. Typically, I create a local bin directory in my login account and alter the .bashrc file to add it to the PATH environment variable. Since you are obviously well adept at such things if you are interested, I won’t bother with holding your hand here.

The next step is to create Konsole schemas to reflect the servers you wish to work on. All kschemaset does is invoke a schema for the duration of the session, and restore the old schema when the session is concluded. Eventually, I want to automate not only the creation of new Konsole schemas, but also the images used for the backgrounds. But for now you can do this by hand, or automate these steps yourself. If you do, let me know so I don’t wind up reinventing your wheel!

Konsole has one annoying problem that seems not to have been fixed in recent times: when you add a new schema, any Konsoles that were open prior to the addition tend to get “confused” and may start displaying the wrong schema. The workaround is either to manually select the schema for the Konsoles that are out of sync, or to restart them. Hopefully this problem will be fixed soon.

Now, you simply create scripts based on kschemaset that will launch your new sessions, change not only the background but the tab text as well on the fly, and reset everything to the pre-existing schema and tab text when you’re done. I’ve even done this with emacs to give me a flat-black background to do my editing on. The possibilities are endless.

#!/bin/bash
# Set the schema of the currently-running Konsole
# Based on konsoledcopschema
 
 
# So running script can know this is where it's running
# (to enable scripts to enable kschemaset via recursion)
export kschemaset=1
 
. kconfuns
getKonsoleInfo
getAppInfo ${*}
shift
 
if [[ ! ${appSchema} ]] ; then
    echo "(schema $appName not found)"
fi
 
export -f ksetSchema getKonsoleInfo getAppInfo
 
if [ "${inKonsole}" == "1" ] && [[ ${appSchema} ]] ; then
    # dcop $konsole konsole reparseConfiguration
    dcop $konsole $session setSchema "${appSchema}"
    dcop $konsole $session setSchema "${appSchema}" # testing -- may not have gone through the first time
    if [ -n "${appName}" ] ; then
        dcop $konsole $session renameSession "${origSession}: ${appName}"
    fi
 
    if [ -n "${*}" ] ; then # run the command, then reset to the previous schema
        "$@"
        dcop $konsole $session setSchema "${origSchema}"
        dcop $konsole $session renameSession "${origSession}"
    fi
else
    kschemaset=2
    "$@"
fi

And now for the next script, kconfuns, which holds the shared Konsole functions:

#!/bin/bash
# Konsole functions. Include with ". kconfuns"
 
kloc=~/.kde/share/apps/konsole/
 
getKonsoleInfo()
{
    # Make sure we're in Konsole
    if [ -n "${KONSOLE_DCOP}" ] ; then
        export inKonsole='1'
        export konsole=`echo $KONSOLE_DCOP | cut -d\( -f 2 | cut -d, -f1`
        export session=`dcop $konsole konsole currentSession`
        export origSchema=`dcop $konsole $session schema`
        export origSession=`dcop $konsole $session sessionName`
    else
        export inKonsole='0'
    fi
}
 
getAppInfo()
{
    export appName="${1}"
    export appSchema=`ls -1 $kloc | egrep "${appName}\."`
}
 
ksetSchema()
{
    dcop $konsole $session setSchema $1.schema
}
 
ksetSessionName()
{
    dcop $konsole $session renameSession "$*"
}

And here is an example of launching an ssh session using kschemaset, saved as a script named polaris:

#!/bin/bash
# Log on to the Polaris Web Server
kschemaset Polaris ssh -t youraccount@yourserver.yourdomain.com "$@"

And it’s that simple!


Welcome!

Welcome to my Linux site, the Linux Bloke.