MySQL to MongoDB Migration with Django
Over the past week we migrated a big part of the Glossi data from MySQL to MongoDB and I’d like to share the process while it’s still fresh in order to help others facing the same issues.
Motivation
For the past 6 months, we’ve been running on a small Linode instance that hosted our MySQL database, RabbitMQ, Celery, and Apache. Whenever we hit a problem, which was usually memory related, we’d just upgrade our instance. Typically, we’d get Linode disk I/O alerts once our database no longer fit in RAM and had to be offloaded to disk. After meeting with many people who are smarter than we are, we decided to move the bulk of our data, consisting of our users’ social media activities, to MongoDB. We kept the more structured data, including our user profiles and credentials, in MySQL since it took up far less space and we didn’t want to disrupt our code more than necessary. Multiple people suggested using MongoHQ to host our MongoDB instance and we decided to give them a try. This would free up a ton of memory and also start moving us to a more scalable setup where the various system components are separated.
Approach
We dove right in and set up a free instance at MongoHQ to use as our test environment. We researched various Python/MongoDB libraries and settled on mongoengine since its model definitions were the most similar to Django’s and we felt it would be the quickest and least disruptive approach. As an example, we were able to transform the following simple Django model:
from django.db import models

class Text(models.Model):
    text = models.CharField(max_length=400)
into
import mongoengine

class Text(mongoengine.EmbeddedDocument):
    text = mongoengine.StringField()
After creating mongoengine versions of our models, we wrote a short script to copy a few objects from our MySQL database into MongoDB to make sure the models lined up, and we confirmed that everything looked good through the MongoHQ web interface. The next step was to create MongoDB versions of our retrieval and insertion methods and make sure that they behaved identically to the MySQL ones. After that, we removed all references to the old models and made sure that we still had a functional product. As a final test, we renamed the to-be-deprecated MySQL tables in our development environment and made sure that there were no errors when we ran through our code. This led us to discover a few edge cases that were missed in the earlier steps.
The final data migration involved a script that pulled data from the MySQL database and then loaded it into MongoDB. Unfortunately, it took me too long to realize that we should have parallelized the migration process, so it took longer than it should have.
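For anyone planning a similar move, here is a minimal sketch of what a parallelized migration could look like: split the primary-key range into batches and hand each batch to a worker. It is not the script we ran. The names fetch_rows, insert_docs, migrate_range and migrate_parallel are hypothetical, and the MySQL read and MongoDB write are replaced by in-memory stand-ins so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: in a real migration these would be a MySQL query
# and a MongoDB insert. They are in-memory here so the sketch runs as-is.
SOURCE_ROWS = {i: {"id": i, "text": "activity %d" % i} for i in range(1, 101)}
TARGET_DOCS = {}


def fetch_rows(first_id, last_id):
    """Return source rows with first_id <= id <= last_id (stand-in for a SELECT)."""
    return [SOURCE_ROWS[i] for i in range(first_id, last_id + 1) if i in SOURCE_ROWS]


def insert_docs(docs):
    """Store documents keyed by id (stand-in for a MongoDB insert)."""
    for doc in docs:
        TARGET_DOCS[doc["id"]] = doc


def migrate_range(id_range):
    """Copy one batch of rows and return how many documents were written."""
    first_id, last_id = id_range
    docs = [{"id": row["id"], "text": row["text"]}
            for row in fetch_rows(first_id, last_id)]
    insert_docs(docs)
    return len(docs)


def id_ranges(min_id, max_id, batch_size):
    """Yield inclusive (first_id, last_id) pairs covering min_id..max_id."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield (start, end)
        start = end + 1


def migrate_parallel(min_id, max_id, batch_size=10, workers=4):
    """Migrate all batches using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(migrate_range, id_ranges(min_id, max_id, batch_size)))


if __name__ == "__main__":
    copied = migrate_parallel(1, 100)
    print("copied", copied, "documents")
```

Threads are a reasonable fit when the work is dominated by waiting on the two databases. Because each batch covers a disjoint ID range, a failed batch can be retried without touching the others.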
Last Thoughts
In general, it was a relatively painless migration, made much simpler by the really helpful people at MongoHQ and our decision to use the mongoengine library. MongoDB does take a bit of getting used to, but writing your first functional map/reduce job is a mind-blowing experience, at least for me. Another issue to be aware of is that mongoengine/pymongo and Celery may not get along. We ran into a few cases where our MongoHQ connections were not available, but we were able to resolve them through some hackish means.
If you have anything to add or have any questions, please feel free to ask me at dan@glos.si





