Title: Project Dirt
Date: 2014-12-09 10:00
Author: eamonnfaherty
Category: projects
Slug: project-dirt
Description of the Work
- Requirements analysis using UML
- Data modelling using UML
- Data migration using PHP and Python
- Server provisioning using Fabric (a sketch follows this list)
- Template integration
- Backend programming using Django
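As a small illustration of the Fabric provisioning mentioned above, a task might look like the following; the task name and package list are assumptions rather than the real fabfile.

```python
# A minimal Fabric 1.x provisioning sketch; the task name and the
# package list are illustrative, not the actual Project Dirt fabfile.
from fabric.api import run, sudo, task


@task
def provision():
    # Install the system services the stack relies on.
    sudo("apt-get update")
    sudo("apt-get install -y nginx redis-server mysql-server")
    # Install the Python application dependencies.
    run("pip install django gunicorn celery elasticsearch")
```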
I used nginx, Gunicorn, and Django to serve the application, Redis to store user session data, and MySQL to store the application data. I used Celery as a task queue with Redis as the broker, Elasticsearch to power the search functionality on the site, and the excellent Sentry for exception monitoring.
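As a rough sketch, the Django settings implied by this stack might look something like this; the database name, the django-redis-sessions package, and the broker URL are assumptions rather than the actual configuration.

```python
# A rough sketch of the kind of Django settings this stack implies;
# the values here are assumptions, not the real configuration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "projectdirt",  # hypothetical database name
    }
}

# Keep user session data in Redis instead of the database.
SESSION_ENGINE = "redis_sessions.session"  # via django-redis-sessions

# Point Celery at Redis as its message broker (Celery 3.x setting name).
BROKER_URL = "redis://localhost:6379/0"
```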
Within the Project Dirt platform, as with most social networks, you can be friends with people and you can follow them. Becoming a friend or follower means you subscribe to updates on what they are doing within the platform, which appear in your feed; you can also visit their feed to see who they are following. I had to decide whether to run complex, expensive queries when a feed is viewed and a simple update query when a friendship or follow is created, or to do the inverse. Given that feed views are far more frequent than new friendships and follows, I decided to run a complex, expensive asynchronous Celery task to update feeds when a relationship is created, so that viewing a feed is a simple, cheap query.
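Here is a minimal sketch of that fan-out-on-write idea, assuming hypothetical Follow and FeedItem models and a fan_out_activity task; the real Project Dirt models and task will differ.

```python
# A minimal sketch of fan-out-on-write with Celery, assuming
# hypothetical Follow and FeedItem models.
from celery import shared_task

from feeds.models import Follow, FeedItem  # hypothetical app and models


@shared_task
def fan_out_activity(actor_id, activity_id):
    # The expensive work runs off the request/response cycle:
    # copy the new activity into every follower's feed.
    follower_ids = Follow.objects.filter(
        followee_id=actor_id
    ).values_list("follower_id", flat=True)
    FeedItem.objects.bulk_create([
        FeedItem(owner_id=fid, activity_id=activity_id)
        for fid in follower_ids
    ])
```

Viewing a feed then reduces to a single filtered query on FeedItem for the current user.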
Search was a key feature of the Project Dirt platform. To create a great user experience it was imperative that searches ran quickly, and I chose Elasticsearch to get that performance. We found that many of the searches on the site were people looking for others by name or by skill. To help these searches I used Elasticsearch's boost functionality to alter the relevance of results and thus change the order in which they are returned. I also wrote custom search classes that returned snippets showing where the results were found.
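As a rough illustration of the boosting, here is how such a query might look with the official elasticsearch-py client; the profiles index, the field names, and the boost values are assumptions, not the real mapping.

```python
# A sketch of field boosting and snippet highlighting with the
# elasticsearch-py client; index, fields, and boosts are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch()

results = es.search(
    index="profiles",
    body={
        "query": {
            "multi_match": {
                "query": "gardening",
                # Boost matches on name above matches on skills, so
                # people searching by name see those profiles first.
                "fields": ["name^3", "skills^2", "description"],
            }
        },
        # Ask for highlighted snippets showing where each result matched.
        "highlight": {"fields": {"name": {}, "skills": {}, "description": {}}},
    },
)

for hit in results["hits"]["hits"]:
    print(hit["_source"]["name"], hit.get("highlight", {}))
```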
Project Dirt has been featured in the media several times, appears at community events regularly, and sends a weekly email newsletter. These real-life events cause spikes in traffic, and the majority of the traffic in these situations comes from users who are not logged in. To cope with the load I decided to use Varnish as an upstream cache. I set up Last-Modified and ETag headers on the pages and used nginx to strip cookies, so that every non-logged-in user sees the same page as cached by Varnish. This means users can view pages without the server having to run any Python code or contact the database. This strategy was planned from the start, so the models within the site were primed to help with the caching. This was the most rewarding part of the project for me.
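As a sketch of the header side of this, Django's condition decorator can compute a Last-Modified value (and optionally an ETag) per page; the Page model and view below are hypothetical, not the real site code.

```python
# A sketch of conditional-response headers in Django; the Page model,
# its fields, and the view are hypothetical.
from django.shortcuts import get_object_or_404, render
from django.views.decorators.http import condition

from pages.models import Page  # hypothetical app and model


def page_last_modified(request, slug):
    # Returning None makes Django fall back to a full response.
    page = Page.objects.filter(slug=slug).only("updated_at").first()
    return page.updated_at if page else None


@condition(last_modified_func=page_last_modified)
def page_view(request, slug):
    # With Last-Modified set, Varnish can answer repeat anonymous
    # requests without this view running at all.
    page = get_object_or_404(Page, slug=slug)
    return render(request, "page.html", {"page": page})
```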