Have I started hating MySQL and falling in love with distributed databases?

MySQL seems rock solid if you want:
  1. Transactions
  2. ACID support
So I would still recommend MySQL for anything that is mission-critical data and is the primary datastore for your transactions. But what about derived data or analytical data?

I built a large-scale cluster of MySQL servers that stores metadata about billions of files and folders used by tens of thousands of customers daily. It is scaling fine and working well; it is still growing at a healthy rate and holding up. But this requires a lot of babysitting once you have hundreds of nodes and need to handle:
  1. replication
  2. adding more nodes
  3. rebalancing data
  4. monitoring the entire cluster
  5. sharding
  6. backup/restore
You have to write a lot of tooling and do a lot of monitoring and babysitting to scale the cluster. Plain stock MySQL will scale up to a limit, but vertical scaling has its own issues. So +1 for MySQL, but not everything should be stuffed in there.
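
To show why adding nodes and rebalancing data need so much tooling on a hand-sharded setup, here is a minimal sketch. It assumes the simplest possible placement rule, customer id modulo shard count. That rule is only an assumption for illustration, not how the cluster described above assigns data, and `shard_for` and `customers_to_move` are hypothetical names.

```python
def shard_for(customer_id: int, shard_count: int) -> int:
    """Naive placement (assumed for illustration): id modulo shard count."""
    return customer_id % shard_count


def customers_to_move(customer_ids, old_count: int, new_count: int) -> int:
    """Count customers whose shard changes when the shard count changes."""
    return sum(
        1
        for cid in customer_ids
        if shard_for(cid, old_count) != shard_for(cid, new_count)
    )


# Hypothetical data set: 10,000 sequential customer ids.
ids = range(10_000)
moved = customers_to_move(ids, old_count=4, new_count=5)
print(f"{moved} of {len(ids)} customers move when growing from 4 to 5 shards")
```

With this naive rule, 8000 of the 10,000 ids land on a different shard after adding a single node, so somebody has to write the tooling that copies that data and repoints the application.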

But recently my team and I built full-text indexing on the same dataset using Elasticsearch, and so far it hasn’t disappointed me. With just 1 engineer and 1 devops guy we were able to build a cluster per datacenter to store the same data. The thing I liked most about Elasticsearch was that halfway through the migration we started facing performance issues, and we just added more nodes and the cluster rebalanced itself. Elasticsearch also has tools like kopf/HQ where I can monitor all nodes in one place. For example, one of our smallest clusters has just started migrating, and as it grows, if we see high load averages we can add more data or client nodes.
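
Besides kopf/HQ, the same information is available from Elasticsearch's `_cluster/health` REST endpoint. The sketch below is one hedged way to check from a script whether a cluster is still rebalancing after nodes were added. The helper names (`fetch_cluster_health`, `rebalance_in_progress`, `describe`) are hypothetical, and the base URL passed in is whatever address your own cluster listens on.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url: str) -> dict:
    """GET <base_url>/_cluster/health and decode the JSON response."""
    with urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def rebalance_in_progress(health: dict) -> bool:
    """True while shards are still moving or waiting to be assigned."""
    pending = (
        health.get("relocating_shards", 0)
        + health.get("initializing_shards", 0)
        + health.get("unassigned_shards", 0)
    )
    return pending > 0


def describe(health: dict) -> str:
    """One-line summary: status colour, node count, rebalancing state."""
    state = "rebalancing" if rebalance_in_progress(health) else "settled"
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    return f"{status}: {nodes} nodes, {state}"
```

Against a node assumed to be reachable at `http://localhost:9200`, `describe(fetch_cluster_health("http://localhost:9200"))` would give a one-line summary to poll while new nodes join.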

I don't need an army of DBAs to manage the cluster, as Elasticsearch has built-in support for:
  1. replication
  2. adding more nodes
  3. rebalancing data
  4. monitoring the entire cluster
  5. sharding
I had earlier built an event store on top of MySQL, but I was storing only a few months of events in it. Now I have to build a store that can hold events for 7 years, and I don't want to use MySQL for it because I don't want to babysit it. Our event data is far larger than the metadata, because every change to a file generates an event, and over 7 years this could be a huge number of records. I don't want to manage an army of MySQL servers, so I am researching databases that have good querying support and can store long-lived data with eventual consistency and rock-solid durability.
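
A back-of-the-envelope calculation makes "a huge number of records" concrete. Only the 7-year retention comes from the text above; the daily change rate and the bytes per event below are made-up placeholders, not our real figures, and the function names are hypothetical.

```python
def events_over_retention(changes_per_day: int, retention_years: int) -> int:
    """Total events kept, assuming one event per file change and 365-day years."""
    return changes_per_day * 365 * retention_years


def raw_size_terabytes(event_count: int, bytes_per_event: int) -> float:
    """Raw payload size in TB (10**12 bytes), before replication or indexes."""
    return event_count * bytes_per_event / 1e12


# Placeholder inputs: 10 million changes per day, 500 bytes per event.
total = events_over_retention(changes_per_day=10_000_000, retention_years=7)
print(f"{total:,} events, about {raw_size_terabytes(total, 500):.1f} TB raw")
```

Plug in your own rates; even these modest placeholder numbers give tens of billions of rows before replication.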
