
And you thought Memcached calls are cheap?

We use Memcached as a distributed cache. We also shard MySQL, and each customer's data lives on a specific shard. The mapping of customer to shard is static and doesn't change unless we manually move a customer. Because we plan to move customers automatically in the future, I had started storing this mapping in Memcached instead of in the JVM, so that I wouldn't need to coordinate an in-memory cache flush across multiple boxes.

We recently moved a large-scale system from Python to Java, and each machine was handling close to 2K requests per minute. Checking our APM tool, I found that a lot of the time was being spent in memcached.getObject. On one of our many Memcached boxes, I found one of the cores pegged at 100% CPU. I installed mctop by Etsy and found that the most-used key was the one that looks up which shard a customer is on:

sudo /opt/mctop/bin/mctop --interface=eth0 --port=23456 


Inspecting all the other data centres showed similar symptoms. So I added code to cache the mapping in a ConcurrentHashMap in the JVM instead of in Memcached, and I handled customer moves by implementing a distributed flush. The fix went live last week, and response times immediately improved across all services by 1-2 ms. Get operations dropped from 15K to 5K per second on the Memcached instances, and CPU usage on the Memcached boxes dropped as well.
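To make the idea concrete, here is a minimal sketch of what such an in-JVM cache could look like. It is not our production code: the class name `ShardLookupCache`, the `remoteLookup` function (a stand-in for whatever does the slow lookup, for example a Memcached or database call), and the `invalidate` hook are all hypothetical names I made up for illustration. The distributed flush itself is not shown; the sketch only assumes that some message reaches every JVM and calls `invalidate` when a customer is moved.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-JVM cache for the customer -> shard mapping. */
final class ShardLookupCache {
    private final ConcurrentHashMap<String, String> shardByCustomer = new ConcurrentHashMap<>();

    // Assumption: stands in for the slow out-of-process lookup (Memcached, DB, ...).
    private final Function<String, String> remoteLookup;

    ShardLookupCache(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Returns the shard for a customer, calling remoteLookup only on a local miss. */
    String shardFor(String customerId) {
        return shardByCustomer.computeIfAbsent(customerId, remoteLookup);
    }

    /** Assumed to be called on every JVM by the distributed flush when a customer moves. */
    void invalidate(String customerId) {
        shardByCustomer.remove(customerId);
    }
}
```

Since the mapping almost never changes, nearly every lookup after the first one per customer is served from local memory, which is where the drop in Get operations comes from.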

Lesson learnt: in high-scale systems, even a Memcached call is not that cheap.

Get operations drop on one instance

CPU Usage drop on one instance


Total Operations drop on one instance


We observed a similar issue in another part of the system with another out-of-process call. Each sync request was making a call to a remote authentication service, and each pod was receiving close to 7K requests per minute that use token-based authentication. Under a huge burst, that service starts taking 35-200 ms instead of 30 ms. We made a fix over the weekend to remove that call, and the system has been calm since the fix went live (the vertical line in the graph denotes when the fix went live).
