
And you thought Memcached calls are cheap?

We use Memcached as a distributed cache. We also shard MySQL, and each customer's data lives on a particular shard. The mapping of customer to shard is static and doesn't change unless we manually move a customer. Because we plan to automate customer moves in the future, I started storing this mapping in Memcached instead of in the JVM, so that I wouldn't need to coordinate an in-memory cache flush across multiple boxes.

We recently moved a large-scale system from Python to Java, and we were getting close to 2K requests per minute on every machine. Checking our APM tool, I found that a lot of the time was being spent in memcached.getObject. On one of our many Memcached boxes, I found the CPU on one of the cores pegged at 100%. I installed mctop by Etsy and found that the most-used key was the one that looks up which shard a customer is on.

sudo /opt/mctop/bin/mctop --interface=eth0 --port=23456 


Inspecting the other data centres showed similar symptoms, so I added code to cache the mapping in a ConcurrentHashMap in the JVM instead of in Memcached, and handled customer moves by implementing a distributed flush. The fix went live last week, and response times immediately improved by 1-2 ms across all services. Get operations on the Memcached instances dropped from 15K to 5K per second, and CPU usage on the Memcached boxes dropped as well.
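To make the idea concrete, here is a minimal sketch of a per-JVM shard lookup cache. It is not our production code: the class name `ShardLocator`, the `loader` function (standing in for the slower lookup) and the `invalidate` hook are all hypothetical, and the distributed flush that calls `invalidate` on every box is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-JVM cache of customer id -> shard name. */
class ShardLocator {
    // Local map; lookups that hit it never leave the process.
    private final ConcurrentHashMap<Long, String> shardByCustomer = new ConcurrentHashMap<>();

    // Assumed fallback for cache misses (e.g. a database or Memcached lookup).
    private final Function<Long, String> loader;

    ShardLocator(Function<Long, String> loader) {
        this.loader = loader;
    }

    /** Returns the shard for a customer, calling the loader only on a miss. */
    String shardFor(long customerId) {
        return shardByCustomer.computeIfAbsent(customerId, loader);
    }

    /** Drops one entry; a distributed flush would call this on every JVM after a customer move. */
    void invalidate(long customerId) {
        shardByCustomer.remove(customerId);
    }
}
```

After the first lookup for a customer, later calls to `shardFor` are served from the local map, so the hot key no longer generates a network round trip per request. The price is that every JVM must be told to call `invalidate` when a customer moves.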

Lesson learnt: in high-scale systems, even a Memcached call is not that cheap.

Get operations drop on one instance

CPU Usage drop on one instance


Total Operations drop on one instance


We observed a similar issue in another part of the system with another out-of-process call. Each sync request was making a call to a remote authentication service. We were getting close to 7K requests per minute, each doing token-based authentication, on each pod. Under a huge burst, that service starts taking 35-200 ms instead of 30 ms. We made a fix over the weekend to remove that call, and the system has been calm since the fix went live (the vertical line in the graph denotes when the fix went live).
