DDoS Attack on Bitotsav '19 Website

This is not a technical write-up. It is a story I want to share, one that might serve as a lesson for Web & App developers out there. Bitotsav is the annual cultural fest of BIT Mesra, and the Tech Team was responsible for developing the website and the Android app. There were 46 events in the fest, and we were responsible for sending out updates (announcements, results, etc.) for each event to the app.

On 16th February 2019, the 2nd day of the fest, the website started experiencing problems. It got slower and slower. The website was hosted on a Microsoft Azure Virtual Machine. It was around 5:00 PM when I checked the logs for the Virtual Machine. In the past 6 hours, 700 GB of data had been uploaded to and downloaded from the server. The Virtual Machine had 4 vCPUs, and all of them were running at full load. Under such heavy traffic, all the APIs stopped responding. The app was no longer usable, and no updates could be sent to the participants. What was happening? In the whole previous month, barely 10 GB of data had been transferred, and all of a sudden 700 GB had been transferred in the last 6 hours. Clearly, the site was receiving far more requests than it was capable of processing.

The first solution that came to my mind was to increase the capacity of the server. I doubled its size (the cloud lets you scale as needed). Now we had 8 vCPUs and 16 GB of memory. I expected this configuration to handle all the requests easily. I was wrong! The requests were coming in so fast that approximately 2000 of them were pending in the pool at any time. The server crashed and restarted repeatedly, failing to process any request.

I came back from Infocell to meet my team, Ashank and Aakarshit. Ashank was sure that one of the APIs was failing. (Our APIs were the endpoints the app called to fetch information from the database.) There were several APIs running on the machine. For example:

/getEventById - This API returns the details of an event. The requester has to send the Id of the event.
/getTeamDetails - This API returns the details of each member of a team registered for an event. The requester has to send the Event Id and the Team Leader Id to get the details.

These are just two of the many APIs. The two APIs above had one thing in common: they didn't require authentication. Anyone could send a request to them. The second one, i.e. /getTeamDetails, involved heavy database operations and took some time to process on the server. We were aware that these APIs could be exploited.

Ashank asked me to start logging which API was being requested. Our guess was correct: /getTeamDetails was being requested repeatedly, more than 1000 times per second. Humans can't generate that many requests. They had to come from bots, code written to hammer the server. But who would do that? We started logging the IPs from which the requests originated. When we looked them up, all of them belonged to the Asia Pacific Network, and they were all listed in Brisbane (Australia).

What! Who on earth had so much time to find the API and send requests to it over and over? First of all, the APIs were not publicly available; the app would have to be decompiled to find them. Second, it's Bitotsav! Who on earth would want to target such a small application when there are millions of more important websites out there? Whoever he/she was, he/she had invested a lot of time in setting up the attack. We were not convinced that the attack came from abroad. Maybe he/she was someone in India using a proxy in Australia. 700 GB of data! Using such proxy servers must have cost him/her a fortune.

We decided to call Paritosh, an expert in handling nginx (the web server we used). He told us to limit the number of requests by changing the server configuration. We capped the accepted requests at 10 requests/second. That worked, but not for long: we were now dropping too many important queries that had to be answered for the app to function properly. We decided to remove the limit.
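The post doesn't record the exact nginx directives Paritosh suggested. The idea of capping traffic at 10 requests/second can be sketched in Python as a token bucket; this is a swapped-in illustration, since nginx's own `limit_req` module is documented as using a leaky-bucket method. The `TokenBucket` class and its parameters are hypothetical, and the clock is injectable only so the behavior can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical illustration of a requests-per-second cap; it is not how
    nginx's limit_req module is implemented.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum stored tokens (burst size)
        self.clock = clock          # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would reject this request instead of doing the work


if __name__ == "__main__":
    # Simulate 1000 requests spread evenly over one second against a 10 req/s cap.
    fake_now = [0.0]
    bucket = TokenBucket(rate=10, capacity=10, clock=lambda: fake_now[0])
    accepted = 0
    for i in range(1000):
        fake_now[0] = i / 1000
        accepted += bucket.allow()
    print(accepted, "of 1000 requests accepted")
```

Only about 20 of the 1000 simulated requests get through. That also shows why the limit backfired: a per-server cap cannot tell an abusive request from a legitimate one, so most of the app's genuine queries were rejected along with the flood.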

Next, we started logging the number of unique IPs the requests came from. There were over 300 of them. Oh My God! It was a proper DDoS attack, the kind suffered by many popular websites around the world. 300 machines were sending out thousands of requests and killing the entire server.
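Both rounds of logging (which API was hit, and from how many IPs) boil down to counting. A minimal sketch, assuming a hypothetical simplified log line of the form `<client-ip> <request-path>`; real nginx access logs carry more fields, and the query-parameter names and IPs (from the 203.0.113.0/24 documentation range) are made up:

```python
from collections import Counter


def summarize(log_lines):
    """Count requests per endpoint and distinct client IPs per endpoint.

    Assumes each line looks like "<client-ip> <request-path>", a made-up
    format used only for illustration.
    """
    hits = Counter()
    ips = {}
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        ip, endpoint = parts[0], parts[1].split("?", 1)[0]  # drop any query string
        hits[endpoint] += 1
        ips.setdefault(endpoint, set()).add(ip)
    return hits, {endpoint: len(seen) for endpoint, seen in ips.items()}


sample = [
    "203.0.113.10 /getTeamDetails?eventId=3&leaderId=17",
    "203.0.113.11 /getTeamDetails?eventId=3&leaderId=17",
    "203.0.113.10 /getEventById?id=3",
    "203.0.113.12 /getTeamDetails?eventId=5&leaderId=42",
]
hits, unique_ips = summarize(sample)
print(hits.most_common(1))  # [('/getTeamDetails', 3)]
print(unique_ips)           # {'/getTeamDetails': 3, '/getEventById': 1}
```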

Suddenly, we realized that the place the IP addresses were allocated from was a poor indicator of where the requester actually was. We should instead find the telecom carrier that owned each IP. We checked all the IPs we had received requests from. It turned out that most of them belonged to Reliance Jio Infocomm, and the telecom circle they originated from was Jharkhand, India. Bang! We were 100 steps closer to the culprit now. He/She was within the state.

"Let us block those IPs," suggested Paritosh.

"Not possible. The IPs are dynamic, and we might end up blocking genuine devices."

We spent the next hour guessing who could have launched the attack. Was it the guy from the junior year whom we had rejected in the Tech Team interview? Or was it Ayush Gupta (K16) from the Tech Team itself? Apart from us, he was the only one who knew the API, and he hadn't picked up his phone since morning. Or was it someone from NIT Jamshedpur (our rival college, which doesn't have a website for its cultural fest)?

The guessing game ended. We had no time to find the culprit; we had to find a way to counter the attack. We decided to use Azure DDoS Protection, which uses machine learning to detect attacks. Unfortunately, the attack was fast enough to kill the server and slow enough to slip past Azure DDoS Protection. What an intelligent attack!!

It was 10:00 PM. All three of us were still struggling to get everything back up. Ayush Raj and Aakarshit suggested that we use Cloudflare. You may have seen the CAPTCHA verification pages that Cloudflare shows on some websites; it provides good protection against DDoS attacks. It took 2 hours to update the site's DNS. We expected the attack to slow down. Instead, it multiplied. We were receiving more and more requests.

Ah! Nothing was working. What was wrong? It was an apocalypse. Seeing your program fail in production is every programmer's nightmare. Calm down! We had to do something. We couldn't let things slip out of our hands.

We decided to log the result of each request. As I said, the API required the requester to send an Event Id and a Team Leader Id. We realized that every request succeeded, i.e. the attacker was sending a valid Event Id and Team Leader Id, and a matching team for the given event was always found in the database.

"Since the number of teams is limited, let us find the number of unique combinations of teams being requested!", I suggested.

71. There were 71 unique combinations of Event Id and Team Leader Id. Why 71? Why would the attacker choose such a number?

Next, we logged the number of unique Event Ids. 24 unique Event Ids!

26 events had been completed on Day 1 and Day 2 of the fest, but the results of 2 of them had not been declared yet, leaving 24 events with results. Each event had 3 winning teams, which makes a total of 24 × 3 = 72 winning teams.

"No, the event Mr. and Ms. Bitotsav has no third position," said someone. That makes a total of 71 winning teams. The attacker was requesting the details of all 71 winning teams.
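The deduction in this exchange is a deduplication followed by a small calculation. A sketch, assuming each logged request has already been parsed into an `(event_id, leader_id)` pair; the sample pairs are invented, while the event counts are the ones from the story:

```python
def distinct_requests(pairs):
    """Return (distinct teams requested, distinct events requested)."""
    teams = set(pairs)                               # distinct (event_id, leader_id) pairs
    events = {event_id for event_id, _ in teams}     # distinct event ids among them
    return len(teams), len(events)


def expected_winning_teams(completed, undeclared, places=3, missing_places=0):
    """Winning teams that should exist once results are declared."""
    return (completed - undeclared) * places - missing_places


# Invented log sample: the same few teams requested over and over.
log = [(3, 17), (3, 17), (3, 21), (5, 42), (3, 17), (5, 42)]
print(distinct_requests(log))  # (3, 2)

# Figures from the story: 26 completed events, 2 without results,
# 3 places each, and no 3rd place in Mr. and Ms. Bitotsav.
print(expected_winning_teams(26, 2))                    # 72
print(expected_winning_teams(26, 2, missing_places=1))  # 71
```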

"What did you say?" asked Ashank. 

"What?"

"There is no team that came 3rd in Mr. and Ms. Bitotsav?"

"Yes! It's Mr. and Ms. Bitotsav. Only one guy and one girl could have won it."

"The app requests the details of all the winning teams every time it starts up. Whenever a team is not found, it fails and sends all the requests again, till eternity!" said Ashank.

We felt the ground slip from under our feet. Over one thousand people were using the Android app. Every installation was requesting the details of 72 teams, and one team's details were missing. The app crashed and requested again. The details were missing again. The app crashed and requested again. We were the creators of our own attack: we had programmed the attack against our own server. Each installation of the app was acting as an attacker and flooding the server with requests.
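The post doesn't show the app's code, so the following is only a hypothetical reconstruction of the pattern Ashank described: an all-or-nothing load that starts over whenever any team is missing. `fetch_team` stands in for a /getTeamDetails call, `(event_id, place)` keys stand in for the real Event Id and Team Leader Id, and the `max_rounds` guard exists only so the demo terminates; the app had no such limit.

```python
def fetch_team(db, event_id, leader_id):
    """Stand-in for one /getTeamDetails call; returns None when not found."""
    return db.get((event_id, leader_id))


def buggy_load_winners(db, winners, max_rounds):
    """All-or-nothing loading: any missing team restarts the whole batch.

    Returns the total number of requests sent.
    """
    requests_sent = 0
    for _ in range(max_rounds):  # the real app effectively had no upper bound
        for event_id, leader_id in winners:
            requests_sent += 1
            if fetch_team(db, event_id, leader_id) is None:
                break  # one miss aborts this round...
        else:
            return requests_sent  # every team was found: done
        # ...and the next round starts again from the first team.
    return requests_sent


# Hypothetical data: 24 events x 3 places = 72 winners, but one 3rd place is absent.
winners = [(event, place) for event in range(24) for place in (1, 2, 3)]
db = {key: f"team {key}" for key in winners if key != (23, 3)}

print(len(winners), len(db))                           # 72 71
print(buggy_load_winners(db, winners, max_rounds=10))  # 720
```

Every round costs 72 requests and no round ever succeeds. Multiply that by more than a thousand installations retrying as fast as they can, and the server sees exactly the flood described above.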

"You never told me that there can be no third position!" argued Ashank.

"Why do you keep requesting again and again if the API told you that the team was not found?" I argued. And we blamed each other for the next hour.

We added a dummy team for the third position of Mr. and Ms. Bitotsav. I set up a cluster of servers behind a load balancer. We updated the app so that it no longer re-requested a team that the database reported as not found. Finally, everything was under control and working again. It had taken us 10 long hours to find the cause of this chaos. The app functioned smoothly on Day 3. Still, we were not satisfied with what we had done: the 700 GB of data transferred on the cloud cost us a lot.
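A sketch of the app-side fix, again hypothetical rather than the app's actual code: a "not found" answer is recorded and never retried, and only transient failures (the made-up `TransientError`) are retried, a bounded number of times. The exponential backoff with random jitter goes beyond what the post describes; it is a common way to stop many clients from retrying in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical: a timeout or 5xx response that is worth retrying."""


def load_winners(fetch, winners, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Fetch each winning team at most `max_attempts` times.

    `fetch(event_id, leader_id)` returns the team, or None when the server
    says it does not exist. Returns (found, missing, failed).
    """
    found, missing, failed = {}, [], []
    for key in winners:
        for attempt in range(max_attempts):
            try:
                team = fetch(*key)
            except TransientError:
                if attempt + 1 < max_attempts:
                    # Exponential backoff with jitter: roughly 0.5 s, 1 s, 2 s, ...
                    sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
                continue
            if team is None:
                missing.append(key)  # "not found" is an answer, not an error
            else:
                found[key] = team
            break
        else:
            failed.append(key)  # gave up after repeated transient failures
    return found, missing, failed


# Demo with an in-memory stand-in for the server.
db = {(1, 10): "Team A", (1, 11): "Team B"}
calls = []


def fake_fetch(event_id, leader_id):
    calls.append((event_id, leader_id))
    return db.get((event_id, leader_id))


found, missing, failed = load_winners(fake_fetch, [(1, 10), (1, 11), (1, 12)], sleep=lambda s: None)
print(len(found), missing, failed, len(calls))  # 2 [(1, 12)] [] 3
```

The missing team costs exactly one request, no matter how many times the app starts up.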

Key takeaways:
  • Always start your API server in cluster mode with proper load balancing.
  • Cache the results of expensive functions.
  • Always require authentication for an API that involves heavy processing.
  • Go serverless. Make use of modern technology whenever possible.
  • Never make manual entries in the database (as I did 🙈). They can cause inconsistencies that make the APIs fail.
  • Design your app so that it sends as few requests to the server as possible.
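For the caching takeaway, a minimal sketch of a time-limited cache in front of an expensive lookup. `get_team_details` is a hypothetical stand-in for the /getTeamDetails database query. `functools.lru_cache` would work on its own if the data never changed, but results change during a fest, so entries here expire after a TTL. Because `None` results are cached too, a missing team is also answered from memory instead of hitting the database each time.

```python
import functools
import time


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache a function's results per positional-argument tuple for `ttl_seconds`."""
    def decorator(func):
        store = {}  # args -> (expiry_time, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # fresh entry: skip the expensive call
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value
        return wrapper
    return decorator


db_queries = 0


@ttl_cache(ttl_seconds=60)
def get_team_details(event_id, leader_id):
    """Hypothetical stand-in for the expensive /getTeamDetails query."""
    global db_queries
    db_queries += 1
    return {"event_id": event_id, "leader_id": leader_id}


for _ in range(1000):
    get_team_details(3, 17)
print(db_queries)  # 1
```

The cache lives inside one process; in a cluster of servers, each process keeps its own copy.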

Comments

  1. Hahaha!!! you guys. Btw inspiring story for all of us.

  2. Very well written, engaging and universal in appeal

  3. Felt as if I was watching a suspense thriller while reading. Got goosebumps at the end of the article. Very well written.

  4. I started reading this article half-asleep after seeing this blog's link in one of my friends' WhatsApp statuses. It woke me up: eyes wide open, sitting up in bed and reading it with full concentration. Although it was real, it was like a movie with a great plot and a very thrilling climax. Being a part of Bitotsav, I could visualize every scene and your dilemma very well.
    Thank you for sharing your experience. It was worth giving 5 minutes of one's life.

  5. Bro, there wasn't this much thrill even during the actual "attack" 😛.

    "Design your app in such a way that the number of requests it sends out to the server is minimum." 🙄 More like "Handle remotest possible api failures in your app even when you trust it not to fail" 🙃

    Some more takeaways:
    - Read the docs of whatever tool you use; don't just go with examples/SO answers.
    - Write tests !! That could've helped us figure out what was failing much sooner.

  6. Exclusive behind the scenes clip of the developer during the "attack".

    [image src="https://media.giphy.com/media/4ZgJBfqgDVLmM15gxK/giphy.gif"/]

  7. It reads like a suspense novel! Very inspiring :)

  8. 2 things that I learned the hard way which might be of some relevance here:
    1. Start with assumption that `I did something wrong. Question is where?`. (Usually saves a lot of time)
    2. Always have a terminating condition to your loop / recursion.

  9. Oh My God!!
    A short movie could be made on this event. Plus, the way it's written is awesome❤

  10. *organise the event,
    Hack yourself,
    Blame others*

    Le real life hacker: _am I a joke to you?_
