Detecting Domain Generation Algorithms

Malicious actors often use Domain Generation Algorithms (DGA) to exploit the DNS protocol and execute command-and-control (C&C) malware attacks. In this webinar, threat researchers Manoj Ahuje and Chris Gong from Tigera’s Threat Detection Team share the latest insights into DGAs and the risks they present, along with best practices to speed detection and mitigation.

Complete Transcript

Hello and welcome to our Kubernetes Threat Intelligence webinar. Today’s topic of discussion is detecting Domain Generation Algorithms, or DGA for short. My name is John Armstrong and I’m product marketing director for Tigera, and glad to be on this call today. DGA is an automation technique that attackers use to make it harder for defenders to protect against attacks, and malicious actors often use DGA to exploit the DNS protocol and execute command and control malware attacks. DGA has been used for 10 years now. It’s still a potent technique that’s been a particular challenge for defenders. That’s because DGAs are very difficult to detect due to their dynamic and unpredictable nature. Now, traditional approaches to data security can’t contain DGA threats with the problem complicated by the migration to Kubernetes and the cloud. But fortunately, there are solutions now that can better counter DGAs and we’re going to discuss one of those today. In this session, Tigera’s threat detection team will be sharing the latest insights into DGAs, the risks they present along with best practices to speed detection and reduce risk. Today’s session will be led by two of our experts from Tigera’s threat defense research team. And after the presentation and demo, we’ll have a Q&A session where you can submit your questions to us. You can use the questions interface on BrightTALK to submit those questions. Our presenters today are Manoj Ahuje, who is a threat intelligence research engineer with Tigera. His focus is on vulnerability research, malware reverse engineering and exploit development. And he’s had previous roles at Juniper and Intel. Our other presenter this morning is Chris Gong. He’s a threat intelligence software engineer with Tigera whose specialty is in machine learning and data science for threat defense, and Chris is a former physicist. Now at this point, I’d like to pass things over to Manoj for the presentation. Good morning, good evening. 
So today’s agenda is crisp and short. We’ll be looking at the cyber-attack chain in the cloud. Then, briefly, we’ll look at DNS and how it is used in DGA. Then we will deep dive into Domain Generation Algorithms, where we’ll look at what the Calico machine learning approach is. And then my colleague, Chris, will do a demo and go through how Calico machine learning works, and then we will look through some numbers and results. So with that, let me start with our first slide. When an attacker sets a target, he usually has an organization name or domain name. And from that, he starts his reconnaissance and starts to look at what the organization is and who the people working for it are. Can he use phishing or social engineering to get that initial foothold? He can start to look at what resources the organization is exposing to the internet. Is there any cloud presence? If yes, then which cloud? What are the integrations there? What are the services being exposed? Is there any vulnerable piece of code being exposed there that an attacker can exploit? There could be numerous configuration mistakes, as for most organizations Kubernetes is relatively new and it’s about turning on the lights. There could be an insecure API, as cloud is API driven, and there could be vulnerabilities in your etcd, your Kubernetes API, your cloud metadata API, your custom resource APIs and numerous other things. There could be cloud infrastructure vulnerabilities, notably Spectre and Meltdown, where an attacker is a tenant like other organizations on a public cloud and, rather than exploiting a targeted organization, the attacker exploits a flaw in the underlying hardware layers. And with that, he not only ends up compromising the organization he wants, he compromises all the organizations which are present at that time on that particular piece of hardware. Once an attacker has this kind of foothold, he will try to phone back home to command and control servers.
So we’ll talk more about command and control servers in the next slide. Then an attacker can start to do lateral movement in search of the pot of gold, if he doesn’t have it already. That could be cloud account keys, it could be compute resources, it could be intellectual property. And data is the new oil, so he can be after that classified data and then he will try to exfiltrate it in some way. At Tigera, we assume that an organization is already breached because there are N number of scenarios where this can happen. So you can see most of our defenses are designed towards detecting an attacker, and they can be used for prevention as well. So today’s focus is the command and control server. The command and control server itself is a multifaceted topic. In recent times, command and control servers could be anywhere on the globe. There could be a server up in any geo, there could be proxies behind proxies, and a command and control server can be behind one of those. An attacker could use Tor or VPN networks in between, and an attacker can also bounce the connections through multiple levels and try to reach the C&C ultimately. But security companies and traditional mechanisms have become so strong in detecting these as they come that attackers started to feel the need to have some dynamic layer on top of it. And from there, they developed the concept of a Domain Generation Algorithm. Before we deep dive into Domain Generation Algorithms, let’s look at DNS and how it is used in Domain Generation Algorithms. So DNS, as you know, is the basis of our modern internet. And how it works is your local machine queries a DNS server, which is either configured on your machine or a public DNS server, and you get a response for the query you made. This is a relatively simple behavior, but monitoring this behavior can be really difficult as there could be millions of queries per day.
But the good news is that logging DNS is possible because it is such a small payload, it just fits under your MTU size. So let’s look at what happens when we try to query a domain which is not registered anywhere. So I picked just a random string, which is not used anywhere. So you see the response is a decline. Basically, the DNS server is saying, “I didn’t find any IP address anywhere, so here is a blank response.” This exact behavior is exploited in DGA. So basically a DGA, in a nutshell, is an algorithm which is capable of generating thousands of domains depending on the seed it is using. So you can look at a DGA like this: depending on the day of the week, I can generate thousands of domains and try to connect to them and see if one of them responds or one of them is registered. As you can see in the graph on my right hand side, malware continuously tries to do that. If there is no response, it stops. Another day, it uses its configured seed and then makes these queries again. And the way it works for attackers is they go ahead and register one of these domains in advance, and on the particular day when the malware queries that domain, the attacker can issue further commands to the malware. So in this process, the seed is the only thing which is common between the attacker and his APT or malware. So let’s take a closer look at the seed. The attacker usually uses a pseudo-random string generator, which gives him a really random string. And to get that string, he needs to feed it an input, and that input should be something known to both the attacker and the malware code itself. It should be common to both. So he ends up using something like the system date and time. It could be a currency pair, it could be the temperature of a particular city, or even the trending topics on Facebook or Twitter. And once he has that random string, he needs to concatenate a top-level domain.
And the way he does that is, he can pick the TLD depending on an even/odd day scheme, or depending on the day of the week he can choose to use a particular TLD, and then he gets an FQDN which he can query. On the right hand side, you can look at the DGA domains generated for the notorious Necurs botnet. If you are following the news, just a few days back Microsoft took down the Necurs botnet. Microsoft had to coordinate with 36 countries to do that. Just to give you a perspective of what the complexities involved here are: let’s say you have a domain which is registered in Taiwan and you have a command and control server which is up and running in Russia. It’s almost impossible for US law enforcement agencies to take down a setup like this. And not only that, DGA malware is extremely difficult to reverse engineer. Most organizations won’t have the skills to do that. Another thing is that taking a seed out of malware is extremely difficult. Even if, let’s assume, you were able to reverse engineer the malware and you took out the seed, you would still need to blacklist hundreds of thousands of domains which are not even registered yet. So this is where our traditional mitigation approaches fail really miserably. And this is where we started looking at a machine learning approach, which can help with this dynamic nature. At Tigera, we spent a lot of time researching a machine learning algorithm and we’ve come up with a model which is highly tuned to target this DGA use case. The goal was to trigger an anomaly with such high confidence that once you have that particular anomaly triggered, you can consider your cluster compromised. So we were able to trigger anomalies with north of 99% confidence. With this, we provide a list of DGA domains detected in the cluster, so that incident response teams can quickly zero in on compromised workloads and reduce the dwell time.
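To make the seed mechanism described in the talk concrete, here is a minimal sketch of a date-seeded generator. It is a toy for illustration only, not the algorithm of Necurs or any other real malware family: the use of SHA-256, the label length, and the TLD pools (`EVEN_DAY_TLDS`, `ODD_DAY_TLDS`) are all assumptions chosen to mirror the "shared seed plus even/odd day TLD scheme" idea.

```python
import hashlib
from datetime import date

# Hypothetical TLD pools for the even/odd-day scheme mentioned in the talk.
EVEN_DAY_TLDS = ["com", "net"]
ODD_DAY_TLDS = ["org", "info"]


def toy_dga(day: date, count: int = 5, label_len: int = 12) -> list[str]:
    """Derive `count` pseudo-random domain names from a date seed."""
    tlds = EVEN_DAY_TLDS if day.day % 2 == 0 else ODD_DAY_TLDS
    domains = []
    for i in range(count):
        # The seed is the date plus a counter, so anyone who knows the
        # algorithm and the date can compute the same list independently.
        seed = f"{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(seed).digest()
        # Map each digest byte to a lowercase letter to build the label.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:label_len])
        domains.append(f"{label}.{tlds[i % len(tlds)]}")
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2020, 3, 18)):
        print(name)
```

Because the output depends only on the date and the algorithm, the attacker can compute tomorrow's list ahead of time and register a single entry from it, while a defender who lacks the algorithm sees only a stream of unrelated-looking names.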
So with this, my colleague Chris will be doing a demo for us and he will be showing us how easy it is to configure Calico Enterprise machine learning algorithms. And he will be showing us a more reduced seed. So Chris, over to you.
Okay. Thank you very much, Manoj. Just give me a moment to share my screen here. All right. Hello, everyone. My name’s Chris. I am an engineer with Tigera’s threat defense team. And in the next couple of minutes I will be giving you a demonstration of Tigera’s DGA detection and how to use it. So to begin with, I have a Kubernetes cluster with Calico Enterprise installed. You can see here all the pods currently existing on the cluster. So the first thing that we want to do is to actually install the DGA detection, and doing that is actually very simple. We simply apply a manifest like this. And if you’re interested in trying out a preview of DGA detection, please contact a Tigera representative and they can provide you with this manifest. So let’s see exactly what this did. That manifest created an installer pod, which has completed the installation of the detection onto our system, so that’s great. Now next, if we look back at our cluster, I have here an attacker pod. This represents a pod in our Kubernetes cluster that is compromised and has some malware installed. Let’s then examine exactly what the attacker pod will try to do. Okay. So a brief overview of DGA. Manoj talked about this earlier, but I think it’s worthwhile just mentioning it again. This malware, once it’s been installed onto the cluster, will attempt to phone home to some domain that is controlled by the attacker. And once it establishes a connection with the external domain, it creates a command and control server where it can then begin either installing ransomware or bitcoin mining or vast data exfiltration. The simplest method of phoning home to this external domain is simply hard-coding the domain.
But that makes it very easy to detect, because once we know what this external domain is, you can blacklist it on the system, and that makes the malware completely harmless. Then the next step is, instead of hard-coding the domain, why don’t we create a bunch of randomly generated domains using some seed? Well, this works okay until you reverse engineer the seed, in which case you can now predict exactly what domains will be generated and blacklist those as well. The very next step, and this is what DGA is all about, is to generate random domains using a randomized seed. And this will eventually create a whole bunch of domains, most of which are simply garbage domains that are completely unregistered. But every once in a while, one of those domains will have been registered by the attacker and the malware will attempt to establish a connection. So because the domain is randomly generated and because the seed is random as well, Domain Generation Algorithms are very difficult to detect. What I’m going to do next is to generate some data from this compromised pod using the seed from a particular malware family called Gameover Zeus. This malware uses DGA to create a C2 connection. And so, let’s just take a look at what that would look like. So here, I’m just going to generate 150 DGA domains. Just once again, these are generated with a seed from Gameover Zeus. And so, you can see that the names of these domains are basically complete garbage and most of them are going to be unregistered. But once in a while, one of these will be registered, and that’s what will be compromising our system. Okay, now our system has generated a bunch of domains. Let’s see what happens with Calico Enterprise, right? Here’s our Calico Enterprise user interface. So first of all, let’s just make sure that all those domains were actually recorded in our system. If I just go to here… Okay, give it a second.
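The observation that generated names look like garbage can be turned into a simple numeric signal. The sketch below scores the first label of a domain by its Shannon entropy. It is only an illustrative heuristic and is not the Calico Enterprise model, whose internals the talk does not describe; the function names, the minimum length of 10, and the threshold of 3.5 bits are assumptions picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of the characters in `label`."""
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(domain: str, threshold: float = 3.5, min_len: int = 10) -> bool:
    """Flag a domain whose first label is long and has high character entropy.

    `threshold` and `min_len` are illustrative assumptions, not tuned values.
    """
    label = domain.split(".")[0].lower()
    return len(label) >= min_len and shannon_entropy(label) >= threshold


if __name__ == "__main__":
    for name in ["kubernetes.io", "xkqzjvwpyhtm.net"]:
        print(name, looks_random(name))
```

A single-feature rule like this misfires on legitimate machine-generated hostnames, such as CDN or cloud endpoints, which is one reason a production detector needs more than a character-level score.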
Now if I go to the Discover tab, let’s just take a look at the DNS logs. Okay, let’s look at the last 15 minutes, and I want DNS logs here. And you see this spike that occurred just a couple of minutes earlier? This is the spike caused by the DNS queries that I’ve generated from the attacker pod. And this is one of the things that makes Calico Enterprise extremely useful for this sort of detection: the data has already been centralized and we can easily make use of it. Once the malware has generated all these different domains, well, how does detection work, right? The detection algorithm that we have just installed works on some interval. But here, just to give you a demonstration, I’m actually going to trigger it manually. And just to save you from a lot of output here, I’ll just redirect it to [techno inaudible 00:19:54], and let’s just give it a few seconds to run. All right, so it has completed running. Now two things can happen here. If DGA was not detected, then you’re totally fine, just move on. But if DGA was detected, how do we know of that detection? So once again, we go back to the Calico Enterprise user interface and we can go to the alerts. And there you go. This is our alert triggered by DGA based on very high confidence. And let’s just take a look at the contents of that alert. Here you see that it has flagged all these different domains within our system. And now, just a note of caution, these domains listed here do not necessarily represent all DGA domains. These are simply the domains that were used as part of the detection, that were used to trigger that particular alert. Once we have this alert, the next thing to do is to troubleshoot. And this is where the administrator of the cluster and their domain knowledge of the workloads on that cluster come in handy. So if I’m the owner of this cluster, I know the workloads on the cluster and I know the expected traffic.
And by looking at this list, I should be able to easily find out, well, hey, look at this. This domain does not look like it should belong in my cluster. Now, let’s just look a little bit further at this particular domain that I am using as an example. Once again, we go back to Kibana and let’s just take a look at this particular domain. And there it is. So we have one single DNS query going to this domain. And if I expand this document just a little bit more, I find out all the different namespaces, IP addresses and names of the pods that this query was actually generated from. And from there, I can perform further digging. So this is a quick demonstration of DGA detection. Let me stop sharing my screen and go back to the slides. Okay, so as Manoj mentioned earlier, we have trained this algorithm to produce alerts only at very high confidence. And we’ve done this also by validating it against many different malware families that use DGA as part of their algorithm. And so here, you can see a list of all the different malware families that were used. I do not believe that this is an exhaustive list, so we could have actually tested against possibly more. Just to give you an overview of some of these malware families and how difficult it would be to actually detect some of them: the first one on the list, Gameover Zeus, is the malware family whose seed I was using to generate the data for the demo. Gameover Zeus goes all the way back to 2014. It’s a malware that establishes a command and control server and is used mainly in bank fraud. Different variants of Gameover Zeus were classified. One of the variants uses DGA to create up to 1,000 domains a day. And another variant of Gameover Zeus uses DGA to create over 10,000 domains a day. If you go further down the list, let’s look at Conficker. Conficker goes all the way back to 2008.
The first version of Conficker actually only generated around 250 domains a day and it was using a very predictable seed. So what the FBI did was they reverse engineered the seed, made a list of all the domains that would be generated and registered those domains before the attackers registered them, which means that it basically caught them and made the malware completely harmless. However, later generations of Conficker started to use DGA to evade blacklisting. And so, one of those versions of Conficker actually ended up generating over 50,000 different domains a day, which makes the previous method that the FBI utilized less than ideal. Now if we go further down the list, let’s take a look at number six, Ramnit. So Ramnit was first discovered in 2010 and uses a DGA method to generate domains that are anywhere between 8 and 19 characters long. And each one of those characters is chosen based on a uniform distribution between the letters A to X, right. All these are actual real malware and they are extremely hard to detect. All right, so thank you. Manoj, do you want to finish this off here?
Yeah. Sure, Chris. In this talk we saw how DGA works, what the complexities associated with it are, and how resilient a command and control server can be using a DGA. With that, we want to stress logging the DNS traffic, as it is such a small payload, and we can trace a number of attacks, including DGA and tunneling mechanisms, using just DNS. We saw a great demo from Chris of the Calico Enterprise machine learning algorithm. And as you saw, we generate an anomaly with such high confidence that incident response teams can quickly zero in on that particular compromised workload and reduce the dwell time to minutes. And with that, the last thing we want to say is you can schedule this algorithm to run on a daily, weekly or hourly basis. It’s up to you. It’s really flexible. So with that, I want to hand it off to John.
Thank you, Manoj and thank you, Chris.
That was a really excellent demo and excellent presentation. We’re looking for some questions from our audience, and I can see a couple up here. We’re going to give folks in the audience a little bit more time to ask a couple more questions. In the meantime, I just wanted to share some information about some upcoming Tigera online events. Of course, all of our events are online for the foreseeable future. We have some interesting ones next week as well as later this week: Running Calico for Windows, another one, Running Calico on Your On-Premises Kubernetes Cluster, and one on implementing network policy on Google Kubernetes to secure your cluster. All of those events can be viewed online at www.tigera.io/events. Yes, we do have some questions now. So let’s get right to those. And the first question is, “What is the recommended frequency for running Tigera’s DGA detection algorithm?”
Okay, I can take that one, John. So as Manoj mentioned, the interval at which we run detection is tunable. So if we don’t want to run this very often, we certainly don’t have to. But I think running it once a day is a good place to start.
Okay. Thank you, Chris. The next question I have here, the audience member says, “I’m concerned about false positives. What’s the rate of false positives for this solution?”
Oh, okay, I’ll take that one again. So that’s a very good question. During our training, our false positive rate, or I should say our precision, is close to 99%, which means that our false positive rate is just slightly over 1%. Now, during implementation and deployment, we further reduce that number by using a highly curated whitelist, and also just by applying our knowledge of the DNS protocol. So for certain types of DNS queries, we know what the expected response should be. And based on that, we can further filter down the list, and all these things help to reduce the false positive rate to below 1%.
Very good. Thank you, Chris.
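The arithmetic behind the answer above can be shown with a small worked example. Precision is the share of raised alerts that are genuine, so a precision near 99% means that about 1 alert in 100 is wrong; strictly speaking that complement is the false discovery rate, which is the sense in which "false positive rate" is used in the answer. The counts, the domain names, and the `apply_allowlist` helper below are hypothetical and only illustrate how a curated allowlist removes known-good names before alerting.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged domains that really are DGA domains."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def apply_allowlist(flagged_domains: list[str], allowlist: set[str]) -> list[str]:
    """Drop flagged domains that appear on a curated allowlist (lowercase entries)."""
    return [d for d in flagged_domains if d.lower() not in allowlist]


if __name__ == "__main__":
    # Hypothetical counts: 990 correct alerts and 10 wrong ones.
    p = precision(true_positives=990, false_positives=10)
    print(f"precision = {p:.2%}, wrong alerts = {1 - p:.2%}")

    # Hypothetical flagged names; the first is a known-good internal service.
    flagged = ["a1b2c3d4e5f6.cdn.example.com", "xkqzjvwpyhtm.net"]
    allow = {"a1b2c3d4e5f6.cdn.example.com"}
    print(apply_allowlist(flagged, allow))
```

If the allowlist removes some of the wrong alerts while leaving genuine ones untouched, the precision of what remains goes up, which matches the claim that post-filtering pushes the error share below 1%.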
We do have two more questions. So the next one is, “Does ransomware make use of DGA?”
I can take that question, John. Yeah, ransomware usually is a payload packed in DGA malware. So the DGA malware’s first goal is to evade the traditional defense mechanisms. And once the malware has evaded them, it tries to talk to the command and control server. And from there on, it can download ransomware as a payload and execute it on a compromised organization. But ransomware itself doesn’t care about evading, as ransomware starts to encrypt the system and then blatantly shows the message that “we have encrypted all your files and we need a ransom for that”. So as you see, there is a use case there: how to transfer the ransomware using malware which can efficiently evade security mechanisms.
Very good. Thank you, Manoj. One more question we have here, “I’m also looking at Necurs.” And the question is, “After the DGA command and control is taken down, is the malware completely removed or is it still active?”
Oh yeah, that is interesting, actually. We would think that once the command and control infrastructure is taken down, the malware won’t work. But that is a wrong assumption most of the time, because the malware itself keeps on running and tries to phone home. It tries all the domains that it generated, tries to connect to them and tries to see if any one of them is responding. What law enforcement can do is control those domains and try to see how much of an infection there is in the entire region or world. But the malware will still be active, and eventually, what can happen is the malware can have an embedded payload. At any moment, if it can’t find any C&C which is live, it will end up executing that payload and can still harm the organization. So the malware will be active and can still do harm.
Very good. Thank you, Manoj. As we have no more questions, we’re going to wrap this up now. And just another reminder before we close out of upcoming Tigera online events.
I’d like to thank you for joining us on our webinar today, I hope you found it informative and helpful. And again, we have many upcoming events, just check out our website at www.tigera.io for a listing of all future events and webinars that are happening. Thank you for being part of our webinar.