The software-as-a-service (SaaS) company is a global provider of collaboration software that helps teams organize, discuss, and complete shared work. The company is publicly traded, with revenues of more than $600M and more than 100,000 customers across large and small organizations. The company delivers project tracking, content creation and sharing, real-time communication, and service management products. With these solutions, teams can work better together and deliver quality results on time.
The company undertook a microservices initiative that was aimed at deploying enhanced platform-as-a-service (PaaS) offerings for customers built on top of Amazon Web Services (AWS). As part of this initiative, the company wanted to minimize costs while delivering the high levels of performance and reliability that a demanding customer base requires.
Conventional architectures would require on-premises networking, which would entail high network infrastructure and per-gigabyte traffic costs. By having all nodes in the AWS-based microservices environment communicate directly with the internet, the team could realize several advantages. First, they could avoid having to funnel traffic through local physical data centers, which is where core networking and firewalling had previously been performed. This approach also meant that they could avoid having to replicate data center infrastructure around the world and significantly reduce their AWS networking costs. Finally, by enabling the company’s application in AWS to communicate directly with the internet, the team could establish a more efficient and stable network infrastructure with fewer moving parts.
The project kicked off when the company’s Site Reliability Engineering (SRE) organization alerted others that the on-premises firewall infrastructure was introducing platform reliability issues. One of the infrastructure architects suggested working with Tigera, and the company’s Network Services team designed a new architecture built on the Tigera application connectivity architecture. The project team included stakeholders from across the organization, including network services, the platform-as-a-service group, security, compliance, infrastructure, and the line-of-business team.
As they set out to move to their new architecture, the team needed to address three network security objectives: ensuring compliance, enabling fast threat response, and establishing optimal security through defense-in-depth approaches.
To meet their compliance requirements, the company needed to run firewall tasks in AWS. By doing so, they could ensure that traffic did not go to countries embargoed under US export regulations. This blocking function had previously been provided by the company’s data center core routing infrastructure and needed to be established in the AWS environment that communicated directly with the internet. A senior network engineer at the company commented, “We asked our legal department if we need to control connections to and from embargoed countries, and they said ‘yes’. So we built something that does that.”
Second, the company needed the ability to block suspected bad IP addresses. Previously, the team was able to blacklist bad IP addresses by modifying on-premises firewall rules and remotely triggered blackhole routes. The firm needed a way to establish and distribute firewall rules to all nodes in AWS to enable rapid response to security incidents. The network engineer commented, “If security identifies bad IPs or bad actors that are trying to exfiltrate data, we want to be able to block that type of activity.”
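The blocking check described above can be illustrated with a minimal sketch. It is not the company’s or Tigera’s implementation; it only shows what matching a connection’s address against a list of blocked prefixes means. The function name and the prefixes (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# not real threat-intelligence data.
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address, prefixes=BLOCKED_PREFIXES):
    """Return True if the address falls inside any blocked prefix."""
    ip = ipaddress.ip_address(address)
    # Compare only against prefixes of the same IP version.
    return any(ip.version == net.version and ip in net for net in prefixes)
```

In the architecture the case study describes, this kind of match is enforced on each node by the policy engine rather than in application code; the sketch is only meant to make the rule semantics concrete.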
Third, the company also had to achieve defense in depth for their platform-as-a-service (PaaS) offering in AWS. The firm already had security groups, access control lists (ACLs), and a variety of other measures, but wanted another layer of security. This was critical because the new AWS architecture obviated the need for a network address translation (NAT) gateway, which had previously provided a measure of network security by blocking inbound traffic at multiple layers.
As the network engineer explained, “We needed a firewalling solution that we could deploy on a highly dynamic group of Amazon EC2 instances. These EC2 instances come and go and scale up and down in rapid fashion. As a result, we wanted the ability to change the policy dynamically without having to rerun a deployment script.”
The company’s microservices-based application nodes run in Docker containers, and the firm’s architecture required policing traffic going between a particular interface, a host, and a Docker container. Most of the traffic in the existing application was shipped to the data center, where firewall rules were run in the core routing infrastructure. The network engineer explained, “We’re moving away from physical data centers and deploying in regions where we don’t have any physical presence. That is why we needed this firewall functionality into AWS.”
The company looked at the native AWS security group functionality and decided that security groups were useful but inadequate. To achieve defense in depth, the team opted to use Tigera-based network policies in conjunction with AWS security groups. Said the network engineer, “We needed to block thousands of prefixes, and you cannot do that with AWS Security Groups. If we had to block less than a hundred prefixes or IP addresses, we could definitely do that with security groups or ACLs. But when we need to scale to thousands of prefixes, that is where security group functionality fell short. We need to block multiple thousands of prefixes.”
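To give a sense of how a large prefix list might be handled as a single policy object, here is a rough sketch that collapses overlapping prefixes and wraps them in a Calico-style network set. The case study does not say which Calico resources the team used; the GlobalNetworkSet kind and its spec.nets field are assumed from Calico’s v3 API, and the function name, resource name, and label are hypothetical.

```python
import ipaddress
import json


def build_network_set(name, cidrs, labels):
    """Collapse a prefix list and wrap it in a GlobalNetworkSet-style dict."""
    nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # collapse_addresses merges adjacent and overlapping prefixes, but it
    # only accepts one IP version at a time, so split the list first.
    v4 = ipaddress.collapse_addresses(n for n in nets if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in nets if n.version == 6)
    return {
        "apiVersion": "projectcalico.org/v3",  # assumed Calico API version
        "kind": "GlobalNetworkSet",            # assumed resource kind
        "metadata": {"name": name, "labels": labels},
        "spec": {"nets": [str(n) for n in list(v4) + list(v6)]},
    }


# Hypothetical input using documentation ranges.
manifest = build_network_set(
    "blocked-prefixes",
    ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "198.51.100.64/26"],
    {"role": "blocklist"},
)
print(json.dumps(manifest, indent=2))
```

A deny rule in a policy could then select the set by its label instead of listing every prefix, so updating the list would mean changing one object rather than editing rules on each node.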
When the team started discussing the enforcement of firewall rules on a set of hosts, an infrastructure architect pointed to Tigera as the best way forward. The company had previous experience with Tigera Calico from other internal projects and was confident in the technology. The network engineer commented, “We have a team doing Kubernetes, and they are using Calico. Calico is now effectively the standard when deploying Kubernetes clusters.”
The network services team needed the ability to rapidly push rules across their fleet of nodes. The network engineer noted, “We need to have the same policy across the entire fleet, which may be comprised of thousands of instances. We needed the ability to push rules without a deployment script that touches every node. Alternative approaches we considered had no dynamic updating or correction.”
“The firewalling happens in the data center, we could not replicate that approach in AWS. By moving to having public IPs on all of the nodes, and having Tigera do the extra firewalling and provide defense in depth, we solved all of the problems at once.”
Tigera also provided the simplest solution, one that can be reused for other projects within the company’s AWS infrastructure. In the future, the team intends to standardize on Tigera to ensure EC2 security and compliance. The network engineer recounted, “When we told people that policy changes would be reflected immediately on all nodes, they were very happy because they were expecting rule changes to require a painful deployment and management process. With Tigera, we can scale painlessly. Instead of pushing rules to eight core routers, we now dynamically push them to thousands of EC2 instances.”
Tigera employees assisted the company throughout the design and deployment process. Commented the network engineer, “The responsiveness of people on the Tigera team was quite impressive.”
The company is in production with some elements of their application portfolio and is testing others, with the intention of rolling them into production in the coming months. Commented the network engineer, “The Tigera solution allows us to go into regions where we do not have a hardware presence. If we want to open in India, we don’t have to send people to deploy core routers in data centers. We can deploy without any hardware by using AWS.”
In AWS, public IP addresses are assigned to every instance, which eliminates the need for NAT gateways – and their associated cost.
By adopting the Tigera-based approach, the company will be able to realize cost savings in a number of ways. The firm avoids the cost of deploying routing and firewall infrastructure in new geographies. Much greater savings come from avoiding the cost of the AWS managed NAT gateway and Direct Connect. The network engineer revealed, “We pay on a cost-per-gig basis when we have traffic go from Amazon to the data center through Direct Connect, or when we go through the [AWS] NAT gateways. By avoiding NAT gateways and Direct Connect, and going straight out of AWS, we get our cost as low as we can bring it in terms of cost per gig of network traffic.”
The team intends to deploy the Tigera-based architecture across thousands of AWS EC2 nodes. Describing the new AWS microservices connectivity infrastructure using Tigera, the network engineer commented, “It worked as advertised. We realize improved cost savings, reliability, and performance, so it is a win all across the organization.”