
Welcome to the Gatekeeper wiki! Check out the available pages in the list on the right of this page.

What is Gatekeeper?

Gatekeeper is the first open source DDoS protection system. It is designed to scale to any peak bandwidth, so it can withstand the DDoS attacks of today and of tomorrow. In spite of the geographically distributed architecture of Gatekeeper, the network policy that describes all decisions that have to be enforced on the incoming traffic is centralized. This centralized policy enables network operators to leverage distributed algorithms that would not be viable under very high latency (e.g. distributed databases) and to fight multiple multi-vector DDoS attacks at once.

Latest news

June 5th, 2024. The first release candidate of version 1.2 of Gatekeeper is out. If you have a deployment in production, now is the time to test and prepare for an eventual upgrade of your deployment.

More news is available on the page Other news.

How does Gatekeeper work?

Gatekeeper has two components: Gatekeeper servers and Grantor servers. Gatekeeper servers are deployed throughout the Internet at locations called vantage points (VPs). Vantage points are Internet exchange points (IXPs), border and peering-link routers, and (potentially) cloud providers. The aggregated bandwidth of all Gatekeeper servers is what enables a Gatekeeper deployment to scale its incoming bandwidth to match the peak bandwidth of DDoS attacks.

Gatekeeper servers use BGP to announce the network prefixes under their protection. Thus, each traffic source is bound to a VP. Gatekeeper servers' primary function is to enforce network policies over flows; a flow is defined by the pair of source and destination IP addresses. An example of a policy decision is for IP address A to be allowed to send packets to IP address B at 1Gbps or less. An analogy that may help some readers wrap their heads around Gatekeeper is to think of Gatekeeper servers as reverse proxies that work at the IP layer.
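
To make the flow abstraction concrete, here is a minimal C sketch of the per-flow state a Gatekeeper server could keep. The struct and field names are hypothetical and simplified (only IPv4 is shown, for brevity); they are not Gatekeeper's actual data structures:

```c
#include <stdint.h>

/* A flow is keyed only by the pair of source and destination addresses. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
};

enum flow_state {
	FLOW_REQUEST,	/* no decision yet: packets go to the request channel */
	FLOW_GRANTED,	/* the policy granted bandwidth to this flow */
	FLOW_DECLINED,	/* the policy blocked this flow */
};

struct flow_entry {
	struct flow_key key;
	enum flow_state state;
	uint64_t rate_limit_bps;	/* e.g. 1000000000 for 1Gbps */
	uint64_t expires_at_ns;		/* every decision eventually expires */
};
```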

When a Gatekeeper server does not have a policy decision in its flow table to enforce over a given flow, it encapsulates the packets of that flow using IP-in-IP, assigns a priority to each encapsulated packet based on the rate of the flow (higher priority for lower rates), and forwards it through the request channel. The request channel is reserved 5% of the bandwidth of the path that goes from a Gatekeeper server to the Grantor server responsible for the policy decision. Whenever a router forwarding the packets in the request channel needs to drop packets due to the limited bandwidth, it drops the packets of lowest priority in its queues.
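
As a rough illustration of the priority assignment, the sketch below maps the time elapsed since a flow's last request to a priority: a lower sending rate means a longer gap between requests, hence a higher priority. The integer-log2 mapping and the cap of 63 are illustrative assumptions, not the exact formula in Gatekeeper's source:

```c
#include <stdint.h>

#define PRIORITY_MAX 63

/* The longer a flow has waited since its last request packet,
 * the higher the priority of its next request. */
static uint8_t request_priority(uint64_t now_ns, uint64_t last_request_ns)
{
	uint64_t gap = now_ns - last_request_ns;
	uint8_t priority = 0;

	/* Integer log2 of the inter-request gap. */
	while (gap > 1 && priority < PRIORITY_MAX) {
		gap >>= 1;
		priority++;
	}
	return priority;
}
```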

A network policy is a Lua script that runs on Grantor servers. Grantor servers are co-located near the protected destination, typically in the same datacenter as the destination. One can deploy Grantor servers in other locations and even employ anycast to reach varied destinations, but we assume here (for the sake of simplicity) that the destination prefix is deployed in a single datacenter.
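
Policies themselves are written in Lua; purely as a language-neutral illustration, the following C sketch shows the kind of verdict a policy computes for a flow in the request channel. All names here are hypothetical (in_protected_prefix() is a made-up helper), and struct flow_key comes from the earlier sketch:

```c
#include <stdint.h>

struct policy_verdict {
	int granted;			/* nonzero: grant; zero: decline */
	uint64_t rate_limit_bps;	/* bandwidth allocated when granted */
	uint32_t expiration_sec;	/* every decision eventually expires */
};

/* Hypothetical helper: does dst_ip fall within a protected prefix? */
int in_protected_prefix(uint32_t dst_ip);

/* Example policy: grant flows destined to a protected prefix 1Gbps
 * for five minutes, and decline everything else. */
static struct policy_verdict evaluate(const struct flow_key *key)
{
	struct policy_verdict v = { 0 };

	if (in_protected_prefix(key->dst_ip)) {
		v.granted = 1;
		v.rate_limit_bps = 1000000000ULL;	/* 1Gbps */
		v.expiration_sec = 300;
	}
	return v;
}
```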

Grantor servers are responsible for making a policy decision on each flow in the request channel. These policy decisions are sent to the corresponding Gatekeeper servers, which enforce them. As policy decisions are installed into Gatekeeper servers, the flows of legitimate senders are moved to the granted channel, in which bandwidth is allocated according to the policy. Similarly, identified malicious hosts are blocked. This, in turn, reduces the delay experienced by legitimate flows waiting in the request channel.

To summarize, a Gatekeeper deployment consists of a number of vantage points forming a shield around the protected networks. Grantor servers, which reside inside the shield but before the final destinations of the packets, run the network policy to decide the fate of all incoming traffic. The resulting policy decisions are installed at Gatekeeper servers, which enforce them.

Anatomy of a Gatekeeper Response

For an example of what all this means, consider the nearly 1Tbps SYN flood that Cloudflare faced in 2018. A Gatekeeper deployment with enough bandwidth at VPs and enough Gatekeeper servers to process the incoming packets would, before any policy evaluation or priority adjustments, reduce that incoming flood to 50Gbps (i.e. 5% of 1Tbps).

How? Since it was a direct attack targeting bandwidth capacity -- that is, there was no attempt to establish connections -- the request channel would automatically start filtering the flood, before it reached the Grantor servers, within a couple of seconds. This couple of seconds would become the delay experienced by legitimate senders before their policy decisions are installed into the Gatekeeper servers.

The key points to understanding how Gatekeeper mitigates such an attack are: (1) ensuring that 5% of the path bandwidth is reserved for requests, (2) assigning priorities to those requests based on the time between requests in the same flow, and (3) dropping packets of lower priority when the bandwidth of the request channel overflows. These properties guarantee the results of Theorems 4.1 and 4.2 presented in Section 4.2 of the Portcullis paper (SIGCOMM 2007). In plain English, Theorem 4.1 states that if a network policy cannot identify malicious hosts, a legitimate sender will wait, in the worst case, a time proportional to the number of malicious hosts to have a policy decision installed at the corresponding Gatekeeper server. Theorem 4.2 states that under these conditions, this result is optimal. If a policy can identify malicious hosts, the waiting time becomes, in the worst case, proportional to the number of undetected malicious hosts.
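
Paraphrasing those theorems rather than quoting them, the guarantee can be written compactly as follows (the symbols below are our shorthand, not the paper's notation):

```latex
% Hedged paraphrase of the bounds, not the paper's exact statements.
% Let n_m be the number of undetected malicious hosts and T_wait the
% worst-case wait of a legitimate sender before its policy decision
% is installed.
T_{\text{wait}} = O(n_m)
% Theorem 4.2 provides a matching lower bound for schemes of this
% kind, so this worst-case guarantee is optimal:
T_{\text{wait}} = \Omega(n_m)
```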

What about flows that start behaving well to receive a favorable policy decision and then turn abusive? There are two parts to this answer. The first is that all policy decisions eventually expire. The second is that policy decisions are BPF programs, so they can limit or act on changes of behavior. The folder bpf in our repository shows examples of BPF programs that limit the bandwidth of a flow (e.g. granted.c), penalize unwanted traffic with less bandwidth (e.g. grantedv2.c), and enforce the behaviors expected by simple Web servers (e.g. web.c).
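
For a flavor of what such a program does, here is a simplified token-bucket limiter in plain C. It is only in the spirit of the examples in the bpf folder (e.g. granted.c); it is not their actual code, and it does not use the real Gatekeeper BPF interface:

```c
#include <stdint.h>

enum { PKT_FORWARD, PKT_DROP };

struct flow_budget {
	uint64_t tokens;		/* bytes the flow may still send */
	uint64_t rate_bps;		/* granted rate, e.g. 1Gbps */
	uint64_t last_refill_ns;	/* time of the last refill */
};

static int check_packet(struct flow_budget *b, uint64_t now_ns,
	uint32_t pkt_len)
{
	/* Refill the byte budget in proportion to the elapsed time. */
	uint64_t elapsed_ns = now_ns - b->last_refill_ns;
	uint64_t max_burst = b->rate_bps / 8;	/* one second's worth */

	b->tokens += b->rate_bps / 8 * elapsed_ns / 1000000000ULL;
	b->last_refill_ns = now_ns;

	/* Cap the burst so an idle flow cannot hoard bandwidth. */
	if (b->tokens > max_burst)
		b->tokens = max_burst;

	if (b->tokens < pkt_len)
		return PKT_DROP;	/* flow is over its granted rate */
	b->tokens -= pkt_len;
	return PKT_FORWARD;
}
```

The design is the standard one for rate limiting: refill a byte budget in proportion to elapsed time, cap the burst, and drop packets once the budget runs out. Decision expiration would be handled outside this function.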

What about abuses that cannot be identified by a BPF program? If the abuse can still be identified by an intrusion detection system (IDS), one will eventually be able to combine IDSes with Gatekeeper; track the status of this feature at issue #298. Another approach is for applications to identify abuses and to feed their results into policies. For a simple example, consider a WordPress plugin that identifies a number of abuses on a WordPress site. The list of abusers from this plugin can be fed into a policy to enable Grantor servers either to bluntly reject abusers or to apply stricter rules to them.
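
As a toy illustration of that feedback loop, the sketch below declines flows whose source address appears on an application-provided blocklist. All names are hypothetical (none come from the Gatekeeper code base), and it reuses struct flow_key, struct policy_verdict, and evaluate() from the sketches above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy blocklist populated from an application's abuse reports
 * (e.g. the WordPress plugin example above). */
static uint32_t abusers[1024];
static size_t num_abusers;

static bool is_abuser(uint32_t src_ip)
{
	/* A linear scan keeps the sketch short; a real deployment
	 * would use a hash set. */
	for (size_t i = 0; i < num_abusers; i++)
		if (abusers[i] == src_ip)
			return true;
	return false;
}

/* Tighten the verdict for reported abusers: decline them outright. */
static struct policy_verdict verdict_with_blocklist(const struct flow_key *key)
{
	struct policy_verdict v = evaluate(key);

	if (is_abuser(key->src_ip))
		v.granted = 0;
	return v;
}
```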

Finally, what about abuses that cannot be identified by an IDS or even by the application itself? In this case, all that can be done is to schedule the flows in a way that enables the network and servers to serve them at the best rate possible. Without such scheduling, the infrastructure would serve nobody under such an attack. Gatekeeper is not ready for this scenario; we have many milestones ahead of us before we can work on this last one.

Where to go from here?

The page Publications has several resources to help you understand Gatekeeper. Newcomers will find our NANOG 82 presentation a good starting point. Those interested in learning from a large-scale Gatekeeper deployment will find plenty of information in the NextHop 2020 presentation. The technical report "Gatekeeper: The Design and Deployment of a DDoS Protection System" covers the underpinnings of Gatekeeper and is a recommended read for those looking for the big picture.

The page Tips for Deployments describes a simple deployment and includes practical details to help those getting started. The page Overview is a quick reference of the components of Gatekeeper. Much more information is available on the pages of this wiki.

Where to find help?

The primary point of contact for bugs and issues in general is the issues page here on GitHub. The milestones page doubles as our roadmap. Our research/development group is reachable at Google Groups Linux XIA.