Just Because It Says It Is an ID, It Doesn’t Mean that It’s Reliable
For almost as long as we have networked computers, we have used the network location or address as their primary identity. Using network location might have made sense when there was a 1:1 mapping between a computer and the thing it was acting for (user, application, etc.) and when the environment was fairly static. Those computers ran the same software, sat connected to the same network switch, and maintained the same IP address for years.
However, this was never really a primary identity. The address used as an identity had no actual relationship with the thing being identified other than by the accident of provisioning. It carried no real description or context. It was up to some user or system to imbue that secondary identity with meaning. That is inherently a manual process, and the mapping that provides that meaning is both fallible and not amenable to rapid change.
Times have changed: both of these assumptions (static environment and 1:1 mapping) are no longer valid. This is not a new revelation or development. They started to crumble more than a decade ago when VMs hit the scene, but the current evolution of containers and dynamic orchestration puts paid to those assumptions completely, to say nothing of the oncoming serverless, or Function as a Service (FaaS), model of development.
However, the same developments that have brought the era of the secondary ID to an end have also delivered the technology that allows us to create primary identities and use them in the definition of network security policies.
Since we now have real-time control loops built into scalable orchestrators and schedulers that are driven by metadata, as in the Kubernetes environment, we can reliably use descriptive metadata as a primary identity. That metadata may describe the provenance of the entity, its build or release version, its vulnerability or trust state, the roles or personalities that the entity fulfills, etc. Because these labels are descriptive, they are intrinsically reliable as a source of identification, or at least as reliable as the entity that applied them.
As we have discussed in many other blog posts, we use these metadata labels in Tigera's solutions to apply policies and to keep log files meaningful, even after secondary IDs like IP addresses have lost all meaning. In a container environment, for example, IP addresses are reused as containers are created and destroyed. An IP address therefore can't serve as a vehicle for policy, because the policy would have to change every time the IP address changed (which is constantly in a container environment). Similarly, a log entry keyed to an IP address becomes useless as soon as that address is reused.
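To make this concrete, here is a minimal sketch (not Tigera's implementation) of Kubernetes-style label-selector matching. The label names and IP addresses are made up for illustration; the point is that a policy expressed in terms of descriptive labels keeps matching the same workload no matter how often its IP address changes hands.

```python
def policy_matches(selector: dict, workload_labels: dict) -> bool:
    """A policy applies when every key/value pair in its selector
    appears in the workload's labels."""
    return all(workload_labels.get(k) == v for k, v in selector.items())

# Hypothetical policy: allow traffic to anything labeled as the
# "billing" service at release "v2", regardless of its current IP.
allow_selector = {"app": "billing", "release": "v2"}

pod_today = {"app": "billing", "release": "v2", "ip": "10.0.0.7"}
pod_restarted = {"app": "billing", "release": "v2", "ip": "10.0.3.42"}  # new IP, same identity

print(policy_matches(allow_selector, pod_today))      # True
print(policy_matches(allow_selector, pod_restarted))  # True: the policy never changed
```

An IP-keyed rule would have needed an update between those two checks; the label-based rule is stable because it references the primary identity, not the accident of provisioning.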
A corollary to this is that multiple disparate signals linked back to the same ID increase the reliability of the overall signal, especially when that signal is used to determine trust.
Let’s take, for example, a policy that matches simply on the destination port of a given flow. That is a single signal matched to an ID. If we assume that parts of the infrastructure may be compromised (a key concept in the Zero Trust model), then a trust decision (i.e., should a given flow be allowed?) based on that data alone can’t carry much weight.
However, let’s say that we have a set of meaningful metadata labels tied not only to a layer 3 & 4 match, but also to a layer 7 method (such as an HTTP GET of a given URI) and a TLS certificate. A much more substantial level of trust can be achieved when multiple disparate signals, coming from different parts of the application and infrastructure, all agree on the same primary ID. In the end, trust and ID are inextricably linked.
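The idea of corroborating signals can be sketched in a few lines. This is an illustration only: the signal names, weights, and threshold are invented for the example, not drawn from any real policy engine. A single layer 3/4 match contributes a little trust; when the layer 7 method and the TLS certificate independently attest to the same primary ID, the combined score clears the bar.

```python
# Made-up weights for three independent signals; each one reports
# which primary ID it observed for the flow.
SIGNAL_WEIGHTS = {
    "l3l4_match": 0.2,       # destination IP/port matched the policy
    "l7_method_match": 0.4,  # e.g., HTTP GET of an expected URI
    "tls_cert_match": 0.4,   # peer presented the expected TLS certificate
}

def trust_score(signals: dict, claimed_id: str) -> float:
    """Sum the weights of the signals that agree on the claimed primary ID."""
    return sum(SIGNAL_WEIGHTS[name]
               for name, observed_id in signals.items()
               if observed_id == claimed_id)

THRESHOLD = 0.7  # arbitrary cut-off for this illustration

# Port match alone: a weak basis for allowing the flow.
print(trust_score({"l3l4_match": "billing-v2"}, "billing-v2"))  # 0.2

# All three signals agree on the same identity: allow.
agreeing = {"l3l4_match": "billing-v2",
            "l7_method_match": "billing-v2",
            "tls_cert_match": "billing-v2"}
print(trust_score(agreeing, "billing-v2") >= THRESHOLD)  # True
```

Note that a signal reporting a *different* ID contributes nothing: agreement on the same primary identity is what makes the disparate signals add up.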