Aug 7, 2020 | Andy Paine
Cloud Foundry has toyed around with the idea of providing service mesh capabilities to end users for a while. As CF transitions to a Kubernetes-based runtime, it looks like operators will soon be able to bring the full power of Istio & friends to their developers. But what if you aren’t an operator and you want all those features today?
Enter HashiCorp Consul. Consul offers service networking features such as health checking and service discovery, whilst also bringing all the service mesh goodness of zero-trust networking, weighted traffic routing and circuit breaking. The best thing about it? You can use these features on (almost) any Cloud Foundry deployment as an end user.
Through some clever usage of container-to-container networking, sidecars and a little bit of metadata magic, you can deploy Consul and the service mesh sidecars to any application.
Normally in Cloud Foundry, applications talk HTTP via the Gorouter which provides all the TLS and layer 7 routing goodness we’ve come to know and love.
For many years Cloud Foundry has also offered direct TCP routing too, so apps can use non-HTTP protocols, or handle their own TLS termination.
One downside of very early versions of Cloud Foundry was that this approach made it difficult to make apps accessible only within the platform, and to avoid traffic ‘going out and back in’.
Thankfully, container-to-container networking was added to Cloud Foundry several years ago, so that apps can communicate directly with each other using any protocol required. However, as this method bypasses the Gorouter, application developers need to implement their own logging and metrics for these internal requests.
Consul requires a control plane made up of Consul agents running in “server” mode and, optionally, a UI. The Consul control plane can be deployed as Cloud Foundry apps, using the binary buildpack.
It is recommended that you run a minimum of 3 instances for high availability which then communicate to form quorum and elect a leader. For this consensus algorithm to work, each instance needs to be able to uniquely address other instances and be allowed to communicate using TCP and UDP.
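As a sketch, a manifest for the server apps might look something like this (the app name, memory and the location of the consul binary are illustrative):

```yaml
# manifest.yml - a sketch of a Consul server deployment on Cloud Foundry
applications:
- name: consul-server
  instances: 3            # minimum recommended for quorum
  memory: 256M
  buildpacks:
  - binary_buildpack
  # run the bundled consul binary in server mode, expecting 3 peers
  command: ./consul agent -server -bootstrap-expect=3 -config-dir=config
```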
We can do this in Cloud Foundry by using internal routes as well as container-to-container networking.
There is a special domain name in Cloud Foundry reserved for applications within the platform: apps.internal. Applications can be mapped to a route such as consul-server.apps.internal, which all other applications on the platform can use to resolve the IP addresses of the containers running the consul-server app. Individual instances can also be queried by prepending the instance index to the domain, e.g. 1.consul-server.apps.internal. This allows applications on the platform to uniquely address other instances, which ticks our first box.
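Assuming the servers are pushed as an app called consul-server, mapping an internal route is a one-liner with the cf CLI:

```shell
# map an internal route so other apps can resolve the server containers
cf map-route consul-server apps.internal --hostname consul-server

# individual instances then resolve via e.g. 1.consul-server.apps.internal
```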
Access between applications is denied by default, but this can be changed by adding network policies that permit traffic on certain port ranges and protocols. By creating network policies between instances of the consul-server application, we can allow both TCP and UDP traffic on the relevant port ranges.
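For example, using Consul’s default server port ranges (8300-8302 for RPC and gossip), the policies might look like this with the cf CLI:

```shell
# allow server RPC and TCP gossip between consul-server instances
cf add-network-policy consul-server --destination-app consul-server \
  --protocol tcp --port 8300-8302

# allow UDP gossip between consul-server instances
cf add-network-policy consul-server --destination-app consul-server \
  --protocol udp --port 8301-8302
```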
Once the Consul control plane is up and running, you need some applications to register into the service mesh. This is done in Consul by running a couple of other processes alongside each application instance: a Consul agent running in client mode, and a sidecar proxy (such as Envoy) that handles the mesh traffic.
Cloud Foundry has first-class support for running these processes: sidecars. These processes are started within the same container as the process_types to which they are bound, and are monitored using the process health check type in the same way as the main application.
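In the app manifest, the two sidecar processes might be declared roughly like this (the names, commands and service identifiers are illustrative):

```yaml
# sketch of a manifest declaring Consul sidecars for an app
applications:
- name: frontend
  sidecars:
  - name: consul-agent
    process_types: [web]
    # run a Consul agent in client mode alongside the app
    command: ./consul agent -config-dir=consul-config
  - name: envoy-sidecar
    process_types: [web]
    # run an Envoy proxy registered as the sidecar for this service
    command: ./consul connect envoy -sidecar-for frontend
```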
These sidecars could even be seamlessly included via a sidecar-buildpack.
By creating Consul services to represent each application, the sidecar proxies can be configured with all dependent services as upstreams. This allows an application to access other applications as if they were local with the proxy looking after service discovery as well as connecting to the upstream service over mutual TLS.
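A service definition for one app, with a single upstream bound to a local port, might look roughly like this (the names and ports are illustrative):

```hcl
# sketch of a Consul service definition with a sidecar and one upstream
service {
  name = "frontend"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "customer"
            local_bind_port  = 8081   # the app talks to localhost:8081
          }
        ]
      }
    }
  }
}
```

The application then simply makes requests to localhost on the bound port, and the proxy handles discovery and mTLS to the upstream.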
Most objects in Cloud Foundry (e.g. apps, spaces, orgs) can have metadata associated with them. This combination of annotations and labels can be used for attaching useful information such as commit hashes or environment.
By labelling all the applications inside our service mesh, not only can we quickly see and query all applications that belong in the mesh but we can also use that information to ensure that networking is properly configured. The included script checks that all applications which are labelled this way may communicate freely with each other across ports 8000-9000 and both TCP and UDP.
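With the cf CLI, labelling an app and then querying by label might look like this (the mesh=hotrod key/value is just an example):

```shell
# label an app as part of the mesh
cf set-label app frontend mesh=hotrod

# list all apps carrying that label via the v3 API
cf curl "/v3/apps?label_selector=mesh=hotrod"
```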
To demonstrate all this working together, we can use HotROD, a set of example microservices usually used to demonstrate Jaeger’s distributed tracing. HotROD is comprised of 4 applications: frontend, customer, driver and route.
Normally these would all be run on a single host - as shown by the fact that the address used for the other services is hard-coded! However, by deploying each service as a separate Cloud Foundry app integrated into a Consul service mesh, we can work around this.
The customer, driver and route services can all be registered as upstreams for the frontend service, making them appear as if they are running on the same host as the frontend. With Consul proxying requests to the backend services via their respective sidecars over mTLS, the microservices can continue to communicate, even when deployed on different hosts.
Whilst this is a pretty neat trick, the real benefits of Consul come now that the applications are embedded into the service mesh. Consul can accept configuration to modify how the service mesh communicates, using the same infrastructure-as-code approach that Hashicorp offers with Terraform. These options include:
Consul has a concept called intentions that can be used to block access between specific application instances. These can be quickly applied and removed without having to restart applications thanks to the Consul agents constantly checking whether requests are permitted.
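Intentions can be managed straight from the consul CLI; for example, denying traffic between two of the demo services:

```shell
# deny the frontend service from calling the customer service
consul intention create -deny frontend customer

# remove the intention again once finished
consul intention delete frontend customer
```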
Whether it is canary deployments or A/B testing, being able to fine tune how traffic is routed to applications is a handy tool to have. Consul has a number of layer 7 traffic management features that make new ways of deploying and running applications possible.
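As a sketch, a service-splitter configuration entry could send a small share of traffic to a canary subset (the v1/v2 subset names are illustrative):

```hcl
# sketch: send 10% of customer traffic to a "v2" subset
Kind = "service-splitter"
Name = "customer"
Splits = [
  {
    Weight        = 90
    ServiceSubset = "v1"
  },
  {
    Weight        = 10
    ServiceSubset = "v2"
  },
]
```

The subsets themselves would be defined in a matching service-resolver configuration entry.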
When using container-to-container networking, requests no longer pass through the Gorouter, meaning we lose all metrics derived from Gorouter data. Consul proxies can be centrally configured to emit metrics in a number of different ways, providing extremely fine-grained data even for application-to-application communication.
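For example, a proxy-defaults configuration entry can tell every Envoy sidecar to expose Prometheus metrics (the port is illustrative):

```hcl
# sketch: expose Prometheus metrics from every Envoy sidecar
Kind = "proxy-defaults"
Name = "global"
Config {
  envoy_prometheus_bind_addr = "0.0.0.0:9102"
}
```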
With Cloud Foundry providing an easy way to run applications with health monitoring, logging and rolling deployments out of the box, even complex deployments are quick to develop. There are still many interesting avenues to explore including security, automating metrics and configurable sidecar buildpacks to leverage other features of Cloud Foundry to make the process easier to set up and manage.
Together with all the Cloud Foundry features we know and love, these powerful new capabilities demonstrate just how flexible a platform Cloud Foundry has become. Getting this project running took less than two days of work (and was all done on the public, multi-tenant Pivotal Web Services!) - a testament to how easy deploying to Cloud Foundry can be without significant platform customisation.
Running a configurable, zero-trust service mesh on top of Cloud Foundry without any “magic” or fuss - who would have thought?