Cloud Foundry Buildpacks in Restricted Networks

Cloud Foundry provides a flexible system called "buildpacks" to handle applications that use different runtimes and frameworks. Traditionally, many buildpacks reach out to public sources on the internet for the runtimes and other supporting binaries an application needs. In on-premises deployments of Cloud Foundry, however, it is quite common to limit the platform's access to the internet. One of the great things about buildpacks is that developers can pull them in without having to work with the Operations/Architecture teams. Unfortunately, a more locked-down environment can be problematic for many buildpacks. Luckily, there are some strategies you can employ to make custom buildpacks available for developers to use in a more protected deployment of Cloud Foundry.

NAT/Transparent Proxy

One simple strategy is to allow Cloud Foundry to have access to specific locations on the internet via NAT or some other sort of transparent proxy.  Your network team would typically need to set this capability up for you, and administer the remote sites that your installation is allowed to reach out to.  With this sort of setup, Cloud Foundry would have controlled access to the internet, and buildpacks should be none the wiser that they are accessing the internet through a NAT or Proxy.

The challenge with this strategy is that it can add significant delay to bringing in new buildpacks while still exposing the platform to the raw internet. You typically have to wait for the network team to open up the NAT/proxy to allow access to the remote site for the buildpack. And even if your network team puts a fairly permissive policy in place for remote site access, you are still relying on a remote site being available when you need to stage an application. Reliance on remote sites to host your buildpacks opens your environment up to transient failures at best, and to malicious attacks at worst.

The challenges with this approach effectively render this strategy a non-option in most environments.

Buildpack Inside

You can neutralize the problems with the previous strategy by pulling the buildpack inside your own firewall.  There are a couple of strategies you can use to host buildpacks inside your own network for improved reliability and security.

Custom buildpacks are retrieved from Git repositories at the moment they are needed in the application staging process. You could host your own Git repository inside your private network, and then provide the "cf" command with the URL of your private repository for that buildpack using the -b parameter.

cf push my-application -b https://<private-git-server-address>/<repo>

Hosting the buildpack in this fashion keeps Cloud Foundry from having to go out to the public internet to retrieve the buildpack, and gives you a convenient way to control updates to that buildpack. The downside is that you have to set up and manage a Git server to host these buildpacks if you aren't running Git internally already.

Cloud Foundry also allows you to upload buildpacks into the platform if you have administrative rights. You can use the cf create-buildpack command to upload a ZIP archive of a buildpack to the platform to make it available for any developer to use. This saves you from having to set up Git repos for the buildpacks you want to use, but now you must get an administrator involved each time you want to try out a new buildpack that isn't in the platform.
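
As a rough sketch of that upload flow, an administrator might wrap the two CLI calls in a small shell function. The function name, buildpack name, ZIP path, and position are all placeholders I made up for illustration, not values from any particular installation:

```shell
# Upload a buildpack ZIP to the platform and list the result.
# All argument values are illustrative; substitute your own.
upload_buildpack() {
  name="$1"      # name developers will pass to `cf push -b`
  zip_path="$2"  # local path to the packaged buildpack ZIP
  position="$3"  # priority in the buildpack detection order (1 = first)

  # Requires a user with admin rights on the target Cloud Foundry.
  cf create-buildpack "$name" "$zip_path" "$position" || return 1

  # Confirm that the new buildpack now appears in the platform's list.
  cf buildpacks
}

# Example (hypothetical values):
#   upload_buildpack my-java-buildpack ./java-buildpack-offline.zip 1
#   cf push my-application -b my-java-buildpack
```

Once the buildpack is uploaded, developers refer to it by name with -b instead of by Git URL.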

Buildpack Dependencies

One major challenge with both of these methods is that buildpacks themselves often reach back out to the public internet to retrieve binaries needed to build a droplet.  So even if you can get the buildpack inside your firewall, you also need to deal with these additional dependencies.

Originally, this problem was left up to the buildpack or the buildpack user to deal with.  There is nothing in the required interface for a buildpack that governs how that buildpack manages dependent resources.  Buildpacks are free to include their own dependencies, or to reach out to remote locations to pull in dependencies.

For instance, the Java Buildpack pulls in a JDK, and a web container like Tomcat to host applications that it deploys.  By default, these resources are retrieved from a public mirror for these dependencies.  The Java Buildpack does provide a way to package the buildpack for "offline" mode, so that in protected environments you can still stage Java applications, but you don't have to reach out to a remote site.  The buildpack simply needs to be packaged up on a machine that does have access to the internet, and then that buildpack can be uploaded to the platform for use.  Other buildpacks may have their own ways to deal with this problem, so read the documentation associated with the buildpack you wish to use.
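
To make the offline packaging step concrete, here is a minimal sketch based on the rake task that the Java Buildpack's README has documented. It assumes you already have a local clone of the buildpack and a working Ruby with Bundler. The function name and directory argument are my own, and the task options may differ between buildpack versions, so check the README for the version you use:

```shell
# Build an offline Java Buildpack ZIP from an existing checkout.
# Run this on a machine that can reach the internet.
package_offline_java_buildpack() {
  checkout_dir="$1"  # path to a local clone of the java-buildpack repo

  (
    cd "$checkout_dir" || exit 1
    # Install the Ruby gems the buildpack's build tasks need.
    bundle install || exit 1
    # OFFLINE=true bundles the dependencies into the ZIP;
    # PINNED=true pins versions to those resolved at package time.
    bundle exec rake clean package OFFLINE=true PINNED=true
  )
}

# Example (hypothetical path):
#   package_offline_java_buildpack ~/src/java-buildpack
```

The resulting ZIP can then be carried into the protected network and uploaded with cf create-buildpack.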

There has been an effort to standardize this process of creating offline buildpacks. Buildpacks can specify their external dependencies in a manifest file and then let a tool called the Buildpack Packager capture all of those dependencies automatically. This tool downloads the specified dependencies and packages them with the buildpack for upload into the Cloud Foundry platform with the cf create-buildpack command. The Ruby Buildpack is one of the buildpacks that uses this method.

Pivotal Software's distribution of Cloud Foundry, called Pivotal CF, includes offline versions of the Java, Ruby, Python, PHP, and Go buildpacks out of the box so that you can deploy applications that use those technologies in a secure, private deployment of Cloud Foundry with no additional configuration required.


Some buildpacks (like the Java Buildpack) also allow you to simply "point" the buildpack to the place that it should go to get "external" resources. One example of this method is the Expert Mode in the Java Buildpack. With this strategy, you could use a simple HTTP server or an artifact repository like Nexus or Artifactory to mirror all the dependencies for your buildpacks inside your private network. Then, you could clone your chosen buildpack, and configure it to retrieve all its dependencies from your internal artifact repository. This gives you the flexibility to control which runtimes and containers you allow your buildpacks to use, caches them inside your own network so you don't spend internet bandwidth retrieving them, and also allows you to secure these external resources from malicious attacks.

These methods give you much more control over the accessibility and security of the external resources a buildpack needs, at the cost of some additional management overhead.


One Off Proxy

There is a way to use custom buildpacks that come from the outside world in a protected setup if you have a more traditional, non-transparent HTTPS proxy server. This must be set up on a per-app basis, unfortunately, but it does allow you to quickly test out an external buildpack without as much fuss as the above methods. (Updated: Info below on setting up a site-wide Proxy setting)

To enable the staging process to clone a buildpack repository through your HTTPS proxy, you need to set the HTTPS_PROXY environment variable for the application, and then stage or start the application.  I don't mention the use of an HTTP proxy because many buildpacks are hosted on github.com, which uses HTTPS.  If your remote buildpack is accessible via HTTP and you want to use that instead, simply change the name of the environment variable to HTTP_PROXY and use the http scheme for your buildpack URLs.

As an example, let's say I want to do this for an app called "test-proxy".  I would execute the following commands:
cf push test-proxy -b https://github.com/my-cloning-account/java-buildpack.git/ -p <path-to-war-file> --no-start
cf set-env test-proxy HTTPS_PROXY http://user:password@myproxy.host.name
cf start test-proxy

It is kind of a pain to have to do this each time you push the application, so you could also put this in a manifest.yml file at the root of your project to make this easier:
---
applications:
- name: test-proxy
  path: <path-to-archive-relative-to-this-file>
  buildpack: https://github.com/my-cloning-account/java-buildpack.git
  env:
    HTTPS_PROXY: http://user:password@myproxy.host.name

Site-Wide Proxy

Around v180 of Cloud Foundry, a new feature called "Environment Variable Groups" was added to the platform. Environment Variable Groups allow you to provide a default environment variable setting for any application deployed to the platform. Further, these environment variables can be set explicitly for either the staging phase or the runtime phase of the application.

This feature allows you to set a Staging Environment Variable Group entry for HTTPS_PROXY, and have it applied automatically to an application's staging process without the developer having to set it explicitly, and without that variable bleeding over into the application runtime environment and causing unintended side effects.

To use this feature, a user with administrative access needs to use the cf CLI to execute the following command (Linux shell form):

cf ssevg '{"HTTPS_PROXY":"http://user:pass@proxy.host.name"}'

Here's the correct form for Windows Command Prompt:

cf ssevg {\"HTTPS_PROXY\":\"http://user:pass@proxy.host.name\"}

Now when you stage your applications, this HTTPS proxy setting will be used automatically.  If you need to override this setting for a specific app, then you can just explicitly set the property using the method detailed above in the "One Off Proxy" section.
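
If you want to confirm what the platform has stored, the cf CLI also has a command that prints the staging group. Here is a small sketch that sets the proxy and then reads it back, using the long command names rather than the aliases. The function name and the proxy URL in the example are placeholders of mine:

```shell
# Set a site-wide staging proxy, then print the group to verify it.
# The proxy URL is a placeholder; requires admin rights.
set_staging_proxy() {
  proxy_url="$1"

  # `set-staging-environment-variable-group` is the long form of `ssevg`.
  cf set-staging-environment-variable-group "{\"HTTPS_PROXY\":\"$proxy_url\"}" || return 1

  # `staging-environment-variable-group` (alias `sevg`) lists what is set.
  cf staging-environment-variable-group
}

# Example (hypothetical proxy):
#   set_staging_proxy http://user:pass@proxy.host.name
```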

Final Thoughts

I commonly see secured Cloud Foundry deployments running in a mode where organizations will host their own Git repo for custom buildpacks, and then also host their own artifact repository to provide any dependencies for those buildpacks.  These sites are then managed by the development team so that new buildpacks can be tested and updated on demand.  Then, more control is applied as those applications move into production to better control what buildpacks are used to deploy applications into production.  Every situation is different, however, so you may have an easier time using one or more of the methods above.

You should realize that none of these methods may help you if you use a runtime or language that retrieves dependencies at runtime. For instance, it is common for Ruby apps to retrieve dependencies dynamically when started. Usually, these technologies give you a way to cache any of these external dependencies before deployment (like the bundle command for Ruby).
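
For the Ruby case, one way to do that caching is to vendor the gems into the app before pushing it. This is only a sketch: the function name and arguments are mine, it assumes Bundler is installed and the app has a Gemfile.lock, and newer Bundler versions prefer `bundle cache` over `bundle package`:

```shell
# Vendor an app's gems into vendor/cache, then push the app.
# Directory and app name are illustrative.
vendor_gems_and_push() {
  app_dir="$1"   # root of the Ruby app (where the Gemfile lives)
  app_name="$2"  # name to push the app under

  (
    cd "$app_dir" || exit 1
    # Copy the app's gems into vendor/cache; --all also includes
    # gems sourced from git repositories and local paths.
    bundle package --all || exit 1
    # Push the app with the vendored gems included in the upload.
    cf push "$app_name"
  )
}

# Example (hypothetical values):
#   vendor_gems_and_push ~/src/my-ruby-app my-ruby-app
```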

Hope this helps explain your options!  Let us know in the comments about other strategies that you might think of or have seen to deal with buildpacks in a secured Cloud Foundry deployment.
