January 3, 2019 · Cloud Computing Software Engineering

Configuration as Code

I recently got to work with something colleagues described as configuration as code. A search for "configuration as code" does not turn up many details, but it is a real thing, and it is mentioned on the Airbnb blog.

The one (and only) formal definition I found is:

Configuration as code is the formal migration of config between environments, backed by a version control system.

Essentially what this means is that we have a release pipeline for configuration files. We may have processes where each change is tested in different environments before a move to production.

Good, why and when do we need it?

Background

With the advent of cloud computing, we saw a move towards "as a Service". IaaS (Infrastructure as a Service) and PaaS (Platform as a Service) arrived. As companies started to move towards cloud and even cloud-first, the next logical move was to create one's own Infrastructure as a Service offerings, at least internally within a company.

That required heavy dependence on tools such as Chef and Puppet. But these tools were geared more towards configuration management than towards creating and maintaining large cloud infrastructures. Enter new tools: Terraform and AWS CloudFormation. These tools allow one to create large clusters in the cloud with a few descriptive files. Each environment (Dev, QA, Prod etc.) needs its own set of such files, which rapidly increases the number of files one needs to manage.

And so we established a new set of practices and called them "Infrastructure as Code", in which those files are treated as code, maintained in a repo and versioned. A good explanation of benefits is here.

The next idea in the "as code" paradigm is "Configuration as Code", where we (are supposed to) treat configuration as code and derive the same benefits that come from treating infrastructure as code.

What is Configuration?

The overall application behaviour does not depend on the config, but the config enables an application to

Anything that can be tweaked to change application behaviour is configuration. A few examples:

Saving all these individually as code does not make much sense though. In fact, these individual pieces may already be kept in a store (e.g. a SQL database) so that they can be used in the build/deployment process for a given environment or read by code during execution.
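As a rough illustration of configuration kept in a store and read by code at run time, here is a minimal sketch using Python's built-in sqlite3 module. The table name `app_config`, its columns and the sample settings are all hypothetical, made up for this example; a real setup would point at whatever database the organization already uses.

```python
import sqlite3


def load_config(conn, environment):
    """Return all settings for one environment as a dict."""
    rows = conn.execute(
        "SELECT key, value FROM app_config WHERE environment = ?",
        (environment,),
    ).fetchall()
    return dict(rows)


# Hypothetical table and settings, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_config (environment TEXT, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO app_config VALUES (?, ?, ?)",
    [
        ("dev", "db_host", "localhost"),
        ("dev", "log_level", "DEBUG"),
        ("prod", "db_host", "db.internal.example"),
        ("prod", "log_level", "WARN"),
    ],
)

dev_config = load_config(conn, "dev")
prod_config = load_config(conn, "prod")
```

The same code serves every environment; only the rows it reads differ.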

Where does configuration as code fit then?

Application

The idea of configuration as code becomes appealing specifically in big data/Hadoop ecosystem based systems, where hundreds or thousands of data points are on-boarded, processed and maintained. Over time, field changes, data requirement changes and data type changes are frequent as well. Also, the frameworks for creating jobs/flows (Apache Oozie, Azkaban, NiFi etc.) are all configuration based.

For such systems it becomes essential to know who did what, and when. Putting hundreds of such "configs" in a version control system makes perfect sense.

Configuration as code is a pragmatic choice when you are dealing with a large number of config files, which is common in cloud and Hadoop based work streams.

Configuration as Code in practice

Let's see how configuration as code looks in practice. Configuration becomes a first-class citizen and:

I have seen two flavours of how this is done in practice:

With the first approach, I have a few concerns:

The second approach of having different repos for configs is not practical for non-cloud, non-Hadoop applications where there are only a few configs to manage. It only makes sense where we are talking about several tens to hundreds of configs.

Separate repos for code and config have an edge:

(I have seen both approaches in practice and do not like the first one. The setup in which I had to work with it had a pathetically slow build system.)

Problems?

There are mainly two concerns with this idea:

  1. how we manage and store configs
  2. storing sensitive/secret information

Storage

The problem (at least for me personally) is how this is treated in an organization. I have seen two ways:

I like the idea of configuration as flat files and treating them as code when there is a substantially large number of configurations.

Configuration as literal code is something I fail to understand. I think of it as an idea taken too far because of a "developer high" (decisions made by developers thinking about the benefits to themselves and making their own lives "easier"). I find similar thoughts here.

Let's take the example of Airflow. Each data pipeline needs to be written in Python, which I think is not very well thought out:

Imagine if the Apache web server needed a build pipeline (or a code change) for each new website it deployed. Nightmare. It might have worked as an internal tool in some company, but it would never have had the wide adoption it has.

Sensitive Information

Another challenge with config as code is storing secret information: usernames, passwords, key file names. This is a problem faced by NiFi templates as well, where the templates do not allow sensitive fields to be stored.

These can be read at deployment time from a trusted store and put into the config file.
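One way this could look is sketched below: the versioned config file carries only placeholders, and the real values are filled in at deployment time. The example uses `string.Template` from the Python standard library. The placeholder names, the config layout and the `secrets_from_store` dict are assumptions for illustration; the dict stands in for a lookup against whatever trusted store is actually in use.

```python
from string import Template


def render_config(template_text, secrets):
    """Fill ${NAME} placeholders; raises KeyError if a secret is missing."""
    return Template(template_text).substitute(secrets)


# Versioned in the repo: placeholders only, no secret values.
CONFIG_TEMPLATE = """\
[database]
host = db.internal.example
user = ${DB_USER}
password = ${DB_PASSWORD}
"""

# Stand-in for a lookup against a trusted store at deployment time.
secrets_from_store = {
    "DB_USER": "etl_service",
    "DB_PASSWORD": "not-a-real-password",
}

rendered = render_config(CONFIG_TEMPLATE, secrets_from_store)
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing secret fails the deployment loudly instead of leaving a literal placeholder in the deployed file.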

Conclusion

I like the idea of configuration as code for cloud/Hadoop based systems, but only if the config files are flat. Having them as literal code is taking it too far. It makes more sense to create a parser for flat files and generate deployments for each config than to have thousands of lines of similar code.
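To make the last point concrete, here is a minimal sketch of such a parser, using `configparser` from the Python standard library. The INI layout, the field names (`source`, `target`, `schedule`) and the `ingest_` job naming are hypothetical; the point is only that one small generic function can replace many near-identical hand-written job definitions.

```python
import configparser

# Hypothetical flat config: one section per data feed.
FLAT_CONFIG = """\
[orders]
source = /landing/orders
target = warehouse.orders
schedule = daily

[customers]
source = /landing/customers
target = warehouse.customers
schedule = hourly
"""


def generate_deployments(config_text):
    """Turn each section of a flat config into one deployment description."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    deployments = []
    for name in parser.sections():
        section = parser[name]
        deployments.append({
            "job_name": f"ingest_{name}",
            "source": section["source"],
            "target": section["target"],
            "schedule": section["schedule"],
        })
    return deployments


deployments = generate_deployments(FLAT_CONFIG)
```

On-boarding a new feed then means adding one section to the flat file, which shows up as a small, reviewable diff in version control.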
