
Co-locate IaC with My Application

Over the last few years there has been a definite increase in the use of IaC (Infrastructure as Code) within cloud development. Companies tend to take one of three approaches: a single team creates and maintains all infrastructure, each application team creates and maintains its own, or a combination of the two.

No matter which direction is chosen, there are some common considerations when maintaining infrastructure as code:

  • Source control the IaC
  • Keep plain-text secrets out of source control
  • Apply least privilege to who can change production infrastructure
  • Review changes to the infrastructure
  • Make infrastructure changes easy to deploy
  • Make infrastructure deployments reliable

All of these things are easily achieved with any source control, a good review process and a well-defined deployment pipeline, but this article is about a different question: “Should I co-locate my Infrastructure as Code with my application?”

I have certainly co-located IaC myself when working in an application team deploying to Azure, keeping ARM (Azure Resource Manager) templates in a folder alongside my application in source control, so that when the code is built the infrastructure can also be created/updated during the deployment process.
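
As a rough illustration (the folder and file names here are hypothetical, not a prescribed structure), a repository using this approach might look something like:

MyApplication/
  src/
    MyApplication.Api/            application code
  deploy/
    azuredeploy.json              ARM template describing the app's infrastructure
    azuredeploy.parameters.json
  azure-pipelines.yml             builds the code and deploys the template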

There are many benefits to this as the team can:

  • Maintain a single repository to hold code and IaC
  • See what infrastructure they are responsible for
  • Decide/change what technology is used for their application e.g. changing the data store type or introducing a message bus, etc.
  • Apply monitoring/alerting to their application easily
  • Deploy to an environment knowing the required infrastructure will be created/updated

Sounds like I’ve answered the question, and that I agree I should co-locate my IaC with my application. Hmmm, well, not quite: this only considers a single application and team, and does not account for multiple teams or shared infrastructure.

So what if there are multiple teams, each independent and not requiring any shared infrastructure? In this scenario co-locating the IaC gives the same benefits as a single team/application.

Now what if there are multiple teams and shared infrastructure? Co-locating all the IaC in this scenario doesn’t make sense, as the shared infrastructure doesn’t belong to any one application. The teams could share this infrastructure and maintain it between them, or another team could be responsible for it, depending on the team makeup.

Summary

I believe co-locating the application-specific IaC with the application code is a good thing and gives the application team full control over their application. However, any non-application-specific IaC should be located away from the application; this may be Virtual Networks, Cloudflare configuration, Application Gateways, etc.

However you maintain your infrastructure at the moment, consider that this may change as applications grow or as you introduce more applications. I suggest reviewing and revising how you manage infrastructure on a regular basis, continuing to improve processes and practices, and finding the best way to build and maintain your cloud infrastructure.

Additional Information

If you are unfamiliar with IaC then I suggest looking at the following links:
Azure ARM templates
Azure CLI
Ansible
Terraform
Pulumi


Working with Azure Table Storage

I’ve been working with Azure Table Storage for a few years and find it really useful for storing logs or static data, or even as a data recovery store. Table Storage is an incredibly cheap way to store data.

The first time I used Table Storage I thought it was great, but there were times it was slow and I had no idea why, and I couldn’t isolate it to perform unit testing.

Research

Why is it slow?

  • First off, you need to design your data to be accessed quickly. A good place to start is the Storage Design Guide
  • Nagle’s Algorithm – I really had no idea about this or how much it mattered; fortunately there is a great article that explains it (despite being from 2010 it’s still useful)
  • The default connection limit in ServicePointManager is 2, which severely limits parallel requests (see the sketch below)
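
Both settings can be tuned per storage endpoint, before the first request is made. A minimal sketch, assuming the classic Microsoft.WindowsAzure.Storage client:

using System.Net;
using Microsoft.WindowsAzure.Storage;

var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");

// Tune the ServicePoint for the table endpoint before any requests are
// made; settings applied after the first connection may not take effect
var tableServicePoint = ServicePointManager.FindServicePoint(account.TableEndpoint);
tableServicePoint.UseNagleAlgorithm = false; // avoid Nagle delays on small payloads
tableServicePoint.ConnectionLimit = 100;     // raise the default of 2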

How Do I Unit Test?

  • I could use the Azure Storage Emulator to perform tests, but it feels wrong having an external process for my tests, and my build server would need to run this emulator too. On top of that, we consider it good practice not to rely on external entities in our tests
  • I could write a wrapper around the Table Storage API and use an interface in my code (see the sketch below)
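
With an interface in place, the dependency can be replaced by a test double. A minimal sketch, assuming the Moq library and the ITableStore<MyDto> interface from the wrapper shown below:

var store = new Mock<ITableStore<MyDto>>();
store.Setup(s => s.GetRecordsByFilter(It.IsAny<Func<MyDto, bool>>()))
    .Returns(new List<MyDto> { new MyDto("John", "Smith") { Age = 22 } });

// The client under test never touches real Table Storage
var client = new TableStorageClient(store.Object);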

Batching?

  • The Table Storage API provides the ability to batch inserts, but this type of insert requires the Partition Key to be the same for each entry in the batch. I have found this to be a problem when there are multiple partitions to insert at once.

Solution

I decided to build a generic wrapper that allows the storage to be isolated and the settings, e.g. Nagle’s Algorithm, to be configured.

The Wrapper

The wrapper has a TableStoreFactory whose CreateTableStore method creates the table store connection, or it allows you to create a TableStore directly.

The code below shows a very small example of injecting the TableStoreFactory and changing the options from the defaults.

public class TableStorageClient
{
    private readonly ITableStore<MyDto> _store;

    public TableStorageClient(ITableStoreFactory factory)
    {
        // Override the defaults: enable Nagle's Algorithm, raise the
        // connection limit and skip the table existence check
        var options = new TableStorageOptions
        {
            UseNagleAlgorithm = true,
            ConnectionLimit = 100,
            EnsureTableExists = false
        };

        _store = factory.CreateTableStore<MyDto>("MyTable", "UseDevelopmentStorage=true", options);
    }
}

You could also inject the TableStore directly:

public class TableStorageClient
{
    private readonly ITableStore<MyDto> _store;

    public TableStorageClient(ITableStore<MyDto> store)
    {
        _store = store;
    }
}

Or simply create the store in code:

var store = new TableStore<MyDto>("MyTable", "UseDevelopmentStorage=true", new TableStorageOptions());

Batching

To handle batch inserts with multiple partition keys, I added the ability to automatically split the batch by partition key and then insert each group in batches of up to the maximum of 100 records per batch. Now I can just create my list of entries and call insert without having to worry about it.

var entries = new List<MyDto>
{
    new MyDto("John", "Smith") {Age = 21, Email = "john.smith@something.com"},
    new MyDto("Jane", "Smith") {Age = 28, Email = "jane.smith@something.com"},
    new MyDto("Bill", "Smith") { Age = 38, Email = "bill.smith@another.com"},
    new MyDto("Fred", "Jones") {Age = 32, Email = "fred.jones@somewhere.com"},
    new MyDto("Bill", "Jones") {Age = 45, Email = "bill.jones@somewhere.com"},
    new MyDto("Bill", "King") {Age = 45, Email = "bill.king@email.com"},
    new MyDto("Fred", "Bloggs") { Age = 32, Email = "fred.bloggs@email.com" }
};

await _store.InsertAsync(entries);
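
Under the hood, the splitting can be sketched roughly like this (a simplified illustration, assuming the classic SDK’s TableBatchOperation, a CloudTable named table, and entities that implement ITableEntity; the wrapper does this for you):

// Group by partition key, then chunk each group into batches of at
// most 100 entries (the Table Storage limit per batch operation)
var batches = entities
    .GroupBy(e => e.PartitionKey)
    .SelectMany(g => g.Select((entity, i) => new { entity, i })
                      .GroupBy(x => x.i / 100, x => x.entity));

foreach (var batch in batches)
{
    var operation = new TableBatchOperation();
    foreach (var entity in batch)
        operation.Insert(entity);

    await table.ExecuteBatchAsync(operation); // one round trip per batch
}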

Filtering

Another noteworthy feature is GetRecordsByFilter, which allows records to be filtered on non-key data before the result is returned (the filtering is done by passing in a predicate). The downside here is that all records must be retrieved before the filter is applied; testing showed ~1.3 seconds for 10,000 records, but when using paging and returning the first 100 it was ~0.03 seconds for the same 10,000 records.

// Without paging
_store.GetRecordsByFilter(x => x.Age > 21 && x.Age < 25);

// With paging
_store.GetRecordsByFilter(x => x.Age > 21 && x.Age < 25, 0, 100);

If there is a need to perform an action as the data is read from the table, then there is support for Observables:

var theObserver = _store.GetAllRecordsObservable();
theObserver.Where(entry => entry.Age > 21 && entry.Age < 25)
    .Take(100)
    .Subscribe(item =>
    {
        // Do something with the table entry
    });

The end result can be found on GitHub and is available on NuGet. Others have introduced additions to this, and they can be found here and here.

Cosmos DB

Table Storage does not support secondary indexes or global distribution. There was a Premium tier of Table Storage, but it is now known as Cosmos DB.

The wrapper shown here will work with Cosmos DB, but it does not support everything. For more details, take a look at the FAQs and the new Table API.