Working with Azure Table Storage

I’ve been working with Azure Table Storage for a few years and find it really useful for storing logs, static data, or even as a data recovery store. It is also an incredibly cheap way to store data.

The first time I used Table Storage I thought it was great, but at times it was slow and I had no idea why. I also couldn’t isolate it in order to unit test the code that used it.

Research

Why is it slow?

  • First off, you need to design your data so that it can be accessed quickly. A good place to start is the Storage Design Guide.
  • Nagle’s Algorithm: I really had no idea about this or how much it mattered. Fortunately there is a great article that explains it (despite being from 2010 it’s still useful).
  • The default connection limit in ServicePointManager is 2, which restricts how many requests can run concurrently.
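
The last two points can be addressed in code before the first storage request is made. This is a minimal sketch, assuming the classic .NET Framework HTTP stack, where these `ServicePointManager` settings take effect (newer runtimes and SDKs built on `HttpClient` may ignore them). The class name, method name, and the value 100 are illustrative only.

```csharp
using System.Net;

// Hypothetical helper, not part of any library.
public static class StorageConnectionTuning
{
    // Call once at application start-up, before any storage client is used:
    // the settings do not affect service points that already exist.
    public static void Apply(int connectionLimit = 100)
    {
        // Nagle's algorithm buffers small packets, which adds latency to the
        // small requests that are typical for Table Storage.
        ServicePointManager.UseNagleAlgorithm = false;

        // Raise the default of 2 concurrent connections per endpoint.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;
    }
}
```

The right connection limit depends on the workload, so it is worth measuring rather than copying the number used here.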

How Do I Unit Test?

  • I could use the Azure Storage Emulator to run the tests, but it feels wrong to depend on an external process, and my build server would need to run the emulator too. On top of that, we consider it good practice not to rely on external entities in our tests.
  • I could write a wrapper around the Table Storage API and have my code depend on an interface instead.
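
The second option can be sketched as follows. All of the types here (`LogEntry`, `ILogStore`, `InMemoryLogStore`, `AuditLogger`) are hypothetical names made up for illustration, not the wrapper's real API. The idea is only that the code under test sees an interface, so a test can swap in an in-memory implementation.

```csharp
using System.Collections.Generic;

// Hypothetical entity and interface, invented for this example.
public class LogEntry
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Message { get; set; }
}

public interface ILogStore
{
    void Insert(LogEntry entry);
    LogEntry Get(string partitionKey, string rowKey);
}

// Test double: keeps entries in a dictionary keyed by partition key plus
// row key, the same pair that identifies an entity in Table Storage.
public class InMemoryLogStore : ILogStore
{
    private readonly Dictionary<(string, string), LogEntry> _entries =
        new Dictionary<(string, string), LogEntry>();

    // Dictionary.Add throws if the key already exists.
    public void Insert(LogEntry entry) =>
        _entries.Add((entry.PartitionKey, entry.RowKey), entry);

    // Returns null when no entry has the given keys.
    public LogEntry Get(string partitionKey, string rowKey) =>
        _entries.TryGetValue((partitionKey, rowKey), out var entry) ? entry : null;
}

// Code under test depends only on the interface, never on Azure types.
public class AuditLogger
{
    private readonly ILogStore _store;

    public AuditLogger(ILogStore store) { _store = store; }

    public void Record(string user, string id, string message) =>
        _store.Insert(new LogEntry { PartitionKey = user, RowKey = id, Message = message });
}
```

A unit test can then construct `AuditLogger` with an `InMemoryLogStore` and assert on what was stored, with no emulator running.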

Batching?

  • The Table Storage API provides the ability to batch inserts, but every entry in a batch must have the same Partition Key. I have found this to be a problem when there are multiple partitions to insert at once.

Solution

I decided to build a generic wrapper that lets me isolate the storage and configure settings such as Nagle’s Algorithm.

The Wrapper

The wrapper has a TableStoreFactory that creates the table store connection, or you can create a TableStore directly.

The code below shows a very small example of injecting the TableStoreFactory and changing the options from the defaults.

public class TableStorageClient
{
    private ITableStore<MyDto> _store;

    public TableStorageClient(ITableStoreFactory factory)
    {
        var options = new TableStorageOptions
        {
            UseNagleAlgorithm = true,
            ConnectionLimit = 100,
            EnsureTableExists = false
        };

        _store = factory.CreateTableStore<MyDto>("MyTable", "UseDevelopmentStorage=true", options);
    }
}

You could also inject the TableStore:

public class TableStorageClient
{
    private ITableStore<MyDto> _store;

    public TableStorageClient(ITableStore<MyDto> store)
    {
        _store = store;
    }
}

Or simply create the store in code:

var store = new TableStore<MyDto>("MyTable", "UseDevelopmentStorage=true", new TableStorageOptions());

Batching

To handle a batch insert with multiple partition keys, I added the ability to automatically split the batch by Partition Key and then insert each group in batches of up to the maximum of 100 records. Now I can just create my list of entries and call insert without having to worry about it.

var entries = new List<MyDto>
{
    new MyDto("John", "Smith") {Age = 21, Email = "john.smith@something.com"},
    new MyDto("Jane", "Smith") {Age = 28, Email = "jane.smith@something.com"},
    new MyDto("Bill", "Smith") { Age = 38, Email = "bill.smith@another.com"},
    new MyDto("Fred", "Jones") {Age = 32, Email = "fred.jones@somewhere.com"},
    new MyDto("Bill", "Jones") {Age = 45, Email = "bill.jones@somewhere.com"},
    new MyDto("Bill", "King") {Age = 45, Email = "bill.king@email.com"},
    new MyDto("Fred", "Bloggs") { Age = 32, Email = "fred.bloggs@email.com" }
};

await _store.InsertAsync(entries);
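
The splitting step described above can be sketched like this. It is a minimal illustration of the idea, not the wrapper's actual implementation. `BatchSplitter`, `Split`, and the `partitionKeyOf` selector are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, invented for this example.
public static class BatchSplitter
{
    // Table Storage accepts at most 100 operations in one batch.
    public const int MaxBatchSize = 100;

    // Groups the items by partition key, then cuts each group into chunks of
    // at most maxBatchSize items, so every returned list has a single
    // partition key and fits in one batch.
    public static List<List<T>> Split<T>(
        IEnumerable<T> items,
        Func<T, string> partitionKeyOf,
        int maxBatchSize = MaxBatchSize)
    {
        var batches = new List<List<T>>();
        foreach (var group in items.GroupBy(partitionKeyOf))
        {
            var sameKey = group.ToList();
            for (var start = 0; start < sameKey.Count; start += maxBatchSize)
            {
                var size = Math.Min(maxBatchSize, sameKey.Count - start);
                batches.Add(sameKey.GetRange(start, size));
            }
        }
        return batches;
    }
}
```

If the surname were the partition key (an assumption, since the entity definition is not shown here), the seven entries above would be split into four batches: Smith, Jones, King, and Bloggs.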

Filtering

Another noteworthy feature is GetRecordsByFilter, which allows secondary data to be filtered before the result is returned (the filter is passed in as a predicate). The downside is that the records have to be read from the table before the filter can be applied. Testing against 10,000 records showed ~1.3 seconds without paging, but only ~0.03 seconds when using paging and returning the first 100.

// Without paging
_store.GetRecordsByFilter(x => x.Age > 21 && x.Age < 25);

// With paging
_store.GetRecordsByFilter(x => x.Age > 21 && x.Age < 25, 0, 100);
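
One way a gap like this can arise is lazy evaluation: a paged query can stop reading as soon as the page is full. The sketch below shows that general idea with plain LINQ. It is not the wrapper's implementation, and it assumes the two numeric arguments above mean a start offset and a page size. `PagedFilter` and `GetPage` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, invented for this example.
public static class PagedFilter
{
    // Returns one page of the records that match the predicate.
    // Where, Skip and Take are lazy, so enumeration of the source stops as
    // soon as pageSize matches after the offset have been collected.
    public static List<T> GetPage<T>(
        IEnumerable<T> records,
        Func<T, bool> predicate,
        int start,
        int pageSize)
    {
        return records.Where(predicate).Skip(start).Take(pageSize).ToList();
    }
}
```

With a source that is read lazily, asking for the first 100 matches touches only as many records as it takes to find them, rather than the whole table.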

If there is a need to perform an action as the data is read from the table, there is support for observables:

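// Where and Take are Rx operators (using System.Reactive.Linq),
// assuming the method returns an IObservable<MyDto>.
var observable = _store.GetAllRecordsObservable();
observable.Where(entry => entry.Age > 21 && entry.Age < 25).Take(100).Subscribe(item =>
{
    // Do something with the table entry
});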

The end result can be found on GitHub and is available on NuGet. Others have contributed additions, which can be found here and here.

Cosmos DB

Table Storage does not support secondary indexes or global distribution. There was a Premium tier for Table Storage, but it is now known as Cosmos DB.

The wrapper shown here will work with Cosmos DB, but it does not support everything. For more details, take a look at the FAQs and the new Table API.