MonoRepos with Azure Pipelines

Talking with a friend of mine a few weeks ago, we discussed monorepos and how to support them in Azure Pipelines using either GitHub or Azure Repos. My friend had multiple microservices in a monorepo in GitHub and wanted to avoid checking out the whole repo each time to build only a part of it.

After lots of research and reading, I came across the git command sparse-checkout. This command has been in git since version 2.25 and is marked as experimental.

At the time of writing, the ubuntu-latest and windows-latest Microsoft-hosted agents in Azure Pipelines both reported git 2.33.1 as the version in use.
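Because sparse-checkout only exists in git 2.25 and later, a pipeline that might run on other agents could check the version first. The following is a minimal sketch and not part of the original pipeline; the helper name `supports_sparse_checkout` is made up for illustration, and the version string is passed in by hand rather than read from an agent.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a git version string is at least 2.25,
# the release in which `git sparse-checkout` first appeared.
supports_sparse_checkout() {
    version=$1                 # e.g. "2.33.1" or "2.33.1.windows.1"
    major=${version%%.*}       # text before the first dot
    rest=${version#*.}         # text after the first dot
    minor=${rest%%.*}          # text before the next dot
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 25 ]; }
}

if supports_sparse_checkout "2.33.1"; then
    echo "sparse-checkout available"
fi
```

On an agent, the argument could be taken from `git --version`, which prints a line such as `git version 2.33.1`.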

I went back to my friend to tell him what I had found, only to learn that he had come to the same conclusion. We decided to share this in case anyone else was wondering the same.

So how do we use this command in an Azure Pipeline? The first thing to do is stop Azure Pipelines from checking out the code automatically, using the checkout step with a value of none.

- checkout: none

Then we need to set up the connection to the git repo manually. We can do this in a script step that performs the same steps that happen under the checkout step and includes the sparse-checkout configuration at the same time.

- script: |
    git init
    git sparse-checkout init --cone
    git sparse-checkout add $(folders)
    git remote add origin $(Build.Repository.Uri)
    git config core.sparsecheckout true
    git config gc.auto 0
    git config --get-all http.$(Build.Repository.Uri).extraheader
    git config --get-all http.proxy
    git config http.version HTTP/1.1

The sparse-checkout add command takes a space-separated list of the folder names you wish to check out. Azure Pipelines has some predefined build variables which are really handy for getting the URI, etc.
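To see the effect of the space-separated list without running a pipeline, the same commands can be tried against a throwaway local repository. This is a sketch under assumptions: the folder names, file names and temporary paths are invented, and a local `file://` remote stands in for GitHub or Azure Repos.

```shell
#!/bin/sh
set -e

# Build a throwaway "remote" with three top-level folders and one root file.
work=$(mktemp -d)
git init --quiet "$work/remote"
cd "$work/remote"
mkdir folder1 folder2 folder3
echo a > folder1/a.txt
echo b > folder2/b.txt
echo c > folder3/c.txt
echo readme > README.md
git add .
git -c user.name=demo -c user.email=demo@example.com commit --quiet -m "initial"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# Sparse checkout of two of the three folders, mirroring the pipeline script.
folders='folder1 folder2'
git init --quiet "$work/sparse"
cd "$work/sparse"
git sparse-checkout init --cone
git sparse-checkout add $folders          # unquoted on purpose so the list splits on spaces
git remote add origin "file://$work/remote"
git fetch --quiet --depth=1 origin
git checkout --quiet "$branch"

# List what actually landed in the working directory.
ls
```

The listing should contain the two requested folders and the root file, with folder3 absent.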

Now that this is set up, we can perform a fetch and a checkout to grab the source code, getting only the files we need based on the sparse-checkout configuration.

git fetch --force --tags --prune --prune-tags --progress --no-recurse-submodules --verbose --depth=1 origin
git checkout --progress $(Build.SourceBranchName)

Build.SourceBranchName, another handy predefined variable, supplies the branch name.
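According to the Azure Pipelines documentation, `Build.SourceBranchName` holds only the last path segment of the ref, so a branch named `feature/tools` would come through as `tools`. If branches with slashes matter, one option is to start from `Build.SourceBranch` (the full ref) and strip the prefix. A minimal sketch, with `branch_from_ref` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: strip the refs/heads/ prefix from a full ref,
# leaving the complete branch name, slashes included.
branch_from_ref() {
    ref=$1
    printf '%s\n' "${ref#refs/heads/}"
}

# In a pipeline the argument would come from Build.SourceBranch;
# literal refs are used here for illustration.
branch_from_ref "refs/heads/main"            # prints: main
branch_from_ref "refs/heads/feature/tools"   # prints: feature/tools
```

Refs that do not start with `refs/heads/`, such as pull request refs, pass through unchanged and would need separate handling.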

So, the full pipeline to just perform the checkout part of a monorepo build looks like this:

trigger:
  - main

pool:
  vmImage: ubuntu-latest # or windows-latest

variables:
  folders: 'folder1 folder2'

steps:
- checkout: none
- script: |
    git init
    git sparse-checkout init --cone
    git sparse-checkout add $(folders)
    git remote add origin $(Build.Repository.Uri)
    git config core.sparsecheckout true
    git config gc.auto 0
    git config --get-all http.$(Build.Repository.Uri).extraheader
    git config --get-all http.proxy
    git config http.version HTTP/1.1
    git fetch --force --tags --prune --prune-tags --progress --no-recurse-submodules --verbose --depth=1 origin
    git checkout --progress $(Build.SourceBranchName)
  displayName: 'Clone Partial git repo'

The --cone option enables the core.sparseCheckoutCone setting, which restricts the sparse-checkout patterns to whole directories. For more information, see the git documentation.
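Cone mode can be inspected in a scratch repository. The sketch below assumes an illustrative temporary path and folder names, and uses `git sparse-checkout set`, which replaces the folder list, where the pipeline uses `add`, which appends to it.

```shell
#!/bin/sh
set -e

# Scratch repository; the path and folder names are illustrative only.
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"

git sparse-checkout init --cone
git sparse-checkout set folder1 folder2

# Cone mode is recorded as a config value.
git config core.sparseCheckoutCone

# In cone mode the list is reported as directories rather than raw patterns.
git sparse-checkout list
```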

This all worked fine using a GitHub repository that is connected via the GitHub Azure Pipelines App.

Note: GitHub Apps are the officially recommended way to connect to GitHub (see the docs).

It is a real shame that this does not work as is for Azure Repos, but fortunately only a minor change is needed :). You need to either add a git config line or change the git fetch command so that it includes a token. The token is available through the special predefined variable System.AccessToken (see the Microsoft docs for more information).

git config addition:

git config http.$(Build.Repository.Uri).extraheader "AUTHORIZATION: bearer $(System.AccessToken)"

Update to the git fetch:

git -c http.extraheader="AUTHORIZATION: bearer $(System.AccessToken)" fetch --force --tags --prune --prune-tags --progress --no-recurse-submodules --depth=1 origin
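The two options differ in scope: `git config` stores the header in the repository's configuration, so it stays there until removed, while `git -c` applies it to a single command only. The sketch below shows the stored variant in a scratch repository; the URL and token are placeholders standing in for the values Azure Pipelines would substitute.

```shell
#!/bin/sh
set -e

# Placeholder values: a real pipeline would substitute the repository URI
# and System.AccessToken here.
repo_url='https://dev.azure.com/example-org/example-project/_git/example-repo'
token='PLACEHOLDER_TOKEN'

scratch=$(mktemp -d)
git init --quiet "$scratch"
cd "$scratch"

# Scope the header to one URL so it only applies to requests matching it.
git config "http.$repo_url.extraheader" "AUTHORIZATION: bearer $token"

# Read the value back, as the --get-all line in the pipeline script does.
git config --get-all "http.$repo_url.extraheader"
```

Since the stored header remains in `.git/config`, a pipeline using this variant may want to unset it once the fetch has finished.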

This was run on both the Ubuntu and Windows Microsoft-hosted agents, against both GitHub and Azure Repos. In each case it successfully checked out only the folders specified in the folder list, plus any files in the repository root.

It has been a lot of fun finding out about the sparse-checkout feature of git and building a pipeline to take advantage of it with GitHub and Azure Repos. Hopefully others will find it useful too; the future of this feature in git is one to watch.

I hope sharing this helps with building monorepos and highlights the sparse-checkout git feature.