How to join repositories in CMake

Sometimes a project needs to use another repository (local or external) directly. This means that we want to incorporate some or all of the imported repository's sources into our build system. Usually, we would also like to track which version is in use at any given time.

We can solve this problem in a few ways, e.g. by using:

  • git submodules,
  • git subtree,
  • CMake FetchContent,
  • CMake ExternalProject.

The first two are handled by the version control system and the last two by the build system. Each has its own strengths and weaknesses, depending on the needs of the project.

Today I want to briefly present how CMake allows joining repositories with its FetchContent and ExternalProject modules.

FetchContent

Note

FetchContent is available since CMake 3.11.

The role of FetchContent is to obtain resources from a specified location and make them available to the rest of the build system. Note that I used the word “resources”, which can mean not only source code, but also toolchains, artifacts from other builds, scripts, application icons, etc.

Below you can see a typical usage:

include(FetchContent)

# 1.
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

# 2.
FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

In this example, we download my platform repository and use its master branch. The snippet consists of two parts:

  1. define (declare) what should be downloaded and where it is,
  2. launch download and make it available (populate) to the rest of the build system.

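Note that platform (which I will refer to later as <resource_name>) is used as a prefix in the related variables and as an argument in the related functions, so make it meaningful and unique. CMake lowercases the name when it builds the variable prefix, so sticking to lowercase names avoids surprises.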

Everything happens at the “configure” (generation) stage, as soon as CMake reaches the FetchContent_Populate(platform) command. This is important to remember, because the build system is usually generated once and FetchContent can influence that process. FetchContent_Populate() can be called only once per resource during a single CMake configuration, which explains the conditional above. Consider it the FetchContent idiom.

Note

Using FetchContent requires including the proper module: include(FetchContent).

By default, everything is downloaded into your build directory in a well-defined structure:

${CMAKE_BINARY_DIR}/
  - _deps/
    - <resource_name>-build
    - <resource_name>-src
    - <resource_name>-subbuild

Note

I’m not exactly sure what the purpose of the subbuild directory is. Judging by its contents, CMake generates a small CMakeLists.txt project there in order to implement FetchContent in terms of ExternalProject. Maybe this was the easiest way of adding FetchContent, since ExternalProject had already been available for a long time and could be reused at the generation stage with a simple trick.
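If the default location does not suit you, the base directory can be moved. Below is a minimal sketch, assuming you want the content in a hypothetical .deps directory next to your sources. FETCHCONTENT_BASE_DIR is the cache variable read by FetchContent; the directory name is only an illustration. The variable is set before the module is included, so that the module's own default does not win.

```cmake
# Assumption: ".deps" is a hypothetical directory name chosen for illustration.
# Fetched content then lands in .deps/<resource_name>-src, -build and -subbuild.
set(FETCHCONTENT_BASE_DIR "${CMAKE_SOURCE_DIR}/.deps" CACHE PATH "Base directory for fetched content")

include(FetchContent)

FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()
```

Because this is a cache variable, it can also be passed on the command line with -DFETCHCONTENT_BASE_DIR=<path> without touching CMakeLists.txt.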

Once CMake successfully downloads our external content, it sets two variables that can be used in CMakeLists.txt to locate the new data:

  • <resource_name>_SOURCE_DIR – specifies the location of the downloaded sources,
  • <resource_name>_BINARY_DIR – specifies the default build directory for the downloaded sources.
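A quick way to see what these variables contain is to print them right after populating. This is only a sketch that reuses the platform declaration from above:

```cmake
include(FetchContent)

FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

# Both variables are set once the content has been populated.
message(STATUS "platform sources:   ${platform_SOURCE_DIR}")
message(STATUS "platform build dir: ${platform_BINARY_DIR}")
```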

Typically, we add that repository to our build system as if it was already part of our codebase with the following command:

add_subdirectory(${<resource_name>_SOURCE_DIR} ${<resource_name>_BINARY_DIR})

The second argument of add_subdirectory() is optional for directories inside our own source tree, but it is required here because the fetched sources live outside of it. It tells CMake not to derive the build location from our repository structure (${CMAKE_CURRENT_BINARY_DIR}), but to use the one provided by FetchContent. This nicely separates our compilation products from the external ones.

From now on, the included directory is subject to all CMake settings of the main project and, from the build system's point of view, is treated like local sources.
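To illustrate, here is a sketch of a complete root CMakeLists.txt. The names app and main.cpp are hypothetical, and I am assuming that the fetched repository defines a library target called platform; check the real target names in the repository you fetch.

```cmake
cmake_minimum_required(VERSION 3.11)
project(app CXX)

# Directory-level settings made before add_subdirectory() are inherited
# by the fetched repository, just like by any local subdirectory.
set(CMAKE_CXX_STANDARD 17)

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

add_subdirectory(${platform_SOURCE_DIR} ${platform_BINARY_DIR})

# Hypothetical targets: "app" is our executable, "platform" is assumed
# to be a library target created by the fetched CMakeLists.txt.
add_executable(app main.cpp)
target_link_libraries(app PRIVATE platform)
```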

ExternalProject

Note

ExternalProject is available in CMake 3.0 and later (the module itself was introduced earlier, in the 2.8 series).

ExternalProject is almost identical to FetchContent in terms of purpose and available options, except for one extremely important difference: it runs at the build stage. In my opinion this is a crucial drawback compared to FetchContent, because the external content becomes available only after our build system has been generated and all build decisions have been made. There is no way (or at least no trivial way) to add sources downloaded with this method to our own compilation.

To compensate for this, ExternalProject can launch another CMake instance on the downloaded sources and accepts custom commands for configuring and building them. Let’s see this in an example:

include(ExternalProject)
ExternalProject_Add(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
    CMAKE_ARGS      -DPLATFORM=freertos-arm
)
add_dependencies(<some_target> platform)

Note

Using ExternalProject requires including the proper module: include(ExternalProject).

The snippet above will perform the following actions:

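  1. It will download the specified repository into the external project's source directory (the counterpart of platform-src in FetchContent).
  2. It will automatically call CMake in the external project's build directory (the counterpart of platform-build), passing -DPLATFORM=freertos-arm, and then build the result. Conceptually: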
cd platform-build
cmake -DPLATFORM=freertos-arm ../platform-src
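If you want the directories to be predictable, they can be stated explicitly. The sketch below uses the SOURCE_DIR, BINARY_DIR and INSTALL_COMMAND options of ExternalProject_Add(); the paths are my own choice (mirroring the FetchContent layout), and I am assuming the external project has no install rules, which is why the install step is disabled.

```cmake
include(ExternalProject)

ExternalProject_Add(
    platform
    GIT_REPOSITORY   https://github.com/kubasejdak/platform.git
    GIT_TAG          origin/master
    # Assumed locations, chosen to mirror the FetchContent layout.
    SOURCE_DIR       ${CMAKE_BINARY_DIR}/_deps/platform-src
    BINARY_DIR       ${CMAKE_BINARY_DIR}/_deps/platform-build
    CMAKE_ARGS       -DPLATFORM=freertos-arm
    # Assumption: nothing to install, so the install step is skipped.
    INSTALL_COMMAND  ""
)
```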

Note

FetchContent does not run a separate configure or build step on the downloaded content, but it lets you achieve the same thing in a more natural way via a call to add_subdirectory().
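As a sketch of that more natural way: the value passed above through CMAKE_ARGS can be set as a cache variable before entering the fetched directory. I am assuming here that the fetched project reads a variable named PLATFORM, as the ExternalProject example suggests.

```cmake
include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

# Roughly what -DPLATFORM=freertos-arm does on the command line:
# a cache entry that the fetched CMakeLists.txt can read.
set(PLATFORM freertos-arm CACHE STRING "Target platform for the fetched repository")

add_subdirectory(${platform_SOURCE_DIR} ${platform_BINARY_DIR})
```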

Making some other target depend on it with add_dependencies(<some_target> platform) is required in order to tell CMake when all those actions should actually be performed. With FetchContent we didn’t have that problem: generation is done sequentially, in the order in which CMakeLists.txt files are parsed. For the build stage, CMake composes a dependency graph of all defined targets in order to determine the order in which they should be built. That graph is influenced by two commands:

  • target_link_libraries(), where we implicitly say that in order to build one target we need to link with another one, so it should already be built,
  • add_dependencies(), where we explicitly say that one target should be built before another one (even if they do not use each other).

Without an explicit dependency between the ExternalProject target and <some_target>, CMake gives no guarantee that the external project is downloaded and built before <some_target> needs it.
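The effect of add_dependencies() is easy to observe in isolation. The following sketch is a standalone toy project with two hypothetical custom targets that only print a message; it needs no compiler and no source files.

```cmake
cmake_minimum_required(VERSION 3.11)
project(dependency_demo NONE)   # NONE: no language (compiler) is required

# Hypothetical targets that only print a message when they are built.
add_custom_target(prepare
    COMMAND ${CMAKE_COMMAND} -E echo "building prepare"
)
add_custom_target(bundle ALL
    COMMAND ${CMAKE_COMMAND} -E echo "building bundle"
)

# Explicit ordering: "prepare" is always built before "bundle",
# even though the two targets do not use each other.
add_dependencies(bundle prepare)
```

Building the default target builds bundle, and the dependency pulls in prepare first. Replace prepare with an ExternalProject target and bundle with your own target to get the situation described above.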

To be honest, once FetchContent became available in CMake, I completely lost any reason to use ExternalProject.

Customization

Both FetchContent and ExternalProject are highly customizable. Moreover, most of the options are available in both of them. Let’s see some examples along with short comments on usage.

Download methods

Both commands support the following download types:

  • Git,
  • SVN,
  • CVS,
  • Mercurial,
  • HTTP.

Each method has a typical set of settings you would normally expect, like branch name, commit/revision number, URL etc. I have personally used only Git in this context.

While doing so, for a long time I specified the master branch as the content source. The problem with such a declaration is that every upstream change immediately propagates to all the places that use it. In my case, I had a few “utility” repositories that were reused via FetchContent in highly active product repositories. Admittedly, using a branch name relieves you from updating the FetchContent declaration in every repository each time a change is introduced. On the other hand, a breaking change can create a lot of chaos, especially if you have an active team. Trust me, you don’t want to hear all those complaints. So instead of a branch name, specify a tag or a commit hash. The only drawback of this approach is that you need to remember to update that hash whenever necessary. It is also not obvious at first sight whether a changed hash is newer or older in history without checking the Git log.

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         ee3ce49227e809aa3ef0f6270b4c996b4808cddf
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()
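One possible refinement, shown only as a sketch: keep the pinned revision in a cache variable, so that it is visible in one place and can be overridden for a single build. PLATFORM_GIT_TAG is a hypothetical name of my own, not something CMake defines.

```cmake
include(FetchContent)

# Hypothetical cache variable holding the pinned revision.
# Override for one build with: cmake -DPLATFORM_GIT_TAG=<tag-or-hash> ...
set(PLATFORM_GIT_TAG "ee3ce49227e809aa3ef0f6270b4c996b4808cddf" CACHE STRING "Revision of the platform repository")

FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         ${PLATFORM_GIT_TAG}
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()
```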

Fetching without network

Most fetching methods naturally assume that you have a network connection (Internet or intranet) to the remote content. However, there are situations where you are completely offline (business trips, network problems, etc.). This can bring your work to a halt.

Fortunately there are two options you can use to work without connection to the upstream:

  • using a local copy of the content on your hard drive (e.g. a clone of the repo made while the connection was still available),
  • telling CMake not to update (check for changes in) content already downloaded by a previous FetchContent/ExternalProject run.

In the first case, you can simply replace the repository URL with a path to the local copy of the repository. It works very well. The only observable difference is that the content will not be physically copied into your build directory: CMake will use the existing files that you are pointing to. This should hardly ever be an issue.
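A related option that leaves the declaration untouched is the per-resource cache variable FETCHCONTENT_SOURCE_DIR_<RESOURCE_NAME> (resource name in upper case). When it is set, FetchContent uses that directory as the source and skips downloading and updating. A minimal sketch, assuming a hypothetical local checkout in /home/user/work/platform:

```cmake
# Assumption: /home/user/work/platform is a hypothetical local checkout.
# The same can be done from the command line, without editing any file:
#   cmake -DFETCHCONTENT_SOURCE_DIR_PLATFORM=/home/user/work/platform <path-to-source>
set(FETCHCONTENT_SOURCE_DIR_PLATFORM "/home/user/work/platform" CACHE PATH "Local copy of platform")

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)   # no download: the local directory is used
endif ()

message(STATUS "platform sources: ${platform_SOURCE_DIR}")
```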

In the second case, you rely on the fact that FetchContent/ExternalProject has already run successfully at least once. Every time CMake processes it, it checks whether the content is up to date. In the offline scenario that check results in an error. You can explicitly ask CMake to skip that part by adding an extra parameter, UPDATE_DISCONNECTED ON:

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY        https://github.com/kubasejdak/platform.git
    GIT_TAG               origin/master
    UPDATE_DISCONNECTED   ON
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

Note that this will not skip downloading the content for the first time. It will also not update your content even when you have a healthy connection.
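If you prefer not to touch every declaration, FetchContent also reads global cache variables with a similar effect. The sketch below uses FETCHCONTENT_UPDATES_DISCONNECTED; it is set before the module is included so that the module's default does not take precedence.

```cmake
# Skips the update step for all content that has already been downloaded.
# Command-line equivalent: cmake -DFETCHCONTENT_UPDATES_DISCONNECTED=ON ...
set(FETCHCONTENT_UPDATES_DISCONNECTED ON CACHE BOOL "Do not update already fetched content")

# A stricter variant is FETCHCONTENT_FULLY_DISCONNECTED, which also skips
# the download itself and therefore requires the content to be in place.

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()
```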

Hint

A local directory turned out to be the perfect solution when you need to make changes in a fetched repository and check whether they break dependent projects. It would be really annoying to do this through Git (it could generate a lot of trash commits).

Choosing the right place to run FetchContent

FetchContent runs at the generation stage, so choosing the place where it is declared determines which parts of the build system could interact with the fetched content.

For example, FetchContent can be run before the project() command (both the declare and populate statements). This allows you to set up a custom toolchain (via CMAKE_TOOLCHAIN_FILE) that is downloaded from the company’s server. On the other hand, utility codebases (extracted to separate repositories just to avoid duplication) can be fetched just before the place where you would like to call add_subdirectory() on them.
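The toolchain scenario could look roughly like the sketch below. The repository URL, the tag, the toolchain file name and the project name are all hypothetical placeholders; substitute the ones used in your company.

```cmake
cmake_minimum_required(VERSION 3.11)

include(FetchContent)
FetchContent_Declare(
    toolchains
    # Hypothetical repository and tag, used only for illustration.
    GIT_REPOSITORY  https://git.example.com/company/toolchains.git
    GIT_TAG         v1.0
)

FetchContent_GetProperties(toolchains)
if (NOT toolchains_POPULATED)
    FetchContent_Populate(toolchains)
endif ()

# Has to happen before project(), because the toolchain file is read
# when the first language is enabled. The file name is hypothetical.
set(CMAKE_TOOLCHAIN_FILE "${toolchains_SOURCE_DIR}/arm-none-eabi.cmake" CACHE FILEPATH "Toolchain file")

project(firmware C CXX)
```

Since set(... CACHE ...) does not overwrite an existing cache entry, a toolchain file passed explicitly on the command line still takes precedence.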

The second scenario is so popular that CMake provides a special command, FetchContent_MakeAvailable() (available since CMake 3.14), which populates the content and calls add_subdirectory() for you. This assumes, of course, that you want to enter the root directory of the downloaded repository:

include(FetchContent)
FetchContent_Declare(
    tools
    GIT_REPOSITORY        https://github.com/kubasejdak/tools.git
    GIT_TAG               origin/master
)

FetchContent_MakeAvailable(tools)    # Populate and add_subdirectory() in one command.
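FetchContent_MakeAvailable() also accepts several names at once, which keeps the file short when there are many dependencies. A sketch using two of the repositories from this article:

```cmake
include(FetchContent)

FetchContent_Declare(
    tools
    GIT_REPOSITORY  https://github.com/kubasejdak/tools.git
    GIT_TAG         origin/master
)
FetchContent_Declare(
    platform
    GIT_REPOSITORY  https://github.com/kubasejdak/platform.git
    GIT_TAG         origin/master
)

# Requires CMake 3.14 or newer. Each resource is populated and, if it has
# a top-level CMakeLists.txt, added with add_subdirectory().
FetchContent_MakeAvailable(tools platform)
```

As far as I know, CMake 3.30 and newer deprecate the single-argument form of FetchContent_Populate() in favor of this command, which is one more reason to prefer it when the root directory is what you need.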

If you need to enter a specific directory, then you have to do it by hand. For example, my HAL project requires that clients use only the lib directory, while the root path is used to launch tests, generate documentation, etc.:

include(FetchContent)
FetchContent_Declare(
    hal
    GIT_REPOSITORY        https://github.com/kubasejdak/hal.git
    GIT_TAG               v1.5b
)

FetchContent_GetProperties(hal)
if (NOT hal_POPULATED)
    FetchContent_Populate(hal)
endif ()

add_subdirectory(${hal_SOURCE_DIR}/lib ${hal_BINARY_DIR})   # <--- Note "lib" in the first argument.
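With a newer CMake there is a shorter alternative. The SOURCE_SUBDIR option (CMake 3.18 or newer, to my knowledge) tells FetchContent_MakeAvailable() which directory below the repository root contains the CMakeLists.txt to add. A sketch for the same HAL repository:

```cmake
include(FetchContent)
FetchContent_Declare(
    hal
    GIT_REPOSITORY  https://github.com/kubasejdak/hal.git
    GIT_TAG         v1.5b
    SOURCE_SUBDIR   lib    # add <repo>/lib instead of the repository root
)

# Populates hal and calls add_subdirectory() on its "lib" directory.
FetchContent_MakeAvailable(hal)
```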

I also like to put FetchContent commands in separate files called <resource_name>.cmake and include them in other CMakeLists.txt files where needed:

# cmake/platform.cmake

include(FetchContent)
FetchContent_Declare(
    platform
    GIT_REPOSITORY        https://github.com/kubasejdak/platform.git
    GIT_TAG               origin/master
)

FetchContent_GetProperties(platform)
if (NOT platform_POPULATED)
    FetchContent_Populate(platform)
endif ()

# cmake/osal.cmake

include(FetchContent)
FetchContent_Declare(
    osal
    GIT_REPOSITORY        https://github.com/kubasejdak/osal.git
    GIT_TAG               origin/master
)

FetchContent_GetProperties(osal)
if (NOT osal_POPULATED)
    FetchContent_Populate(osal)
endif ()

# cmake/hal.cmake

include(FetchContent)
FetchContent_Declare(
    hal
    GIT_REPOSITORY        https://github.com/kubasejdak/hal.git
    GIT_TAG               origin/master
)

FetchContent_GetProperties(hal)
if (NOT hal_POPULATED)
    FetchContent_Populate(hal)
endif ()

# Root CMakeLists.txt

# ...

include(cmake/platform.cmake)
include(cmake/osal.cmake)
include(cmake/hal.cmake)

# ...

add_subdirectory(${platform_SOURCE_DIR}/app ${platform_BINARY_DIR})
add_subdirectory(${hal_SOURCE_DIR}/lib ${hal_BINARY_DIR})

# ...

Summary

There are at least a few ways to join repositories in your build. CMake offers FetchContent at the generation stage and ExternalProject at the build stage. Each comes with a rich set of options, which you can check in the official docs. Choose the right method for your specific case. In most cases FetchContent will suit you better.

Thanks for taking the time to read this. If you liked it or have a different opinion, please share it in the comments. You can also follow my Twitter account to get the latest news about the articles. Happy coding!