CMake configuration for reproducible builds of C++ projects with gcc/Linux

This is a quick guide to configuring CMake so that every rebuild of the same C++ code produces bit-for-bit identical binaries. Reproducible builds make it possible to verify that a given binary was really built from a given set of sources, which comes in handy for configuration control. This article does not go into detail about why binaries differ from build to build, or about the rationale behind the countermeasures. That information is covered elsewhere, and a few helpful links are provided at the end. This post exists because all of this information was not available in one place.

In general, the same C++ code should generate the same executable (or library) every time it is compiled with the same build system on the same platform. In practice, several factors cause the final binaries to differ: embedded build paths, embedded dates and times, randomly generated symbol names, and archive metadata. Developers need to take care of the following to avoid them:

  1. Eliminate differences due to build path.

    target_compile_options(projfoo PUBLIC "-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.")

  2. Generate compile errors if macros like __DATE__ or __TIME__ are used in the code.

    target_compile_options(projfoo PUBLIC "-Werror=date-time")

  3. Use the file name as gcc's random seed (instead of a random number) when it needs to generate unique symbol names.

    foreach(_file ${SOURCES})
        set_property(SOURCE ${_file} APPEND_STRING PROPERTY COMPILE_FLAGS " -frandom-seed=${_file}")
    endforeach()


  4. For static library projects (.a), pass the 'D' (deterministic) option to ar and ranlib so that the archive does not record timestamps, user IDs or group IDs.

    set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
    set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")
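
Taken together, the four settings above might look like the following in a single CMakeLists.txt. This is only a minimal sketch: the two source file names are made up, the minimum CMake version is a guess, and the CMAKE_CXX_ARCHIVE_APPEND line is an addition not listed above (CMake can use that rule when the object list is split across several ar invocations).

```cmake
cmake_minimum_required(VERSION 3.5)  # assumed minimum, adjust to your setup
project(projfoo CXX)

# Hypothetical source list; replace with the project's real files.
set(SOURCES src/foo.cpp src/bar.cpp)

add_library(projfoo STATIC ${SOURCES})

# 1. Replace the absolute source directory with "." in the compiler output.
# 2. Turn any use of __DATE__, __TIME__ or __TIMESTAMP__ into a compile error.
target_compile_options(projfoo PUBLIC
    "-ffile-prefix-map=${CMAKE_SOURCE_DIR}=."
    "-Werror=date-time")

# 3. Seed gcc's generated symbol names with the file name as listed in SOURCES.
#    The leading space keeps the flag separate from any existing COMPILE_FLAGS.
foreach(_file IN LISTS SOURCES)
    set_property(SOURCE ${_file} APPEND_STRING PROPERTY
                 COMPILE_FLAGS " -frandom-seed=${_file}")
endforeach()

# 4. Ask ar and ranlib for deterministic archives.
set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
# Assumption: not part of the original list, added so appended objects also use 'D'.
set(CMAKE_CXX_ARCHIVE_APPEND "<CMAKE_AR> qD <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")
```

The archive rule variables are set after project() so that they override the defaults CMake picks when it enables the language.
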

Notes
 
  • Point no. 2: If macros such as __DATE__ and __TIME__ cannot be avoided in the code, setting the environment variable SOURCE_DATE_EPOCH (a Unix timestamp) makes gcc expand them to fixed, deterministic values. In that case -Werror=date-time has to be dropped, because it rejects these macros whether or not SOURCE_DATE_EPOCH is set.
  • Point no. 3: The value of -frandom-seed needs to be unique for each source file. Some examples use a hash of the file contents. But if your project does not have two source files with the same name, the file name should suffice. This way, adding comments to a source file does not change the argument value. The example here assumes that the CMake project file has all the source files added to a CMake variable named SOURCES.
  • Point no. 4: This is required only for projects that generate a static library file (libfoo.a). For C projects, use the CMAKE_C_... variables instead of their CXX equivalents in the example.
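
For projects where the same file name does occur in more than one directory, one possible variation on point no. 3 is to derive the seed from the path relative to the source root rather than from the bare name. The sketch below hashes that relative path (not the file contents, so editing a file still leaves the seed unchanged). It is meant to replace the loop shown earlier, not to run alongside it, and it still assumes a SOURCES variable. The last two lines spell out the C equivalents mentioned for point no. 4.

```cmake
# Variation on point 3: seed from the path relative to the source root.
foreach(_file IN LISTS SOURCES)
    # Resolve the entry against the current source directory.
    get_filename_component(_abs "${_file}" ABSOLUTE)
    # Strip the machine-specific prefix so the seed is the same on every checkout.
    file(RELATIVE_PATH _rel "${CMAKE_SOURCE_DIR}" "${_abs}")
    # Hash the relative path to get a seed free of slashes and spaces.
    string(SHA256 _seed "${_rel}")
    set_property(SOURCE ${_file} APPEND_STRING PROPERTY
                 COMPILE_FLAGS " -frandom-seed=${_seed}")
endforeach()

# Point 4 for C projects: same rules, C variables.
set(CMAKE_C_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
set(CMAKE_C_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")
```
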

These instructions are specifically for gcc & Linux. Please refer to the other articles linked here for more comprehensive documentation.
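
To check whether the settings had the intended effect, one option is to build the project twice (ideally in two different directories) and compare the results. The script below is a rough sketch of such a check in CMake's script mode. The script name and the FILE_A and FILE_B variable names are made up for this example.

```cmake
# compare_builds.cmake (hypothetical name)
# Usage, with the -D options placed before -P:
#   cmake -DFILE_A=/path/to/build1/libprojfoo.a \
#         -DFILE_B=/path/to/build2/libprojfoo.a -P compare_builds.cmake
cmake_minimum_required(VERSION 3.5)

if(NOT DEFINED FILE_A OR NOT DEFINED FILE_B)
    message(FATAL_ERROR "Pass -DFILE_A=<path> and -DFILE_B=<path> before -P")
endif()

# Hash the contents of both files.
file(SHA256 "${FILE_A}" _hash_a)
file(SHA256 "${FILE_B}" _hash_b)

message(STATUS "A: ${_hash_a}")
message(STATUS "B: ${_hash_b}")

# Identical hashes mean the two builds are bit-for-bit identical.
if("${_hash_a}" STREQUAL "${_hash_b}")
    message(STATUS "Binaries are identical")
else()
    message(FATAL_ERROR "Binaries differ")
endif()
```

Absolute paths are the safest choice for the two files, since relative paths are resolved against the directory the script is run from.
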

 
References
 
 
All comments are most welcome.
