CMake configuration for reproducible builds of C++ projects with gcc/Linux
This is a quick guide to configuring CMake so that every rebuild of the same C++ code produces bit-for-bit identical binaries. This helps verify that a given binary was indeed built from the given sources, and it comes in handy for configuration control. This article does not explain in detail why binaries differ from build to build, nor does it cover the rationale behind the countermeasures. That information is already available elsewhere, and a few helpful links are provided at the end. This post exists because the information was not available in one place.
In general, the same C++ code should generate the same executable (or library) every time it is compiled using the same build system on the same platform. In practice, several factors cause the final binaries to differ. Developers need to take care of a few things to avoid this:
1. Eliminate differences due to the build path.
   target_compile_options(projfoo PUBLIC "-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.")
2. Generate compile errors if macros like __DATE__ are used in the code.
   target_compile_options(projfoo PUBLIC "-Werror=date-time")
3. Use the file name as gcc's random seed (instead of random numbers) when it needs to generate unique symbol names.
   foreach(_file ${SOURCES})
     set_property(SOURCE ${_file} APPEND_STRING PROPERTY COMPILE_FLAGS " -frandom-seed=${_file}")
   endforeach()
4. For static library projects (.a), pass the 'D' argument to ar and ranlib for deterministic output.
   set(CMAKE_CXX_ARCHIVE_CREATE "<CMAKE_AR> qcD <TARGET> <LINK_FLAGS> <OBJECTS>")
   set(CMAKE_CXX_ARCHIVE_FINISH "<CMAKE_RANLIB> -D <TARGET>")
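- Point no. 2: If macros such as __DATE__ and __TIME__ cannot be avoided in the code, setting the environment variable SOURCE_DATE_EPOCH forces them to fixed, deterministic values. In that case the -Werror=date-time option has to be dropped, because gcc still reports every use of these macros.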
- Point no. 3: The value for -frandom-seed needs to be unique for each source file. Some examples use a hash of the file contents. But if your project does not have source files with the same name, then the file name should suffice. This way, adding comments in the source files will not change the argument values. The example here assumes that the CMake project file has all the source files added to a CMake variable named SOURCES.
- Point no. 4: This is required only for projects that generate a static library file (libfoo.a). For C projects, use CMAKE_C_... instead of CXX equivalents in the example.
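The notes on points 2 and 3 can be sketched in CMake as follows. This is a minimal sketch, not a definitive implementation: it assumes the SOURCES variable from the example above, and the variable names _abs and _rel are only illustrative. It warns at configure time when SOURCE_DATE_EPOCH is unset, and it seeds gcc with each file's path relative to the source tree, so that an absolute path in SOURCES does not leak the build location into the seed.

```cmake
# Note on point 2: warn if SOURCE_DATE_EPOCH is missing.
# This check only runs when CMake configures the project; the variable
# must also be exported in the environment that runs the actual build.
if("$ENV{SOURCE_DATE_EPOCH}" STREQUAL "")
  message(WARNING
    "SOURCE_DATE_EPOCH is not set; __DATE__ and __TIME__ will vary between builds")
endif()

# Note on point 3: use the path relative to the source tree as the seed.
# SOURCES is assumed to hold the project's source files, as in the
# example above.
foreach(_file ${SOURCES})
  # Resolve to an absolute path first (relative entries are interpreted
  # against the current source directory), then make it relative to the
  # top of the source tree.
  get_filename_component(_abs "${_file}" ABSOLUTE)
  file(RELATIVE_PATH _rel "${CMAKE_SOURCE_DIR}" "${_abs}")

  # The leading space keeps the flag separate from any flags already
  # stored in COMPILE_FLAGS, because APPEND_STRING adds no separator.
  set_property(SOURCE "${_file}" APPEND_STRING PROPERTY
               COMPILE_FLAGS " -frandom-seed=${_rel}")

  # Content-hash alternative mentioned in the note:
  #   file(SHA256 "${_abs}" _hash)
  # The hash would be computed at configure time only, so it goes stale
  # when a file changes without CMake being re-run.
endforeach()
```

The relative path stays the same wherever the source tree is checked out, which the file-name approach also needs in order to be independent of the build location.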
These instructions are specifically for gcc & Linux. Please refer to the other articles linked here for more comprehensive documentation.
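Since the flags are specific to gcc on Linux, one option is to guard them so that other toolchains are not handed options they may not understand. The following is a sketch under assumptions: projfoo is the target name from the example above, and the version check reflects that -ffile-prefix-map was introduced in GCC 8.

```cmake
# Apply the reproducibility flags only for gcc on Linux.
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU"
   AND CMAKE_SYSTEM_NAME STREQUAL "Linux"
   AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 8)
  target_compile_options(projfoo PUBLIC
    "-ffile-prefix-map=${CMAKE_SOURCE_DIR}=."
    "-Werror=date-time")
else()
  # Other compilers and platforms need their own equivalents.
  message(STATUS "Reproducible-build flags skipped for this toolchain")
endif()
```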
References
- An introduction to deterministic builds with C/C++ - an excellent resource on the subject
- Reproducible builds - a comprehensive effort to ensure a verifiable path from source to binary code. They have excellent documentation.
- CMake source on GitHub
- gcc documentation for -ffile-prefix-map and -frandom-seed
- Documentation of ar and ranlib from binutils
- There are some Stack Overflow questions that have useful information
All comments are most welcome.