C++20 Modules, CMake, And Shared Libraries

April 4, 2024 by Craig Scott

CMake 3.28 was the first version to officially support C++20 modules. Tutorials and examples understandably tend to focus on the fairly simple scenario of building a basic executable, perhaps also adding in a static library. The “import CMake; the Experiment is Over!” blog article from Kitware is perhaps one of the best known, and it covers exactly these things. While these are important first steps, stopping there sells the reader short. The real fun starts when building, installing, and consuming shared libraries. This quickly exposes current limitations of toolchains and the build system, and it also highlights common misconceptions about what modules provide.

Linker Symbol Visibility

NOTE: For a detailed background on linker symbol visibility and shared libraries, see my Deep CMake For Library Authors talk from CppCon 2019. This article here assumes the reader is familiar with a number of topics from that talk.

Consider the following example:

cmake_minimum_required(VERSION 3.28.2)
project(cxx_modules_example LANGUAGES CXX)

# These will be discussed later in the article
set(CMAKE_CXX_EXTENSIONS FALSE)
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
set(CMAKE_VISIBILITY_INLINES_HIDDEN TRUE)

add_library(Algo SHARED)

target_sources(Algo
    PRIVATE
        algo-impl.cpp
    PUBLIC
        FILE_SET CXX_MODULES
        FILES
            algo-interface.cppm
)

# CMake requires the language standard to be specified as a compile feature
# when a target provides C++20 modules and the target will be installed
target_compile_features(Algo PUBLIC cxx_std_20)

A common misconception among developers is that C++20 modules remove the need to handle symbol visibility, but this is not the case. Modules still require symbol visibility control, just like non-module code. The misconception likely springs from confusion between reachability, which is a compile-time concept, and symbol visibility, which is more a link-time constraint.

A shared library might want to define and use a C++20 module in its implementation, but not expose that module as part of the shared library’s public API. This is an example of where a module needs to be reachable when building the shared library, but not visible at link time. On the other hand, a library might want a C++20 module exposed in its public API, and for that it needs to export symbols to make them visible, just like non-module code.

The generate_export_header() command provided by CMake’s GenerateExportHeader module is the method of choice for managing symbol visibility in most CMake projects. This remains the case for C++20 module code as well. The CMAKE_CXX_VISIBILITY_PRESET variable is set to hidden and CMAKE_VISIBILITY_INLINES_HIDDEN to TRUE to ensure that only the symbols we want to export are made visible as part of our public API. The generate_export_header() command then gives us a header that provides compiler definitions we can use to explicitly annotate the things we want included in our API. The following demonstrates how one might add that to the CMakeLists.txt of the previous example:

include(GenerateExportHeader)
generate_export_header(Algo)
target_sources(Algo
    PUBLIC
        FILE_SET HEADERS
        BASE_DIRS ${CMAKE_CURRENT_BINARY_DIR}
        FILES
            ${CMAKE_CURRENT_BINARY_DIR}/algo_export.h
)

The above is exactly the same as how we might do it for a project without C++20 modules. Perhaps surprisingly, we add the generated header to the PUBLIC HEADERS file set rather than a PRIVATE file set, even though nothing outside the Algo target should need to read the header. If the shared library is only ever linked to by things in the same build (i.e. not from a scenario where the library has been installed), then this could indeed be a PRIVATE file set. Things change once the library is installed though, as will be discussed later in this article.

The C++ source code also follows a similar pattern to the non-module case. The interface source file brings in the generated header via #include, and we annotate the things within that interface file that we want to make part of the public API. The implementation source file doesn’t need to contain any visibility-related items. Here’s how these might look for the current example:

algo-interface.cppm

module;

#include <algo_export.h>   // <-- Generated header added to the global fragment

export module algo;   // <-- Annotation not currently required, but see discussion below

export class ALGO_EXPORT Algo    // <-- ALGO_EXPORT annotation added to the class definition
{
public:
    void helloWorld();
};

algo-impl.cpp

module;

#include <iostream>

module algo;

void Algo::helloWorld()
{
    std::cout << "hello world\n";
}

The generated header algo_export.h is included in the global module fragment of the interface file (the part between the module; line at the top of the file and the export module declaration). This makes everything the generated header defines available for use in defining the interface itself, but consumers of the module cannot directly use anything defined by the header. The global module fragment is not directly part of the interface exposed by the module. But as will be seen later, everything in the global module fragment may need to be available to module consumers.

We put the ALGO_EXPORT annotation on the Algo class, just like we would if Algo was defined outside any C++20 module. This annotation makes the whole class visible by exporting all its symbols for the linker. The fact that the class is inside a module doesn’t change how we do that, although it will likely change the mangled name of the class’s symbols as seen by the linker (see this article for a basic introduction to name mangling). For example, with clang 17 on macOS, the mangled name for Algo::helloWorld() inside the algo module is __ZNW4algo4Algo10helloWorldEv, whereas if the class is defined outside any module, its mangled name is __ZN4Algo10helloWorldEv. Note how the mangled name includes the algo module name for the symbol defined inside the module. This follows the strong module ownership model, where the same name declared in different modules refers to different entities (e.g. void doSomething() in module A would be distinct from void doSomething() in module B). The alternative (weak module ownership) has essentially been abandoned by toolchain vendors, and it may eventually be removed from the standard because it is problematic to implement.
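To make the difference between those two names concrete, the following sketch assembles Itanium C++ ABI mangled names by hand (the scheme used by clang and GCC on non-Windows platforms; MSVC uses a different one). It is a minimal sketch that only covers the narrow case in this example: a non-const member function with no parameters, in a class at global namespace scope, optionally attached to a module with a single-component name. The function names are hypothetical, and real mangling involves many more rules (substitutions, templates, parameter types, and so on).

```cpp
#include <string>
#include <string_view>

// Itanium ABI "source names" are written as <length><identifier>, e.g. "4Algo".
std::string lengthPrefixed(std::string_view name)
{
    return std::to_string(name.size()) + std::string(name);
}

// Hypothetical helper covering one narrow case only: a non-const member
// function taking no parameters, in a class at global namespace scope,
// optionally attached to a module with a single-component name ("algo",
// not "algo.core"). appleUnderscore adds the extra leading underscore that
// platforms such as macOS prepend to every symbol name.
std::string mangleNoArgMember(std::string_view moduleName,
                              std::string_view className,
                              std::string_view methodName,
                              bool appleUnderscore = false)
{
    std::string result = appleUnderscore ? "__Z" : "_Z";
    result += 'N';                             // start of a nested (qualified) name
    if (!moduleName.empty())
    {
        result += 'W';                         // marks module attachment
        result += lengthPrefixed(moduleName);
    }
    result += lengthPrefixed(className);
    result += lengthPrefixed(methodName);
    result += 'E';                             // end of the nested name
    result += 'v';                             // empty parameter list, i.e. (void)
    return result;
}
```

Calling it with two different module names for the same class and member function yields two different symbols, which is strong module ownership in action at the linker level.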

Currently, the clang and Visual Studio compilers don’t require annotating the export module algo line that declares the start of the module interface. There is some debate over whether an annotation should be required there, or whether to work it out automatically based on whether any part of the module is made visible. Toolchains seem to be favouring working it out for themselves at the moment, but this might not remain so.

Consequences Of BMI Implementation Details

Some may ask why the Algo::helloWorld() implementation is split out to a separate file. This looks an awful lot like the existing header files pattern, but weren’t modules meant to free us from all of that? Conceptually, a module could be fully defined in a single source file, and its interface extracted from that for consumers to use without needing access to the source file. Separating the interface and implementation should not be necessary. But in practice, toolchain implementation details mean that if we don’t separate them, we would have to install that source file along with the library (for reasons to be discussed shortly). Providing the full source files would be a non-starter for most closed-source applications, so the separation is usually desirable. When separated, only the file containing the module interface needs to be provided along with the shared library, since that’s all the toolchain needs when compiling a consumer. This also has the added advantage that the consumer doesn’t have to be recompiled if the module’s implementation changes but its interface does not. This is just like the analogous case with non-module code, where headers define interfaces but not implementations.

The BMI (built module interface) file for the module is what drives the above need for separation. The BMI is a binary representation of the module’s interface. When a compiler is compiling source code that uses the module (“consuming” the module), it needs the module’s BMI to know the module’s interface and how to use it. This is analogous to the role of header files in the non-C++20-module world. A key difference is that BMIs capture not only the code’s API (i.e. what header files traditionally do), they also capture binary aspects of how the module was built. They can embed details like the language standard used, compiler definitions, flags that affect the binary layout of symbols or how code is generated, and so on. These are all determined when the shared library is built. But when a compiler is consuming a module, it needs a BMI that matches the settings of the consumer. And as we will see shortly, when that module is delivered by a shared library, the BMIs are pretty much never the same between the producer and consumer of a module.
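The following is a deliberately simplified, hypothetical model of that idea. It is not how any real compiler stores or checks a BMI (real BMIs record far more, in compiler-specific formats); the struct, its fields, and the function name are all assumptions made for illustration. It only shows why a BMI produced with one set of settings is rejected by a consumer compiling with another.

```cpp
#include <optional>
#include <string>

// Hypothetical, heavily simplified stand-in for the build settings a BMI records.
struct BmiSettings
{
    std::string compilerVersion;  // e.g. "17.0.6"
    int languageStandard;         // e.g. 20 for C++20
    bool gnuExtensions;           // e.g. -std=gnu++20 rather than -std=c++20
};

// Returns a description of the first mismatch found, or std::nullopt if,
// in this toy model, the consumer could load a BMI built with these settings.
std::optional<std::string> bmiMismatch(const BmiSettings& bmi,
                                       const BmiSettings& consumer)
{
    if (bmi.compilerVersion != consumer.compilerVersion)
        return "compiler version differs";
    if (bmi.languageStandard != consumer.languageStandard)
        return "language standard differs";
    if (bmi.gnuExtensions != consumer.gnuExtensions)
        return "language extensions setting differs";
    return std::nullopt;
}
```

Even in this toy form, any single difference (a patch-level compiler update, a newer standard, a different extensions setting) is enough to make the producer’s BMI unusable, which is why the consumer ends up needing its own.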

A critical problem is the compiler definition used to mark symbols for export. That definition needs to evaluate to different values depending on whether you are exporting (building) or importing (consuming) the module. For example, with the Visual Studio toolchain, our ALGO_EXPORT annotation might need to evaluate to something like __declspec(dllexport) when building the library, and __declspec(dllimport) when consuming the library (issue 25539 in the CMake issue tracker goes into more detail on this specific topic). A direct consequence of this is that when packaging the shared library for distribution, the package cannot simply provide the BMI it used when building the shared library. It would have to generate a new, separate BMI for installation, and CMake doesn’t currently provide such a facility.
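A simplified sketch of the kind of logic a generated export header contains is shown below. It relies on the fact that CMake defines <target>_EXPORTS (here Algo_EXPORTS) by default when compiling a shared library target itself, but not when compiling its consumers. The stringify helpers and algoExportExpansion are hypothetical additions purely for inspecting the expansion; the real algo_export.h generated by generate_export_header() also handles static builds and deprecation annotations.

```cpp
#include <string_view>

// Simplified stand-in for the export/import switch in a generated algo_export.h.
// Algo_EXPORTS is defined by CMake only while compiling the Algo shared library.
#if defined(_WIN32)
#  if defined(Algo_EXPORTS)
#    define ALGO_EXPORT __declspec(dllexport)   // building Algo
#  else
#    define ALGO_EXPORT __declspec(dllimport)   // consuming Algo
#  endif
#else
   // GCC-like toolchains use the same annotation for building and consuming
#  define ALGO_EXPORT __attribute__((visibility("default")))
#endif

// Hypothetical helpers that capture the macro's expansion as a string.
#define ALGO_STRINGIFY_IMPL(x) #x
#define ALGO_STRINGIFY(x) ALGO_STRINGIFY_IMPL(x)

constexpr std::string_view algoExportExpansion = ALGO_STRINGIFY(ALGO_EXPORT);
```

On Windows, the same interface source therefore expands to __declspec(dllexport) while the library is built and to __declspec(dllimport) in a consumer, so a BMI created during the library build bakes in the wrong expansion for consumers.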

Even if CMake could provide a separate BMI for consumers, it is still unlikely that it would be useful. Currently, toolchains are typically very sensitive to any differences that affect the BMI. Even a minor or patch difference in the compiler version could be enough to render a BMI unusable by a consumer. The package vendor could never hope to provide BMIs for all the different variations of compilers, flags, etc. that consumers may use. Because of this, the consumer effectively always has to create its own BMI with the compiler settings it needs. In order for it to do that, it needs the source code that defines the module interface. So we’re back to essentially having to provide some sort of equivalent to header files for our module interface definition to avoid exposing the implementation details! That sounds like modules are not delivering on their promise, but they do still provide benefits, just maybe not all the ones you thought you were getting. 😉

Installing Shared Libraries With C++20 Modules

Before C++20 modules, installing an ordinary shared library target in CMake mostly meant installing the shared library binary and the headers that define the library’s public API. When a shared library includes C++20 modules in its public API, we now also need a way to provide BMIs for the library’s C++20 modules. As discussed in the previous section, we can’t reliably provide the BMIs directly (although CMake does provide the ability to do that). Instead, we install the interface sources for the module and let CMake generate a BMI on-the-fly for the consumer.

One consequence of this is that while headers included in a module interface’s global fragment are not directly part of the module interface, those headers must still be available to module consumers. Without them, the consumer cannot create their own BMI for the module. This is why the generated algo_export.h header was put in a PUBLIC file set rather than a PRIVATE one earlier in this article. This ensures the header is installed along with the target, and made available to consumers, just like any other public header that forms part of the shared library’s API.

Continuing our earlier example, the installation commands might look something like this:

include(GNUInstallDirs)

install(TARGETS Algo
    EXPORT my_package-targets
    # ... a few details omitted, see the "Deep CMake For Library Authors" talk
    FILE_SET CXX_MODULES
        # There's currently no convention for this location, see discussion below
        DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/my_package/src
    FILE_SET HEADERS
        DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}   # Same as default, could be omitted
    INCLUDES
        DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)
install(EXPORT my_package-targets
    DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/my_package
    CXX_MODULES_DIRECTORY .
)
# The following file includes the my_package-targets.cmake file
install(FILES my_package-config.cmake
    DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/my_package
)

The DESTINATION for the FILE_SET CXX_MODULES in the install(TARGETS...) call tells CMake where to install the module interface sources. There’s currently no strong convention for where to put these in the installed layout, but putting them somewhere below the directory where the CMake config package file is located is a reasonable choice. If the module name is part of the source file name, it may be reasonable to install all module interface sources to a common directory, as shown in the above example. If file name clashes between modules are a potential issue, module-specific subdirectories might be necessary.

The CXX_MODULES_DIRECTORY option to install(EXPORT...) tells CMake where it should install some additional files it generates for each module in the export set. Again, there’s no strong convention yet for this location, but since the files are strongly coupled to the exported targets file (my_package-targets.cmake), they should go in the same directory as that exported targets file, or a subdirectory below it. A relative path given to CXX_MODULES_DIRECTORY is treated as relative to where the export file is installed, which is a strong hint not to install these files anywhere outside that directory. CMake 3.28.2 and later generate files with names that include the name of the EXPORT set. This ensures names don’t clash if multiple export sets are included in a single CMake package. Earlier CMake versions did not include the name of the EXPORT set in the generated file names, which is why we took the unusual step of including the CMake patch version in the minimum CMake version requirement at the very beginning of the first example (3.28.2 instead of just 3.28):

cmake_minimum_required(VERSION 3.28.2)

The relative layout of the installed files resulting from the above might look something like the following (this would be for installing a Release build on macOS):

<BASE>
 +-- include
 |    +-- algo_export.h
 +-- lib
      +-- cmake
      |    +-- my_package
      |         +-- src
      |         |    +-- algo-interface.cppm
      |         +-- cxx-modules-my_package-targets.cmake
      |         +-- cxx-modules-my_package-targets-Release.cmake
      |         +-- my_package-config.cmake
      |         +-- my_package-targets.cmake
      |         +-- my_package-targets-release.cmake
      |         +-- target-Algo-Release.cmake
      +-- libAlgo.dylib

Consuming Installed C++20 Modules

The following is a minimal project demonstrating how to consume the installed shared library with C++20 modules from the earlier examples:

cmake_minimum_required(VERSION 3.28)
project(cxx_modules_consumer LANGUAGES CXX)

# Hard-coded path just for illustration. Prefer relying on CMAKE_PREFIX_PATH instead.
find_package(my_package REQUIRED PATHS /path/to/<BASE>)

# Neither of these two are technically needed, but they make the expectation clear
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_EXTENSIONS FALSE)

add_executable(app main.cpp)
target_link_libraries(app PRIVATE Algo)

The above example looks no different to any other consumer project without C++20 modules. But a shared library that provides C++20 modules as part of its public API places stronger conditions on its consumers than for the non-module case. One of the more important additional constraints is that consumers must generally use the exact same language standard version as the one the shared library was built with. The BMI that CMake generates on-the-fly for the consumer uses the same C++ language standard as the shared library, not the one used by the consumer. This is essential, because the BMI must correspond to the built module binary. If the consumer tries to build its own module-consuming C++ code with a different language standard, toolchains will usually reject the BMI provided by CMake due to the mismatch. Change the CMAKE_CXX_STANDARD to 23 in the above example to see this in action. With clang 17, an error like the following would result:

error: C++23 was disabled in PCH file but is currently enabled
error: module file CMakeFiles/Algo@synth_838524ceec9e.dir/163bc42a95ce.bmi cannot be loaded due to a configuration mismatch with the current compilation [-Wmodule-file-config-mismatch]

This means the consumer cannot use a newer C++ standard than the shared library. Such a restriction is not typically present for non-module cases (yes there are caveats, but in general, you’ll likely get away with it for average code).

CMake also enforces another related condition on projects that install C++20 modules. As with non-module code, you can control the C++ language standard used to build the shared library target by setting the CXX_STANDARD target property (this is normally set project-wide with the CMAKE_CXX_STANDARD variable). However, this property is not sufficient for targets that will be installed. The CXX_STANDARD target property only affects building the shared library; it does not get carried through when the target is installed. Compile features do propagate though, and CMake relies on this for BMI creation for the consumer. CMake therefore requires installed targets to specify the language standard with a compile feature if they provide any C++20 modules. The example at the beginning of this article included the following line to achieve this:

target_compile_features(Algo PUBLIC cxx_std_20)

A project will build and even install successfully with just the CXX_STANDARD property and no compile feature. But when another project tries to consume the installed module target, CMake will halt with a fatal error. The installed target lacks the information CMake needs to be able to provide a BMI for the consumer.

Perversely, the same is not true for the CXX_EXTENSIONS target property, which does propagate to the installed target. There is no equivalent compile feature for this property, so the target property is the only way this setting can be specified. This property setting also needs to be consistent between the installed shared library and its consumer, otherwise most toolchains will once again reject the consumer’s generated BMI.
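Both settings that have to match between the installed library and its consumer can be observed from inside a translation unit, which can help when diagnosing errors like the ones shown earlier. The sketch below assumes a GCC-like compiler for the extensions check: GCC and Clang define __STRICT_ANSI__ for -std=c++NN but not for -std=gnu++NN (CMake chooses between these based on CXX_EXTENSIONS). MSVC reports __cplusplus as 199711L unless /Zc:__cplusplus is used, and older GCC and Clang releases use interim values for standards that were not yet finalized, so those show up as unrecognized. The function and constant names are hypothetical.

```cpp
#include <string_view>

// Maps the published __cplusplus values to a readable name. Interim or
// non-conforming values are reported as unrecognized rather than guessed at.
constexpr std::string_view describeStandard(long value)
{
    switch (value)
    {
    case 201103L: return "C++11";
    case 201402L: return "C++14";
    case 201703L: return "C++17";
    case 202002L: return "C++20";
    case 202302L: return "C++23";
    default:      return "unrecognized";
    }
}

// Only meaningful for GCC-like compilers; MSVC never defines __STRICT_ANSI__,
// so this would always report true there.
#if defined(__STRICT_ANSI__)
constexpr bool gnuExtensionsEnabled = false;  // e.g. -std=c++20
#else
constexpr bool gnuExtensionsEnabled = true;   // e.g. -std=gnu++20
#endif

// The standard this translation unit is being compiled with.
constexpr std::string_view currentStandard = describeStandard(__cplusplus);
```

Printing currentStandard and gnuExtensionsEnabled from both the library and the consumer makes it easy to spot which of the two settings differs.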

There is one more scenario worth highlighting. It only appears to affect the Clang toolchain, and only Clang 18.0 or older (18.1 has a fix). If the consumer also wants to define its own shared library target and set CXX_VISIBILITY_PRESET to hidden on that target, a problem arises with the BMI generation for the consumer. The internal Clang invocation that CMake uses to generate a BMI will have a different visibility setting than the consumer, and Clang 18.0 and older will fail with an error similar to this:

error: default visibility for functions and variables [-fvisibility] differs in PCH file vs. current file
error: module file CMakeFiles/Algo@synth_838524ceec9e.dir/163bc42a95ce.bmi cannot be loaded due to a configuration mismatch with the current compilation [-Wmodule-file-config-mismatch]

The fix added in Clang 18.1 removes the visibility settings from the things that must be consistent for a BMI. A potential workaround if you need to use Clang 18.0 or older is to set the CXX_VISIBILITY_PRESET property to hidden on the installed imported target (Algo in this example):

# Algo is an imported target that we shouldn't touch, but we need this if
# consumers have this set on the consuming target(s)
set_target_properties(Algo PROPERTIES CXX_VISIBILITY_PRESET hidden)

Normally, a consuming project shouldn’t modify the target defined by an installed dependency like this, but there doesn’t seem to be an alternative workaround at this time other than to use Clang 18.1 or later. You can track progress on the CMake side of this particular problem in issue 25868 in CMake’s issue tracker.
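The version boundary described above can also be written down as a simple check. The following is a minimal C++ sketch with hypothetical names; in a real project the decision would more naturally be made in CMake (for example, based on CMAKE_CXX_COMPILER_ID and CMAKE_CXX_COMPILER_VERSION). It assumes upstream Clang version numbers; Apple clang uses its own numbering, so it is deliberately excluded rather than compared.

```cpp
// Returns true if an upstream Clang with this version is expected to need the
// CXX_VISIBILITY_PRESET workaround described above (Clang 18.0 and older).
constexpr bool clangNeedsVisibilityWorkaround(int major, int minor)
{
    return major < 18 || (major == 18 && minor < 1);
}

#if defined(__clang__) && !defined(__apple_build_version__)
constexpr bool currentCompilerNeedsWorkaround =
    clangNeedsVisibilityWorkaround(__clang_major__, __clang_minor__);
#else
// Not upstream Clang (or Apple clang, whose versions differ): this sketch makes no claim.
constexpr bool currentCompilerNeedsWorkaround = false;
#endif
```

Keeping the boundary in one place like this makes it easy to drop the workaround once the minimum supported Clang version reaches 18.1.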

CMake Generator Limitations

The previous sections paint the picture that, provided some basic guidelines are followed, projects can install shared libraries, and they can be consumed much like any other non-module case, perhaps with one or two minor caveats or workarounds. At the time of writing, this is only true if the installing and consuming CMake projects are both using either the Ninja or Ninja Multi-Config generators. The Visual Studio 17 2022 generator handles part of the picture, but it cannot install a shared library that provides C++20 modules. It also cannot consume modules installed by a project built with one of the Ninja generators (the Visual Studio generator lacks support for generating the BMIs for the consumer). These are gaps in CMake’s implementation, not in Visual Studio. None of the other CMake generators have any support for C++20 modules at all.

Over time, we would expect CMake’s generators to improve their support of C++20 modules. The toolchains themselves still have limitations as well, so it will take some evolution of all the moving parts together before we finally have more or less full functionality available. See the cxxmodules manual for the latest status of CMake’s C++20 modules support.
