Cross-Platform C++ for ML Inference
Reference for building C++ desktop applications with ML inference that target both Windows and macOS. Covers ONNX Runtime execution providers, GPU API differences, filesystem quirks, build system setup, and deployment.
GPU API Landscape
| API | Windows | macOS |
|---|---|---|
| DirectX 12 / DirectML | Primary for ML | N/A |
| Metal | N/A | Primary GPU API |
| CoreML | N/A | ML framework (GPU + Neural Engine) |
| CUDA | NVIDIA only | N/A (no NVIDIA driver or CUDA support on current macOS) |
| Vulkan | Native | Via MoltenVK (slower than native Metal) |
| OpenCL | NVIDIA/AMD/Intel | Deprecated since macOS 10.14 |
| OpenGL | Legacy | Deprecated, frozen at 4.1 |
ONNX Runtime Execution Provider Selection
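#include <onnxruntime_cxx_api.h>
#ifdef _WIN32
#include <dml_provider_factory.h>
#elif defined(__APPLE__)
#include <coreml_provider_factory.h>
#endif
Ort::SessionOptions session_options;
#ifdef _WIN32
// Windows: CUDA > DirectML > CPU (assumes an ONNX Runtime build that ships both GPU providers)
try {
    OrtCUDAProviderOptions cuda_opts{};
    session_options.AppendExecutionProvider_CUDA(cuda_opts);
} catch (const Ort::Exception&) {
    // DirectML requires memory pattern off and sequential execution
    session_options.DisableMemPattern();
    session_options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
    if (OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_DML(session_options, 0)) {
        Ort::GetApi().ReleaseStatus(status);  // DirectML unavailable: stay on the default CPU provider
    }
}
#elif defined(__APPLE__)
// macOS: CoreML > CPU
uint32_t coreml_flags = COREML_FLAG_ENABLE_ON_SUBGRAPH;
if (OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CoreML(session_options, coreml_flags)) {
    Ort::GetApi().ReleaseStatus(status);  // CoreML unavailable: stay on the default CPU provider
}
#endif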
DirectML: works with any DirectX 12 capable GPU (NVIDIA, AMD, Intel), so one binary covers all vendors. It is in maintenance mode (2026): stable, with no new features. When VRAM is exhausted, it falls back to shared system memory.
CoreML: uses the Metal GPU and the Neural Engine on Apple Silicon. The Neural Engine is best suited to CNNs; the GPU via Metal is faster for transformer inference. The execution provider must be registered explicitly on the session options before the session is created.
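The priority orders above can also be expressed as data and matched against the provider names reported by `Ort::GetAvailableProviders()`. The following is a minimal sketch under that assumption; `pick_provider`, `windows_priority`, and `macos_priority` are hypothetical helper names, not part of ONNX Runtime. A provider being listed only means it was compiled into the build, so registration can still fail at runtime (for example, when the CUDA driver is missing).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the first entry of `preferred` that also appears in `available`,
// or "CPUExecutionProvider" when none matches.
std::string pick_provider(const std::vector<std::string>& preferred,
                          const std::vector<std::string>& available) {
    for (const auto& name : preferred) {
        if (std::find(available.begin(), available.end(), name) != available.end()) {
            return name;
        }
    }
    return "CPUExecutionProvider";
}

// Priority lists in the order described above (hypothetical helpers).
std::vector<std::string> windows_priority() {
    return {"CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"};
}

std::vector<std::string> macos_priority() {
    return {"CoreMLExecutionProvider", "CPUExecutionProvider"};
}
```

In an application, the second argument would come from `Ort::GetAvailableProviders()`; the selection logic itself has no ONNX Runtime dependency, which keeps it unit-testable.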
Apple Silicon Specifics
Unified Memory Architecture:
- CPU and GPU share the SAME physical memory
- No data copying between CPU/GPU - pointer passing
- No separate VRAM - ML model and app share all RAM
- Bandwidth: roughly 68 GB/s (base M1) up to 800 GB/s (Ultra chips), versus about 32 GB/s for PCIe 4.0 x16
Neural Engine:
- Accessible ONLY through CoreML (no low-level API)
- Optimal for: quantized models, convolutions, matrix ops
- NOT optimal for: arbitrary compute, custom operators
- CNN inference: NE faster; Transformer inference: Metal GPU faster
M1 8GB: ~5-6 GB available for ML (OS + apps take the rest)
Rosetta 2 removal: Apple has announced that general-purpose Rosetta 2 support ends after macOS 27, so x86_64-only binaries stop working in macOS 28 (2027). A native arm64 build is mandatory.
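To find out whether a shipped build still relies on Rosetta 2, the process can query the `sysctl.proc_translated` key that Apple documents for this purpose. A minimal sketch follows; `is_translated_by_rosetta` and `compiled_arch` are hypothetical helper names, and the non-Apple branches exist only so the same file compiles everywhere.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// True when the current process is an x86_64 binary translated by Rosetta 2.
// Returns false on non-Apple platforms and when the sysctl key is unavailable.
bool is_translated_by_rosetta() {
#if defined(__APPLE__)
    int translated = 0;
    std::size_t size = sizeof(translated);
    if (sysctlbyname("sysctl.proc_translated", &translated, &size, nullptr, 0) != 0) {
        return false;  // key missing (older macOS) or query failed
    }
    return translated == 1;
#else
    return false;
#endif
}

// Architecture this translation unit was compiled for.
const char* compiled_arch() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#else
    return "other";
#endif
}
```

Logging both values at startup (or attaching them to crash reports) shows how many users still run the x86_64 slice on Apple Silicon.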
Filesystem Differences
| Aspect | Windows | macOS |
|---|---|---|
| Path separator | \ (but / works in most APIs) | / |
| Max path | 260 chars (32,767 with \\?\ prefix) | 1024 bytes per path (PATH_MAX) |
| Case sensitivity | Insensitive, case-preserving (NTFS default) | Insensitive, case-preserving (APFS default) |
| File locking | Mandatory (cannot delete open files) | Advisory (can delete open files) |
| App data | %APPDATA% | ~/Library/Application Support/ |
| Cache | %LOCALAPPDATA%\Temp\ | ~/Library/Caches/ |
| Config | %APPDATA% | ~/Library/Preferences/ |
// Always use std::filesystem::path - handles separators automatically
namespace fs = std::filesystem;
fs::path config = get_app_data_dir() / "MyApp" / "config.json";
// NEVER concatenate strings with "/" or "\\"
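MAX_PATH fix: set longPathAware in the application manifest (this also requires the LongPathsEnabled system setting, Windows 10 1607+), or use the \\?\ prefix with the wide Win32 APIs. std::filesystem on newer SDKs is reported to handle this automatically; test with a path longer than 260 characters to confirm.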
Library: PlatformFolders abstracts standard directories cross-platform.
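The `get_app_data_dir()` call used above is not defined in this page. A minimal sketch of one possible implementation follows, assuming `SHGetKnownFolderPath` on Windows (link with shell32 and ole32) and the `HOME` environment variable elsewhere; the function name and the fallback for other Unix systems are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <stdexcept>
#include <string>
#if defined(_WIN32)
#include <windows.h>
#include <shlobj.h>
#include <knownfolders.h>
#endif

namespace fs = std::filesystem;

// Per-user application data root, following the table above.
fs::path get_app_data_dir() {
#if defined(_WIN32)
    PWSTR raw = nullptr;
    const HRESULT hr = SHGetKnownFolderPath(FOLDERID_RoamingAppData, 0, nullptr, &raw);
    fs::path result;
    if (SUCCEEDED(hr)) result = fs::path(raw);  // UTF-16 path, safe for non-ASCII user names
    CoTaskMemFree(raw);                         // must be freed on success and on failure
    if (FAILED(hr)) throw std::runtime_error("SHGetKnownFolderPath failed");
    return result;
#else
    const char* home = std::getenv("HOME");
    if (home == nullptr || *home == '\0') throw std::runtime_error("HOME is not set");
#if defined(__APPLE__)
    return fs::path(home) / "Library" / "Application Support";
#else
    return fs::path(home) / ".local" / "share";  // other Unix, included only so the sketch compiles there
#endif
#endif
}
```

With this in place, `get_app_data_dir() / "MyApp" / "config.json"` yields a correctly separated path on either platform.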
DLL/dylib Loading
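Windows search order (default, SafeDllSearchMode on): EXE dir > System32 > Windows dir > CWD (dangerous!) > PATH. Fix: SetDllDirectory("") removes CWD from the search; call it before the first LoadLibrary call.
macOS: dyld locates each dependency through the install name recorded at link time, typically prefixed with @executable_path, @loader_path, or @rpath. DYLD_LIBRARY_PATH is ignored under Hardened Runtime. Prefer @rpath install names and set the runpath from CMake through the INSTALL_RPATH target property, for example "@executable_path/../Frameworks".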
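The Windows fix described above is a single call at startup. A minimal sketch, with `harden_dll_search_path` as a hypothetical helper name; on other platforms it is a no-op so that shared startup code can call it unconditionally.

```cpp
#include <cassert>
#if defined(_WIN32)
#include <windows.h>
#endif

// Removes the current working directory from the implicit DLL search order.
// Returns true on success; always true on non-Windows platforms.
bool harden_dll_search_path() {
#if defined(_WIN32)
    // An empty string (not nullptr) is what removes the CWD from the search.
    return SetDllDirectoryW(L"") != 0;
#else
    return true;
#endif
}
```

Calling it at the top of `main`, before any delay-loaded or explicitly loaded DLL is touched, keeps a planted DLL in the working directory from being picked up.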
Code Signing and Distribution
Windows (Authenticode + SmartScreen):
- OV or EV certificate (~$200-600/year)
- EV no longer gives instant SmartScreen trust (since Aug 2024)
- SmartScreen builds reputation over time - new publishers always get warnings
- Azure Trusted Signing available for US/Canada (since Oct 2025)
macOS (codesign + notarization + Gatekeeper):
- Developer ID via Apple Developer Program ($99/year)
- Hardened Runtime required for notarization
- Notarization: upload to Apple for automated scan, receive ticket
- Without notarization, Gatekeeper blocks a downloaded app at first launch by default (macOS 10.15+)
- Hardened Runtime blocks DYLD_INSERT_LIBRARIES injection and disables ASLR override
Build System (CMake)
cmake_minimum_required(VERSION 3.20)
# Apple-only variables; set before project(). Other platforms ignore them.
set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "Universal Binary")
set(CMAKE_OSX_DEPLOYMENT_TARGET "13.0" CACHE STRING "Minimum macOS version")
project(MyMLApp LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
if(WIN32)
    add_compile_definitions(UNICODE _UNICODE)
    # Static MSVC runtime (/MT, /MTd in Debug)
    set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")
endif()
# main.cpp is a placeholder; list the real sources here.
add_executable(myapp main.cpp)
find_package(onnxruntime REQUIRED)
target_link_libraries(myapp PRIVATE onnxruntime::onnxruntime)
if(WIN32)
    target_link_libraries(myapp PRIVATE DirectML)
elseif(APPLE)
    target_link_libraries(myapp PRIVATE
        "-framework CoreML" "-framework Metal" "-framework Foundation")
endif()
Universal Binary pitfalls: every dependency must also be universal. Use lipo -create to merge single-arch libraries, and lipo -info to confirm that the ONNX Runtime library you ship contains both x86_64 and arm64 slices.
Compiler Differences
| Aspect | MSVC (Windows) | Apple Clang (macOS) |
|---|---|---|
| C++ ABI | Microsoft ABI | Itanium ABI |
| stdlib | Microsoft STL | libc++ (LLVM) |
| Warnings | /W4 | -Wall -Wextra |
| Sanitizers | ASan | ASan, UBSan, TSan |
| Exception handling | SEH | DWARF/zero-cost |
| Debug symbols | PDB | dSYM (DWARF) |
Linking: static for C++ deps (onnxruntime, curl), dynamic for system frameworks and GPU runtime (CUDA, DirectML, CoreML).
Unicode Handling
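// Internal: UTF-8 everywhere (std::string)
// Windows API boundary: convert to UTF-16
#ifdef _WIN32
#include <windows.h>
#include <string>
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return {};
    const int in_len = static_cast<int>(utf8.size());
    // Explicit input length: the reported size excludes the terminator
    const int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, nullptr, 0);
    if (len <= 0) return {};  // invalid UTF-8 input
    std::wstring wide(static_cast<size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), in_len, wide.data(), len);
    return wide;
}
// Use wide API: CreateFileW, LoadLibraryW
#endif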
wchar_t is 2 bytes on Windows (UTF-16), 4 bytes on macOS (UTF-32). Never transfer wstring across platforms via IPC/files.
Library: Boost.Nowide - UTF-8 aware I/O on Windows.
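Because the width of wchar_t differs, text that crosses a process or file boundary is safer as UTF-8 bytes. A minimal sketch of encoding fixed-width code points (char32_t has the same width on both platforms) to UTF-8; `append_utf8` and `to_utf8` are hypothetical helper names, and replacing invalid code points with U+FFFD is a design choice made for this example.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Appends one Unicode code point to `out` as UTF-8.
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
void append_utf8(std::string& out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Encodes a sequence of code points as a UTF-8 byte string.
std::string to_utf8(std::u32string_view text) {
    std::string out;
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}
```

The resulting std::string can be written to a file or socket and decoded identically on Windows and macOS.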
Credential Storage
| Aspect | Windows | macOS |
|---|---|---|
| System | DPAPI + Credential Manager | Keychain Services |
| ACL per item | No - any app of same user can read | Yes - prompt on foreign app access |
| API | CredWrite/CredRead | SecItemAdd/SecItemCopyMatching |
Library: keychain - cross-platform C++ abstraction.
Networking (libcurl)
Use libcurl with native TLS backend per platform:
- Windows: libcurl + Schannel - uses Windows Certificate Store
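- macOS: libcurl + Secure Transport on older libcurl releases (Apple has deprecated the API and recent libcurl releases have dropped the backend), or OpenSSL with bundled CA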
This allows corporate proxy SSL inspection to work transparently. If using certificate pinning, add fallback to system store on verification failure.
Auto-Update
| Component | Windows | macOS |
|---|---|---|
| Framework | WinSparkle (Sparkle port) | Sparkle |
| Update file | Appcast XML | Appcast XML |
| Restart | Helper process needed (mandatory locking) | Can update in place |
Crash Reporting
Crashpad (Google, cross-platform) + Sentry (SaaS):
# Windows: upload PDB
sentry-cli debug-files upload --include-sources ./build/Release/*.pdb
# macOS: upload dSYM
sentry-cli debug-files upload --include-sources ./build/Release/*.dSYM
Installers
| Platform | Recommended | Notes |
|---|---|---|
| Windows | WiX (MSI) or NSIS (EXE) | MSI for corporate GPO deployment |
| macOS | DMG with drag-to-Applications | pkg inside DMG if system components needed |
Cross-Platform Pitfalls
| Pitfall | Details |
|---|---|
| Line endings | CR+LF (Win) vs LF (Mac). std::getline leaves a trailing \r when a CRLF file is read on macOS or in binary mode |
| Sleep precision | Sleep(1) can take up to ~15.6 ms on Windows (default timer tick) unless timeBeginPeriod(1) is called |
| Locale decimal | atof("1.5") may parse as 1 in a locale that uses a decimal comma. Force setlocale(LC_NUMERIC, "C") |
| Close window != quit | macOS: closing last window does NOT quit the app |
| Quarantine attrs | Downloaded files get quarantine mark: Zone.Identifier (Win) / com.apple.quarantine (Mac) |
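Two of the pitfalls in the table, line endings and locale-dependent number parsing, can be avoided in portable code without touching global state. A minimal sketch; `read_lines` and `parse_double_c_locale` are hypothetical helper names. Imbuing the classic locale on a local stream is used here as an alternative to calling setlocale for the whole process.

```cpp
#include <cassert>
#include <istream>
#include <locale>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Reads all lines and strips a trailing '\r', so CRLF and LF input yield identical lines.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}

// Parses a leading decimal number with '.' as the separator, regardless of the
// global locale. Characters after the number are ignored.
std::optional<double> parse_double_c_locale(const std::string& text) {
    std::istringstream stream(text);
    stream.imbue(std::locale::classic());
    double value = 0.0;
    stream >> value;
    if (stream.fail()) return std::nullopt;
    return value;
}
```

Both helpers depend only on the standard library, so the same code path runs on MSVC and Apple Clang.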
Testing Matrix (Minimum)
| OS | Architecture | GPU | Priority |
|---|---|---|---|
| Windows 10 22H2 | x64 | NVIDIA (CUDA+DirectML) | Primary |
| Windows 10 22H2 | x64 | AMD (DirectML) | Secondary |
| Windows 10 22H2 | x64 | Intel iGPU (DirectML) | Edge case |
| Windows 11 24H2 | x64 | NVIDIA | Primary |
| macOS 14+ | arm64 (M1-M4) | CoreML/Metal | Primary Mac |
| macOS 14+ | x86_64 (Intel) | CPU | Legacy |
Critical test cases: first launch (SmartScreen/Gatekeeper), GPU OOM graceful fallback, unicode in user paths (Cyrillic), sleep/wake GPU recovery, multiple GPU selection (dGPU vs iGPU on Windows).
Gotchas
- Mandatory vs advisory file locking: on Windows, you cannot update a DLL/model while it is loaded. Need: close file, update, reopen, or staged update with rename on restart. macOS allows replacement but process keeps old inode until close
- DirectML is in maintenance mode (2026): stable API, no new features. Production-safe but no performance improvements coming
- Apple Silicon apps must be arm64 native: Rosetta 2 removal in macOS 28 (2027) means x86_64-only binaries stop working. Build Universal Binary now
- CoreML conversion can lose precision: not all ONNX ops supported in CoreML. Some models need the ONNX Runtime CoreML EP as fallback instead of direct CoreML conversion
- std::string SSO buffer size differs: MSVC STL and libc++ have different small string optimization implementations. Never assume binary layout of std::string across platforms
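The staged update mentioned in the first gotcha can be sketched with std::filesystem alone: write the new file beside the target, then try to rename it over the target. `stage_update`, `apply_staged_update`, and the ".new" suffix are hypothetical choices for illustration. On Windows the rename is expected to fail while the target is loaded, in which case the staged file stays in place for a retry at next launch.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `contents` next to `target` as "<target>.new" and returns that path.
fs::path stage_update(const fs::path& target, const std::string& contents) {
    fs::path staged = target;
    staged += ".new";
    std::ofstream out(staged, std::ios::binary | std::ios::trunc);
    out << contents;
    out.flush();
    if (!out) throw std::runtime_error("failed to write staged update");
    return staged;
}

// Tries to move the staged file over `target`. Returns false when the rename
// fails (for example, the target is still in use); the staged file is kept.
bool apply_staged_update(const fs::path& staged, const fs::path& target) {
    std::error_code ec;
    fs::rename(staged, target, ec);
    return !ec;
}
```

A launcher or the application's startup path can call `apply_staged_update` before loading the model or DLL, which covers the restart case on Windows and the in-place case on macOS with the same code.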
See Also
- cmake build systems - CMake configuration patterns
- concurrency - platform threading differences
- error handling - exception handling across compilers
- Low VRAM Inference Strategies - GPU memory optimization techniques