Cross-Platform C++ for ML Inference

Advanced

Reference for building C++ desktop applications with ML inference that target both Windows and macOS. Covers ONNX Runtime execution providers, GPU API differences, filesystem quirks, build system setup, and deployment.

GPU API Landscape

| API | Windows | macOS |
| --- | --- | --- |
| DirectX 12 / DirectML | Primary for ML | N/A |
| Metal | N/A | Primary GPU API |
| CoreML | N/A | ML framework (GPU + Neural Engine) |
| CUDA | NVIDIA only | N/A (Apple removed NVIDIA) |
| Vulkan | Native | Via MoltenVK (slower than native Metal) |
| OpenCL | NVIDIA/AMD/Intel | Deprecated since macOS 10.14 |
| OpenGL | Legacy | Deprecated, frozen at 4.1 |

ONNX Runtime Execution Provider Selection

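#include <onnxruntime_cxx_api.h>
#ifdef _WIN32
#include <dml_provider_factory.h>
#elif defined(__APPLE__)
#include <coreml_provider_factory.h>
#endif

Ort::SessionOptions session_options;

#ifdef _WIN32
  // Windows: CUDA > DirectML > CPU
  try {
    OrtCUDAProviderOptions cuda_opts{};
    session_options.AppendExecutionProvider_CUDA(cuda_opts);
  } catch (const Ort::Exception&) {
    // CUDA EP unavailable: try DirectML on adapter 0.
    // The DirectML EP requires memory patterns off and sequential execution.
    session_options.DisableMemPattern();
    session_options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
    if (OrtStatus* st = OrtSessionOptionsAppendExecutionProvider_DML(session_options, 0)) {
      Ort::GetApi().ReleaseStatus(st);  // DirectML unavailable too: stay on CPU
    }
  }
#elif defined(__APPLE__)
  // macOS: CoreML > CPU
  uint32_t coreml_flags = COREML_FLAG_ENABLE_ON_SUBGRAPH;
  if (OrtStatus* st = OrtSessionOptionsAppendExecutionProvider_CoreML(session_options, coreml_flags)) {
    Ort::GetApi().ReleaseStatus(st);  // CoreML unavailable: stay on CPU
  }
#endif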

DirectML: works with any DX12 GPU (NVIDIA, AMD, Intel), so one binary covers all vendors. In maintenance mode (2026): stable, no new features. Falls back to shared system memory when VRAM is exhausted.

CoreML: uses the Metal GPU and the Neural Engine on Apple Silicon. The Neural Engine suits CNNs; the Metal GPU path is faster for transformer inference. The CoreML EP must be appended to the session options explicitly, before the session is created.
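
The priority ordering above (CUDA > DirectML > CPU on Windows, CoreML > CPU on macOS) can also be expressed as data. The following is a minimal sketch, not the ONNX Runtime API: `pick_provider` and `platform_preference` are hypothetical helper names. In a real application the `available` list could come from `Ort::GetAvailableProviders()`, which reports the providers compiled into the ONNX Runtime build rather than the hardware actually present, so the runtime fallback shown earlier is still needed.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the first entry of `preferred` that also appears in `available`,
// or "CPUExecutionProvider" when none of them match.
std::string pick_provider(const std::vector<std::string>& available,
                          const std::vector<std::string>& preferred) {
    for (const auto& name : preferred) {
        if (std::find(available.begin(), available.end(), name) != available.end()) {
            return name;
        }
    }
    return "CPUExecutionProvider";
}

// Per-platform priority lists matching the ordering described in the text.
std::vector<std::string> platform_preference() {
#if defined(_WIN32)
    return {"CUDAExecutionProvider", "DmlExecutionProvider"};
#elif defined(__APPLE__)
    return {"CoreMLExecutionProvider"};
#else
    return {};  // other platforms: CPU only in this sketch
#endif
}
```

Keeping the list separate from the selection logic makes the ordering easy to log and to override from a settings file when a user reports a driver problem.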

Apple Silicon Specifics

Unified Memory Architecture:
  - CPU and GPU share the SAME physical memory
  - No data copying between CPU/GPU - pointer passing
  - No separate VRAM - ML model and app share all RAM
  - Bandwidth: roughly 70 GB/s on the base M1 up to 800 GB/s on Ultra parts (vs ~32 GB/s per direction over PCIe 4.0 x16)

Neural Engine:
  - Accessible ONLY through CoreML (no low-level API)
  - Optimal for: quantized models, convolutions, matrix ops
  - NOT optimal for: arbitrary compute, custom operators
  - CNN inference: NE faster; Transformer inference: Metal GPU faster

M1 8GB: ~5-6 GB available for ML (OS + apps take the rest)
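
Because the model and the application share one memory pool, it can help to check the budget before loading a model. This is a minimal sketch under stated assumptions: the function names and the `reserved_bytes` parameter are hypothetical, and the reserve is a tuning value you would pick yourself (the 8 GB figure above implies roughly 2-3 GB).

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

// Bytes assumed usable for ML after the OS and other apps take their share.
// `reserved_bytes` is a hypothetical tuning knob, not a measured value.
constexpr std::uint64_t ml_budget_bytes(std::uint64_t physical_ram_bytes,
                                        std::uint64_t reserved_bytes) {
    return physical_ram_bytes > reserved_bytes ? physical_ram_bytes - reserved_bytes : 0;
}

// True when model weights plus working memory fit inside the budget.
constexpr bool fits_in_budget(std::uint64_t model_bytes,
                              std::uint64_t scratch_bytes,
                              std::uint64_t budget_bytes) {
    return model_bytes <= budget_bytes && scratch_bytes <= budget_bytes - model_bytes;
}
```

A failed check is a reasonable trigger for selecting a smaller or quantized model variant instead of letting the system start swapping.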

Rosetta 2 removal: Apple has announced that Rosetta 2 will be phased out starting with macOS 28 (2027). Ship a native arm64 build rather than relying on translation.

Filesystem Differences

| Aspect | Windows | macOS |
| --- | --- | --- |
| Path separator | \ (but / works in most APIs) | / |
| Max path | 260 chars (32,767 with \\?\ prefix) | 1024 bytes (PATH_MAX) |
| Case sensitivity | No (NTFS default) | No (APFS default) |
| File locking | Mandatory (cannot delete open files) | Advisory (can delete open files) |
| App data | %APPDATA% | ~/Library/Application Support/ |
| Cache | %LOCALAPPDATA%\Temp\ | ~/Library/Caches/ |
| Config | %APPDATA% | ~/Library/Preferences/ |

#include <filesystem>

// Always use std::filesystem::path - handles separators automatically
namespace fs = std::filesystem;
// get_app_data_dir() is an application-defined helper returning the per-user data root
fs::path config = get_app_data_dir() / "MyApp" / "config.json";
// NEVER concatenate strings with "/" or "\\"

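MAX_PATH fix: enable longPathAware in the application manifest (this also requires the LongPathsEnabled system setting, Windows 10 1607+), or use the \\?\ prefix for Win32 API calls. Do not assume std::filesystem works around the limit by itself; test long paths with the toolchain and SDK you ship.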

Library: PlatformFolders abstracts standard directories cross-platform.
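
If pulling in a library is not an option, the directory lookup from the table can be sketched with environment variables. This is an illustrative sketch only: `get_app_data_dir` and `config_path` are hypothetical helpers, the non-Apple Unix branch exists only so the example builds elsewhere, and production code on Windows would more likely call `SHGetKnownFolderPath`.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolves the per-user application data root from
// environment variables. Returns an empty path when the variable is unset.
fs::path get_app_data_dir() {
#if defined(_WIN32)
    // Wide getenv keeps non-ASCII user names intact.
    if (const wchar_t* appdata = _wgetenv(L"APPDATA")) {
        return fs::path(appdata);
    }
    return {};
#else
    const char* home = std::getenv("HOME");
    if (home == nullptr) return {};
#if defined(__APPLE__)
    return fs::path(home) / "Library" / "Application Support";
#else
    // Other Unix systems: XDG-style default, included so the sketch builds there.
    return fs::path(home) / ".local" / "share";
#endif
#endif
}

// Builds <app data>/<app_name>/config.json without string concatenation.
fs::path config_path(const fs::path& app_data, const std::string& app_name) {
    return app_data / app_name / "config.json";
}
```

Callers should treat an empty return value as "no usable location" and fall back to a directory next to the executable or report an error.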

DLL/dylib Loading

Windows search order: EXE dir > System32 > Windows dir > CWD (dangerous!) > PATH. Fix: SetDllDirectory("") removes CWD from search.
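
A minimal sketch of that fix, to be called once at the start of `main`. `harden_dll_search` is a hypothetical helper name; `SetDllDirectoryW` and `SetDefaultDllDirectories` are Win32 calls, and the function is a no-op on other platforms so the same call site compiles everywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Restricts the DLL search path for the current process.
// Returns true on success; does nothing and returns true off Windows.
bool harden_dll_search() {
#ifdef _WIN32
    // Remove the current working directory from the default search order.
    if (!SetDllDirectoryW(L"")) return false;
    // Limit flag-less LoadLibrary calls to the application directory,
    // System32, and directories added with AddDllDirectory.
    return SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) != 0;
#else
    return true;
#endif
}
```

Note that the second call also stops PATH from being searched, so runtimes that are normally found through PATH (a system-wide CUDA install, for example) must then be loaded by full path or registered with `AddDllDirectory`.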

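macOS: dyld loads each library from the install name recorded at link time, which can be an absolute path or start with @executable_path, @loader_path, or @rpath. DYLD_LIBRARY_PATH is ignored under Hardened Runtime unless an entitlement re-enables it, so do not rely on it. Prefer @rpath install names and set the runtime search path via CMake: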

set_target_properties(target PROPERTIES
    INSTALL_RPATH "@executable_path/../Frameworks")

Code Signing and Distribution

Windows (Authenticode + SmartScreen):

  • OV or EV certificate (~$200-600/year)
  • EV no longer gives instant SmartScreen trust (since Aug 2024)
  • SmartScreen builds reputation over time - new publishers always get warnings
  • Azure Trusted Signing available for US/Canada (since Oct 2025)

macOS (codesign + notarization + Gatekeeper):

  • Developer ID via Apple Developer Program ($99/year)
  • Hardened Runtime required for notarization
  • Notarization: upload to Apple for automated scan, receive ticket
  • Without notarization, Gatekeeper blocks a downloaded app on first launch (macOS 10.15+); users can run it only through a manual override
  • Hardened Runtime blocks DYLD_INSERT_LIBRARIES injection, disables ASLR override

Build System (CMake)

cmake_minimum_required(VERSION 3.20)

# macOS settings must be set before project(); other platforms ignore them.
set(CMAKE_OSX_ARCHITECTURES "x86_64;arm64" CACHE STRING "")  # Universal Binary
set(CMAKE_OSX_DEPLOYMENT_TARGET "13.0" CACHE STRING "")

project(MyMLApp LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

if(WIN32)
    add_compile_definitions(UNICODE _UNICODE)
    set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")
endif()

# main.cpp is a placeholder source name
add_executable(myapp main.cpp)

find_package(onnxruntime REQUIRED)
target_link_libraries(myapp PRIVATE onnxruntime::onnxruntime)

if(WIN32)
    target_link_libraries(myapp PRIVATE DirectML)
elseif(APPLE)
    target_link_libraries(myapp PRIVATE
        "-framework CoreML" "-framework Metal" "-framework Foundation")
endif()

Universal Binary pitfalls: every dependency must also be universal. Use lipo -create to merge single-architecture libraries, and verify that the ONNX Runtime package you link contains both x86_64 and arm64 slices (lipo -info prints them).

Compiler Differences

| Aspect | MSVC (Windows) | Apple Clang (macOS) |
| --- | --- | --- |
| C++ ABI | Microsoft ABI | Itanium ABI |
| stdlib | Microsoft STL | libc++ (LLVM) |
| Warnings | /W4 | -Wall -Wextra |
| Sanitizers | ASan | ASan, UBSan, TSan |
| Exception handling | SEH | DWARF/zero-cost |
| Debug symbols | PDB | dSYM (DWARF) |

Linking: static for C++ deps (onnxruntime, curl), dynamic for system frameworks and GPU runtime (CUDA, DirectML, CoreML).
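
When a bug report arrives it helps to know which row of the table the binary came from. The sketch below records the compiler and standard library at build time using the vendors' predefined macros; the function names are hypothetical, and the GCC/libstdc++ branches are included only so the example builds outside the two target platforms.

```cpp
#include <string>  // any standard header defines the library's version macros

// Identifies the compiler from its predefined macros.
std::string compiler_name() {
#if defined(_MSC_VER) && !defined(__clang__)
    return "MSVC";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown";
#endif
}

// Identifies the C++ standard library implementation.
std::string stdlib_name() {
#if defined(_MSVC_STL_VERSION)
    return "Microsoft STL";
#elif defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#else
    return "unknown";
#endif
}
```

Attaching both strings to crash reports or the About dialog makes it easier to separate toolchain-specific failures from genuine logic bugs.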

Unicode Handling

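// Internal: UTF-8 everywhere (std::string)
// Windows API boundary: convert to UTF-16
#ifdef _WIN32
#include <string>
#include <windows.h>

std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return {};
    const int in_len = static_cast<int>(utf8.size());
    // First call returns the required length in wchar_t units (0 on failure).
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, nullptr, 0);
    if (len <= 0) return {};  // conversion error
    std::wstring wide(static_cast<size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &wide[0], len);
    return wide;
}
// Use wide API: CreateFileW, LoadLibraryW
#endif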

wchar_t is 2 bytes on Windows (UTF-16), 4 bytes on macOS (UTF-32). Never transfer wstring across platforms via IPC/files.

Library: Boost.Nowide - UTF-8 aware I/O on Windows.
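
UTF-8 remains the simplest wire format. Where a UTF-16 payload is unavoidable (for example when talking to a Windows-only component), a fixed-width `char16_t` buffer avoids the `wchar_t` size mismatch. The following is a minimal sketch with a hypothetical function name; it encodes code points by hand so it has no platform dependencies, and byte order on the wire is still something the protocol has to define.

```cpp
#include <string>
#include <string_view>

// Encodes Unicode code points as UTF-16 code units in a std::u16string,
// whose element size is the same on every platform.
// Invalid code points (surrogates, values above U+10FFFF) become U+FFFD.
std::u16string to_utf16(std::u32string_view code_points) {
    std::u16string out;
    out.reserve(code_points.size());
    for (char32_t cp : code_points) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            out.push_back(u'\uFFFD');
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split into a high/low surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```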

Credential Storage

| Aspect | Windows | macOS |
| --- | --- | --- |
| System | DPAPI + Credential Manager | Keychain Services |
| ACL per item | No - any app of same user can read | Yes - prompt on foreign app access |
| API | CredWrite/CredRead | SecItemAdd/SecItemCopyMatching |

Library: keychain - cross-platform C++ abstraction.

Networking (libcurl)

Use libcurl with native TLS backend per platform:

  • Windows: libcurl + Schannel - uses Windows Certificate Store
  • macOS: libcurl + Secure Transport on older curl releases (the backend was removed in curl 8.15.0), otherwise OpenSSL with a bundled CA

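Using the system certificate store lets corporate proxies that perform TLS inspection work transparently, because their root CA is already installed there. If you use certificate pinning, add a fallback to system-store verification when the pin check fails; otherwise every connection on an inspected network will be rejected.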

Auto-Update

| Component | Windows | macOS |
| --- | --- | --- |
| Framework | WinSparkle (Sparkle port) | Sparkle |
| Update file | Appcast XML | Appcast XML |
| Restart | Helper process needed (mandatory locking) | Can update in place |

Crash Reporting

Crashpad (Google, cross-platform) + Sentry (SaaS):

# Windows: upload PDB
sentry-cli debug-files upload --include-sources ./build/Release/*.pdb

# macOS: upload dSYM
sentry-cli debug-files upload --include-sources ./build/Release/*.dSYM

Installers

| Platform | Recommended | Notes |
| --- | --- | --- |
| Windows | WiX (MSI) or NSIS (EXE) | MSI for corporate GPO deployment |
| macOS | DMG with drag-to-Applications | pkg inside DMG if system components needed |

Cross-Platform Pitfalls

| Pitfall | Details |
| --- | --- |
| Line endings | CR+LF (Win) vs LF (Mac). std::getline leaves a trailing \r when a CR+LF file is read in binary mode or on macOS |
| Sleep precision | Sleep(1) can take about 15.6 ms on Windows unless timeBeginPeriod(1) is in effect |
| Locale decimal | atof("1.5") may parse as 1 in a non-C locale. Force setlocale(LC_NUMERIC, "C") |
| Close window != quit | macOS: closing the last window does NOT quit the app |
| Quarantine attrs | Downloaded files get a quarantine mark: Zone.Identifier (Win) / com.apple.quarantine (Mac) |
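
The line-ending and locale pitfalls can both be handled in the parsing layer. A minimal sketch, with hypothetical helper names: the first function strips a trailing `\r` after `std::getline`, and the second parses numbers through a stream imbued with the classic "C" locale, so it does not depend on (or change) the process-wide locale.

```cpp
#include <istream>
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Reads one line and drops a trailing '\r', so CRLF and LF files behave alike.
bool getline_any_eol(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}

// Parses a decimal number using the classic "C" locale regardless of the
// global locale, so "1.5" always means one and a half.
std::optional<double> parse_double_c_locale(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    double value = 0.0;
    in >> value;
    if (in.fail()) return std::nullopt;
    in >> std::ws;                       // allow trailing whitespace
    if (!in.eof()) return std::nullopt;  // reject leftovers such as the ",5" in "1,5"
    return value;
}
```

Imbuing a local stream is used here instead of calling `setlocale`, because changing the global locale affects every thread and any UI toolkit that formats numbers for display.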

Testing Matrix (Minimum)

| OS | Architecture | GPU | Priority |
| --- | --- | --- | --- |
| Windows 10 22H2 | x64 | NVIDIA (CUDA+DirectML) | Primary |
| Windows 10 22H2 | x64 | AMD (DirectML) | Secondary |
| Windows 10 22H2 | x64 | Intel iGPU (DirectML) | Edge case |
| Windows 11 24H2 | x64 | NVIDIA | Primary |
| macOS 14+ | arm64 (M1-M4) | CoreML/Metal | Primary Mac |
| macOS 14+ | x86_64 (Intel) | CPU | Legacy |

Critical test cases: first launch (SmartScreen/Gatekeeper), GPU OOM graceful fallback, unicode in user paths (Cyrillic), sleep/wake GPU recovery, multiple GPU selection (dGPU vs iGPU on Windows).

Gotchas

  • Mandatory vs advisory file locking: on Windows, you cannot update a DLL/model while it is loaded. Need: close file, update, reopen, or staged update with rename on restart. macOS allows replacement but process keeps old inode until close
  • DirectML is in maintenance mode (2026): stable API, no new features. Production-safe but no performance improvements coming
  • Apple Silicon apps must be arm64 native: Rosetta 2 removal in macOS 28 (2027) means x86_64-only binaries stop working. Build Universal Binary now
  • CoreML coverage and precision: not every ONNX op is supported by CoreML, and converted models can lose precision. Where direct CoreML conversion falls short, run the model through the ONNX Runtime CoreML EP instead, which leaves unsupported nodes on the CPU
  • std::string SSO buffer size differs: MSVC STL and libc++ have different small string optimization implementations. Never assume binary layout of std::string across platforms
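
The staged-update approach from the first gotcha can be sketched with std::filesystem alone. This is a minimal illustration, not a full updater: the function names, the `UpdateResult` enum, and the ".new" suffix are assumptions made for the example, and a real updater would also verify a signature or checksum before swapping.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

enum class UpdateResult { Replaced, Staged, Failed };

// Writes `contents` next to `target` as "<target>.new", then tries to rename
// it over `target`. If the rename fails (for example because Windows still has
// the file open), the staged copy is left in place for the next launch.
UpdateResult stage_and_swap(const fs::path& target, const std::string& contents) {
    fs::path staged = target;
    staged += ".new";
    {
        std::ofstream out(staged, std::ios::binary | std::ios::trunc);
        out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
        out.flush();
        if (!out) return UpdateResult::Failed;
    }  // stream is closed here, so the rename does not hit our own handle
    std::error_code ec;
    fs::rename(staged, target, ec);
    return ec ? UpdateResult::Staged : UpdateResult::Replaced;
}

// Call early at startup, before the file is loaded, to finish a staged update.
// Returns true only when a staged file existed and was moved into place.
bool apply_staged_update(const fs::path& target) {
    fs::path staged = target;
    staged += ".new";
    std::error_code ec;
    if (!fs::exists(staged, ec)) return false;
    fs::rename(staged, target, ec);
    return !ec;
}
```

Writing the new file in the same directory keeps the rename on one volume, which is what allows it to replace the target in a single step on both platforms.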

See Also