
HKDF + ChaCha20 Personalized Neural Network Weights

Intermediate

Date: 2026-04-03
Context: Desktop C++ app, ONNX Runtime inference. Per-user unique weights via HKDF-derived seed + ChaCha20 stream. Functionally equivalent models: permutation symmetry ensures identical output.


Core Concept

Server holds master_secret. Per-user HKDF derivation → seeds for 3 operations:

  • Permutation - neuron index shuffling (lossless, mathematically equivalent)
  • Scale - multiplicative weight perturbation (~±3%)
  • Offset - additive weight perturbation (~±0.5% of weight magnitude)

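Each user's model weights are unique. Permutation alone leaves outputs identical within floating point tolerance; scale and offset add small perturbations that must stay within the bounds checked under Functional Equivalence Verification.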


HKDF + ChaCha20 Implementation

Why This Combination

HKDF alone: limited to 255 × HashLen = 8160 bytes for SHA-256. Insufficient for 100M+ parameter models.
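To make the gap concrete, here is a small sketch of the arithmetic. The function names are illustrative, and the figure of one 32-bit draw per parameter is an assumption that matches the float-stream code later in this note.

```cpp
#include <cassert>
#include <cstdint>

// RFC 5869: HKDF-Expand can emit at most 255 * HashLen bytes.
constexpr uint64_t hkdf_max_output_bytes(uint64_t hash_len) {
    return 255 * hash_len;
}

// Assumption: one 32-bit draw per model parameter.
constexpr uint64_t stream_bytes_needed(uint64_t param_count) {
    return param_count * sizeof(uint32_t);
}

// How many times the requirement exceeds the HKDF ceiling (rounded up).
inline uint64_t shortfall_factor(uint64_t param_count, uint64_t hash_len) {
    assert(hash_len > 0);
    const uint64_t cap = hkdf_max_output_bytes(hash_len);
    return (stream_bytes_needed(param_count) + cap - 1) / cap;
}
```

For a 100M-parameter model this gives 400,000,000 bytes needed against a ceiling of 8160, a shortfall of roughly 49,000×.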

Solution: HKDF generates 32-byte seeds → ChaCha20 expands into arbitrary-length stream.

master_secret (32 bytes, on server)
HKDF-Extract(salt=epoch_bytes, ikm=master_secret) → PRK
HKDF-Expand(PRK, info="user:{account_id}:perm")   → seed_perm  (32 bytes)
HKDF-Expand(PRK, info="user:{account_id}:scale")  → seed_scale (32 bytes)
HKDF-Expand(PRK, info="user:{account_id}:offset") → seed_offset (32 bytes)
ChaCha20(key=seed_perm,   nonce=0) → stream for permutations
ChaCha20(key=seed_scale,  nonce=0) → stream for scale factors
ChaCha20(key=seed_offset, nonce=0) → stream for offsets
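The info labels in the scheme above are what keep the three seeds independent, so it can help to build them in one place. A minimal sketch, assuming the "user:{account_id}:{purpose}" layout shown above; the `Purpose` enum and `make_info_label` helper are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <string>

// The three purposes from the derivation scheme.
enum class Purpose { Perm, Scale, Offset };

inline const char* purpose_tag(Purpose p) {
    switch (p) {
        case Purpose::Perm:   return "perm";
        case Purpose::Scale:  return "scale";
        case Purpose::Offset: return "offset";
    }
    return ""; // unreachable for valid enum values
}

// Builds "user:{account_id}:{purpose}". Using std::string means a long
// account ID cannot be silently truncated the way it can in a fixed buffer.
inline std::string make_info_label(const std::string& account_id, Purpose p) {
    assert(!account_id.empty());
    return "user:" + account_id + ":" + purpose_tag(p);
}
```

The resulting string's `data()` and `size()` can be passed as the info/context argument of whichever HKDF-Expand call is used.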

Library Choice

Library | HKDF | Speed | API
--- | --- | --- | ---
OpenSSL 3.x | EVP_KDF | Best (HW SHA-256) | Verbose boilerplate
libsodium 1.0.19+ | crypto_kdf_hkdf_sha256_* | ~20% slower | Simple, misuse-resistant
Standalone (RFC 5869) | Custom | Depends on HMAC | Full control

Recommendation: libsodium. Already needed for other crypto ops. ChaCha20 runs at ~1705 MiB/s in software and does not depend on AES-NI (important for low-end machines).

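ChaCha12 vs ChaCha20: 12 rounds is ~1.6× faster and sufficient for this threat model (the stream serves as a PRNG, not for encryption). libsodium's public stream API exposes only the 20-round ChaCha variant, so ChaCha12 would need a separate implementation. Estimated total personalization time for a 100M-parameter model: <500 ms on a slow CPU.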

Code (libsodium)

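#include <sodium.h>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Note: call sodium_init() once at startup before any function below.

struct PersonalizationSeeds {
    uint8_t perm[32];
    uint8_t scale[32];
    uint8_t offset[32];
};

// Step 1: Derive per-user seeds from master secret
PersonalizationSeeds derive_seeds(
    const uint8_t master_secret[32],
    const char* account_id,
    uint32_t epoch)
{
    PersonalizationSeeds seeds;
    // Salt = epoch as 4 big-endian bytes
    uint8_t salt[4] = {
        (uint8_t)(epoch >> 24), (uint8_t)(epoch >> 16),
        (uint8_t)(epoch >> 8),  (uint8_t)(epoch)
    };

    // One distinct label per purpose. snprintf truncates silently, so reject
    // account IDs that do not fit: truncated labels could collide across users.
    char info_perm[128], info_scale[128], info_offset[128];
    int n_perm   = snprintf(info_perm,   sizeof(info_perm),   "user:%s:perm",   account_id);
    int n_scale  = snprintf(info_scale,  sizeof(info_scale),  "user:%s:scale",  account_id);
    int n_offset = snprintf(info_offset, sizeof(info_offset), "user:%s:offset", account_id);
    if (n_perm   < 0 || n_perm   >= (int)sizeof(info_perm)  ||
        n_scale  < 0 || n_scale  >= (int)sizeof(info_scale) ||
        n_offset < 0 || n_offset >= (int)sizeof(info_offset))
        throw std::length_error("account_id too long for HKDF info label");

    uint8_t prk[crypto_kdf_hkdf_sha256_KEYBYTES];
    crypto_kdf_hkdf_sha256_extract(prk, salt, sizeof(salt),
                                    master_secret, 32);

    crypto_kdf_hkdf_sha256_expand(seeds.perm,   32, info_perm,   (size_t)n_perm,   prk);
    crypto_kdf_hkdf_sha256_expand(seeds.scale,  32, info_scale,  (size_t)n_scale,  prk);
    crypto_kdf_hkdf_sha256_expand(seeds.offset, 32, info_offset, (size_t)n_offset, prk);
    sodium_memzero(prk, sizeof(prk)); // wipe intermediate key material
    return seeds;
}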

// Step 2: Generate float stream from ChaCha20 seed
std::vector<float> generate_float_stream(
    const uint8_t seed[32], size_t count, float min_val, float max_val)
{
    std::vector<float> result(count);
    uint8_t nonce[crypto_stream_chacha20_NONCEBYTES] = {0};
    size_t byte_count = count * sizeof(uint32_t);
    std::vector<uint8_t> stream(byte_count);
    crypto_stream_chacha20(stream.data(), byte_count, nonce, seed);

    float range = max_val - min_val;
    for (size_t i = 0; i < count; i++) {
        uint32_t raw;
        memcpy(&raw, &stream[i * 4], 4);
        float u = (float)(raw >> 8) / (float)(1 << 24); // uniform [0, 1)
        result[i] = min_val + u * range;
    }
    return result;
}

// Step 3: Generate permutation (Fisher-Yates with deterministic PRNG)
std::vector<uint32_t> generate_permutation(const uint8_t seed[32], uint32_t n)
{
    std::vector<uint32_t> perm(n);
    for (uint32_t i = 0; i < n; i++) perm[i] = i;
    if (n < 2) return perm; // also avoids unsigned wrap-around of n - 1 when n == 0

    uint8_t nonce[crypto_stream_chacha20_NONCEBYTES] = {0};
    std::vector<uint8_t> stream((size_t)n * sizeof(uint32_t));
    crypto_stream_chacha20(stream.data(), stream.size(), nonce, seed);

    for (uint32_t i = n - 1; i > 0; i--) {
        uint32_t raw;
        memcpy(&raw, &stream[(size_t)i * 4], 4); // size_t index: no 32-bit overflow
        uint32_t j = raw % (i + 1); // relative modulo bias is at most (i + 1) / 2^32
        std::swap(perm[i], perm[j]);
    }
    return perm;
}
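Where the modulo bias of `raw % (i + 1)` matters, rejection sampling removes it. The following is a sketch under stated assumptions: `next_u32` is a hypothetical callable that returns the next 32-bit word of the deterministic stream (for example, the next 4 bytes of ChaCha20 output), and the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Draw an unbiased index in [0, bound) from a source of uniform 32-bit words.
template <typename NextU32>
uint32_t uniform_below(uint32_t bound, NextU32&& next_u32) {
    assert(bound > 0);
    // 2^32 mod bound, computed in 32-bit arithmetic. Words below this
    // threshold fall in the incomplete block and are rejected.
    const uint32_t threshold = (0u - bound) % bound;
    for (;;) {
        const uint32_t raw = next_u32();
        if (raw >= threshold) return raw % bound;
    }
}

// Fisher-Yates shuffle driven by the unbiased draw above.
template <typename NextU32>
std::vector<uint32_t> unbiased_permutation(uint32_t n, NextU32&& next_u32) {
    std::vector<uint32_t> perm(n);
    for (uint32_t i = 0; i < n; i++) perm[i] = i;
    for (uint32_t i = n; i > 1; i--) {
        const uint32_t j = uniform_below(i, next_u32); // j in [0, i)
        std::swap(perm[i - 1], perm[j]);
    }
    return perm;
}
```

One design consequence: the number of words consumed is no longer fixed at n, so the stream has to be read incrementally (or generated with some slack) rather than pre-sized exactly.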

OpenSSL 3.x Alternative

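#include <openssl/core_names.h>
#include <openssl/evp.h>
#include <openssl/kdf.h>
#include <openssl/params.h>
#include <cstring>

// Fragment: master_secret, salt/salt_len, info/info_len and out come from the caller.
EVP_KDF *kdf = EVP_KDF_fetch(NULL, "HKDF", NULL);
EVP_KDF_CTX *kctx = EVP_KDF_CTX_new(kdf);
OSSL_PARAM params[5], *p = params;
// The OSSL_PARAM constructors take non-const pointers, so C++ needs explicit casts.
*p++ = OSSL_PARAM_construct_utf8_string(OSSL_KDF_PARAM_DIGEST,
       (char *)SN_sha256, strlen(SN_sha256));
*p++ = OSSL_PARAM_construct_octet_string(OSSL_KDF_PARAM_KEY,
       (void *)master_secret, 32);
*p++ = OSSL_PARAM_construct_octet_string(OSSL_KDF_PARAM_INFO,
       (void *)info, info_len);
*p++ = OSSL_PARAM_construct_octet_string(OSSL_KDF_PARAM_SALT,
       (void *)salt, salt_len);
*p = OSSL_PARAM_construct_end();
int ok = EVP_KDF_derive(kctx, out, 32, params); // 1 on success; check before using out
EVP_KDF_CTX_free(kctx);
EVP_KDF_free(kdf);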

More boilerplate, heavier dependency, no convenient ChaCha20 stream API.


Permutation Symmetry: Which Layers Can Be Permuted

Theory

Permutation symmetry (arxiv:2506.13018, arxiv:2502.17391): For an FC layer with n neurons, n! equivalent parameterizations exist giving identical output. Permuting neuron i→j requires:

  1. Permute row i→j in the weight matrix of this layer (the weights that produce the neuron's output)
  2. Permute column i→j in the weight matrix of the next layer (the weights that consume it)
  3. Permute the corresponding biases
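The three steps can be checked on a toy two-layer MLP. This is a minimal sketch, not the production path: the `Mlp` struct, the row-major `[out][in]` layout, and the convention that new neuron k takes the place of old neuron `perm[k]` are all assumptions of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>; // row-major: [out][in]

// y = W * x + b, optionally followed by ReLU.
std::vector<float> dense(const Matrix& W, const std::vector<float>& b,
                         const std::vector<float>& x, bool relu) {
    std::vector<float> y(W.size());
    for (size_t o = 0; o < W.size(); o++) {
        float acc = b[o];
        for (size_t i = 0; i < x.size(); i++) acc += W[o][i] * x[i];
        y[o] = (relu && acc < 0.0f) ? 0.0f : acc;
    }
    return y;
}

struct Mlp {
    Matrix W1; std::vector<float> b1; // hidden layer
    Matrix W2; std::vector<float> b2; // output layer
};

// Convention (assumed): new hidden neuron k is old hidden neuron perm[k].
Mlp permute_hidden(const Mlp& m, const std::vector<size_t>& perm) {
    assert(perm.size() == m.W1.size());
    Mlp out = m;
    for (size_t k = 0; k < perm.size(); k++) {
        out.W1[k] = m.W1[perm[k]];               // 1. rows of this layer
        out.b1[k] = m.b1[perm[k]];               // 3. matching biases
        for (size_t o = 0; o < m.W2.size(); o++)
            out.W2[o][k] = m.W2[o][perm[k]];     // 2. columns of the next layer
    }
    return out;
}

std::vector<float> forward(const Mlp& m, const std::vector<float>& x) {
    return dense(m.W2, m.b2, dense(m.W1, m.b1, x, true), false);
}
```

Calling `forward` on the original and on `permute_hidden(m, perm)` with the same input should agree up to floating-point rounding, while the stored weight matrices differ.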

Papers confirming quality preservation:

  • NeuPerm (arxiv:2510.20367): permuting hidden layers of LLM = <0.01% accuracy drop
  • Git Re-Basin (arxiv:2209.04836): exploits same symmetry for model merging

Layer-by-Layer Rules

Layer Type | Safe to Permute? | What to Permute | Constraints
--- | --- | --- | ---
Fully Connected | YES | W_out rows + W_in columns + bias | Classic permutation symmetry
Conv2d (hidden) | YES | Output channels = "neurons". Permute output ch of current + input ch of next + bias | Same as FC
Grouped Conv | PARTIAL | Only within each group | Groups cannot be swapped
Depthwise Conv | NO | - | 1 filter = 1 channel, permutation meaningless
Attention Q/K/V | PARTIAL | Permute attention heads as whole units | Cannot permute neurons within a head without consistent Q,K,V permutation

Layers That CANNOT Be Permuted Independently

Layer | Why | Consequence
--- | --- | ---
BatchNorm | Stores per-channel running_mean/var | Must permute BN params together with preceding conv
LayerNorm | gamma/beta tied to feature positions | Same - permute with preceding layer
Skip connections (residual) | Input and output must share order | Permutation in block must be cancelled at output (P_out = P_in^(-1))
Input layer | Channels tied to RGB | Never permute input channels
Output layer | Channels tied to task (RGB output) | Never permute output channels
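The co-permutation rule for Conv2d + BatchNorm and the inverse permutation needed at a residual boundary can be sketched as below. The flat OIHW weight layout, the `ConvWeight` struct, and the helper names are assumptions of this example; the same permutation is applied to the conv's output channels, every per-channel BN vector, and the next conv's input channels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a per-channel vector (conv bias, BN gamma/beta, running mean/var).
// Convention (assumed): new channel k takes the values of old channel perm[k].
std::vector<float> permute_channels(const std::vector<float>& v,
                                    const std::vector<size_t>& perm) {
    assert(v.size() == perm.size());
    std::vector<float> out(v.size());
    for (size_t k = 0; k < perm.size(); k++) out[k] = v[perm[k]];
    return out;
}

// Conv2d weights stored flat in OIHW order (out_ch, in_ch, kH * kW).
struct ConvWeight {
    size_t out_ch, in_ch, ksize; // ksize = kH * kW
    std::vector<float> data;
};

// Permute output channels (axis 0) of the current conv.
ConvWeight permute_out(const ConvWeight& w, const std::vector<size_t>& perm) {
    assert(perm.size() == w.out_ch);
    ConvWeight r = w;
    const size_t block = w.in_ch * w.ksize;
    for (size_t k = 0; k < w.out_ch; k++)
        for (size_t e = 0; e < block; e++)
            r.data[k * block + e] = w.data[perm[k] * block + e];
    return r;
}

// Permute input channels (axis 1) of the following conv with the same perm.
ConvWeight permute_in(const ConvWeight& w, const std::vector<size_t>& perm) {
    assert(perm.size() == w.in_ch);
    ConvWeight r = w;
    for (size_t o = 0; o < w.out_ch; o++)
        for (size_t k = 0; k < w.in_ch; k++)
            for (size_t e = 0; e < w.ksize; e++)
                r.data[(o * w.in_ch + k) * w.ksize + e] =
                    w.data[(o * w.in_ch + perm[k]) * w.ksize + e];
    return r;
}

// Inverse permutation, for restoring channel order at a residual boundary.
std::vector<size_t> invert(const std::vector<size_t>& perm) {
    std::vector<size_t> inv(perm.size());
    for (size_t k = 0; k < perm.size(); k++) inv[perm[k]] = k;
    return inv;
}
```

Applying `permute_channels` with `perm` and then with `invert(perm)` returns the original order, which is the property a residual block relies on.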

U-Net Specifics

U-Net with skip connections is the most complex case:

Encoder:              Decoder:
conv1 → pool ----skip---→ upconv4 + concat
conv2 → pool ----skip---→ upconv3 + concat
conv4 (bottleneck)  ---→ upconv1

Rules:

  1. Bottleneck (conv4): SAFE. No skip connection; channels are self-contained.
  2. Encoder conv within block: SAFE if permutation propagates through block and cancels before skip.
     • Permute output channels of conv1 → permute BN1 → permute input channels of conv2
     • Do NOT touch output channels of conv2 (they go into skip)
  3. Skip connection channels: Must match decoder input after concat. Permuting encoder output requires matching decoder permutation.
  4. Attention heads (Stable Diffusion U-Net): Permute entire heads (8 heads = 8! = 40320 variants - too few for security).
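For rule 3, the decoder conv that consumes the concatenated tensor needs one permutation covering both halves of its input channels. A minimal sketch, assuming the concat order is [upconv channels | skip channels]; the helper names are hypothetical, and a real model should be checked for its actual concat order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Identity permutation, for a tensor whose channel order is left untouched.
std::vector<size_t> identity_permutation(size_t n) {
    std::vector<size_t> p(n);
    for (size_t k = 0; k < n; k++) p[k] = k;
    return p;
}

// True if perm contains each index in [0, perm.size()) exactly once.
bool is_valid_permutation(const std::vector<size_t>& perm) {
    std::vector<bool> seen(perm.size(), false);
    for (size_t p : perm) {
        if (p >= perm.size() || seen[p]) return false;
        seen[p] = true;
    }
    return true;
}

// Assumed concat order: [upconv channels | skip channels].
// Returns the permutation to apply to the decoder conv's input channels.
std::vector<size_t> concat_permutation(const std::vector<size_t>& perm_up,
                                       const std::vector<size_t>& perm_skip) {
    assert(is_valid_permutation(perm_up));
    assert(is_valid_permutation(perm_skip));
    std::vector<size_t> out;
    out.reserve(perm_up.size() + perm_skip.size());
    for (size_t p : perm_up) out.push_back(p);
    // Skip channels sit after the upconv channels, so shift their indices.
    for (size_t p : perm_skip) out.push_back(perm_up.size() + p);
    return out;
}
```

When the skip path is left untouched, as rule 2 recommends, `perm_skip` is simply `identity_permutation(c_skip)` and only the upconv half of the decoder input is reordered.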

Practical Strategy for Retouching Model

Permute only: bottleneck + internal conv layers within residual blocks.

Security gains:

  • Bottleneck with 512 channels: 512! ~ 10^1166 permutations
  • Each conv-block with 256 channels: 256! ~ 10^507
  • Combined: astronomically large search space
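The factorial magnitudes quoted above can be reproduced with the log-gamma function, since n! = Γ(n + 1). A small sketch with illustrative function names:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// log10(n!) via log-gamma: n! = Gamma(n + 1).
inline double log10_factorial(unsigned n) {
    return std::lgamma(static_cast<double>(n) + 1.0) / std::log(10.0);
}

// Sum of log10(n!) over independently permuted layers: the exponent of the
// combined search space.
inline double combined_log10(const std::vector<unsigned>& channel_counts) {
    double total = 0.0;
    for (unsigned n : channel_counts) {
        assert(n > 0); // a layer with zero channels is a caller bug
        total += log10_factorial(n);
    }
    return total;
}
```

This gives log10(512!) ≈ 1166.5 and log10(256!) ≈ 506.9 (that is, 256! ≈ 8.6 × 10^506, which rounds to the 10^507 quoted above), against only log10(8!) ≈ 4.6 for permuting 8 attention heads.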

Do NOT touch: input/output layers, skip connection boundaries.


Scale/Offset Injection

Safe Perturbation Ranges

Layer Type | Scale Range | Offset Range | Notes
--- | --- | --- | ---
Conv2d (hidden) - conservative | 0.97 - 1.03 | ±0.005 | Recommended
Conv2d (hidden) - standard | 0.95 - 1.05 | ±0.02 | Risk of visible degradation
Linear (hidden) | 0.95 - 1.05 | ±0.01 | FC weights usually larger, less sensitive
BatchNorm gamma/beta | 0.98 - 1.02 | ±0.01 | BN sensitive - small perturbations only
Attention Q/K/V | 0.97 - 1.03 | ±0.005 | Dot product amplifies errors quadratically
First/last layer | DO NOT TOUCH | DO NOT TOUCH | Direct impact on input/output mapping

Why Offset is More Dangerous Than Scale

Scale 0.97-1.03 = ±3% deviation. For typical conv weight ~0.05 this is ±0.0015 (small).

An absolute offset of ±0.02 on the same weight is ±40% of its value (catastrophic for small weights).

Better approach: relative offset proportional to weight magnitude:

// w' = w * scale + offset * |w|
// offset here is a coefficient, not absolute value
float perturbed = weight * scale + offset_coeff * fabsf(weight);

This keeps perturbations proportional to weight magnitude.
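Applied to a whole tensor, the relative-offset formula might look like the sketch below. It assumes one scale and one offset coefficient per weight, drawn beforehand from the respective streams; `perturb_weights` and `max_relative_change` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// w' = w * scale + offset_coeff * |w|
// Assumed ranges: scales[i] in e.g. [0.97, 1.03], offset_coeffs[i] in
// e.g. [-0.005, 0.005], one value per weight.
std::vector<float> perturb_weights(const std::vector<float>& weights,
                                   const std::vector<float>& scales,
                                   const std::vector<float>& offset_coeffs) {
    assert(weights.size() == scales.size());
    assert(weights.size() == offset_coeffs.size());
    std::vector<float> out(weights.size());
    for (size_t i = 0; i < weights.size(); i++)
        out[i] = weights[i] * scales[i] + offset_coeffs[i] * std::fabs(weights[i]);
    return out;
}

// Largest relative change |w' - w| / |w| over all non-zero weights.
float max_relative_change(const std::vector<float>& before,
                          const std::vector<float>& after) {
    assert(before.size() == after.size());
    float worst = 0.0f;
    for (size_t i = 0; i < before.size(); i++) {
        if (before[i] == 0.0f) continue; // zero weights stay zero
        const float rel = std::fabs(after[i] - before[i]) / std::fabs(before[i]);
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

With scale 1.03 and coefficient 0.005, a weight of 0.05 becomes 0.05175, a 3.5% change, and the relative change is bounded by |scale - 1| + |offset_coeff| regardless of how small the weight is.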

Functional Equivalence Verification

After personalization, verify output matches reference:

// Run original and personalized model on same input
// Maximum acceptable per-pixel difference: ~0.001 (float32)
// PSNR should remain > 50 dB
float max_diff = 0.0f;
for (size_t i = 0; i < output_size; i++)
    max_diff = std::max(max_diff, fabsf(orig[i] - personalized[i]));
assert(max_diff < 1e-3f); // overall budget; a permutation-only model should land far below this

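Permutation is mathematically lossless; in practice the reordered floating-point accumulation can leave max_diff at the level of FP32 rounding error rather than exactly zero. Scale/offset will introduce small perturbations.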
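The PSNR threshold mentioned in the comments can be computed alongside the maximum difference. A minimal sketch, assuming outputs normalized to a peak value of 1.0; the `DiffStats` struct and `compare_outputs` function are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct DiffStats {
    double max_abs;  // largest per-element absolute difference
    double psnr_db;  // peak signal-to-noise ratio in dB (infinity if identical)
};

// PSNR = 10 * log10(peak^2 / MSE). peak = 1.0 assumes outputs in [0, 1].
DiffStats compare_outputs(const std::vector<float>& orig,
                          const std::vector<float>& personalized,
                          double peak = 1.0) {
    assert(orig.size() == personalized.size());
    assert(!orig.empty());
    double max_abs = 0.0, sq_sum = 0.0;
    for (size_t i = 0; i < orig.size(); i++) {
        const double d = static_cast<double>(orig[i]) -
                         static_cast<double>(personalized[i]);
        if (std::fabs(d) > max_abs) max_abs = std::fabs(d);
        sq_sum += d * d;
    }
    const double mse = sq_sum / static_cast<double>(orig.size());
    const double psnr = (mse == 0.0)
        ? std::numeric_limits<double>::infinity()
        : 10.0 * std::log10(peak * peak / mse);
    return {max_abs, psnr};
}
```

A uniform error of 0.001 on every element corresponds to 60 dB at peak 1.0, so the > 50 dB target leaves some headroom above the 1e-3 max-difference budget.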


Gotchas

  • HKDF-Expand info strings must be unique per purpose. Using the same info for perm and scale seeds produces the same key - completely breaking the purpose. Always use distinct labels.
  • Fisher-Yates modulo bias: raw % (i+1) is slightly biased when i+1 is not a power of 2. With 32-bit draws the relative bias per step is at most (i+1)/2^32: below 10^-6 for layers of up to 4096 channels, but approaching 0.4% as n nears 2^24. For cryptographic quality shuffle, use rejection sampling.
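  • ChaCha20 nonce=0 is safe here because each seed is a unique 256-bit key (derived from different HKDF info strings). Different key = different stream regardless of nonce. Within one key, however, every call that starts at nonce 0 replays the same stream from the beginning, so either draw all values for a purpose from one continuous stream or use a distinct nonce per layer.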
  • Permuting BatchNorm separately from conv corrupts the model. Always co-permute: {conv.weight, conv.bias, bn.weight, bn.bias, bn.running_mean, bn.running_var}.
  • Skip connections in U-Net require paired permutations. Permuting only the encoder side without matching the decoder input will produce wrong outputs at skip-concat points.
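  • Attention head permutation gives only n_heads! variants (e.g., 8! = 40320 for 8 heads). Not enough for security. Permute neurons within heads instead, keeping the pairs consistent: Q and K share one permutation of the head dimension (their dot product is unchanged), and V shares one with the matching input dimension of the output projection.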
  • Scale/offset perturbations accumulate across layers. In deep networks (50+ layers), even 0.3% perturbation per layer compounds. Always test final output PSNR, not just per-layer metrics.
  • Master secret leakage = all users compromised. Store master_secret in HSM or secure enclave on server. Never ship it to clients.
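The compounding mentioned in the scale/offset gotcha can be estimated with a crude worst-case model in which every layer's relative error pushes in the same direction. This is an assumption made for illustration; real networks with normalization layers will usually drift less, which is why the final PSNR check remains the deciding test.

```cpp
#include <cassert>
#include <cmath>

// Worst-case multiplicative drift after `depth` layers when each layer
// contributes a relative error of per_layer_eps: (1 + eps)^depth - 1.
inline double worst_case_drift(double per_layer_eps, unsigned depth) {
    assert(per_layer_eps >= 0.0);
    return std::pow(1.0 + per_layer_eps, static_cast<double>(depth)) - 1.0;
}
```

Under this model, 0.3% per layer over 50 layers compounds to about 16%, and the full ±3% scale range over 50 layers to well over 300%.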