Skip to content

FFmpeg Video and Audio Encoding

Intermediate

CLI media encoder. Covers codec selection, NVENC hardware acceleration, audio encoding, tempo change with formant preservation, and concat fast paths. FFmpeg 7.x, NVENC Turing+.

Key Facts

  • -vsync deprecated in FFmpeg 6+; use -fps_mode cfr or -fps_mode vfr instead
  • libfaac removed; use native aac or package management-installed libfdk_aac
  • -rc constqp -qp X still works but -rc vbr -cq X -b:v 0 is the recommended NVENC pattern
  • -preset slow for NVENC is gone; use numeric -preset p5/p6/p7 (p7 = slowest/best)
  • -tag:v hvc1 required for HEVC in MP4 when targeting Apple QuickTime/Safari; default hev1 fails
  • NVENC Turing+ (RTX 2000+) required for spatial-aq/temporal-aq; Ada (RTX 4000+) for av1_nvenc
  • Rubber Band R3 engine requires standalone CLI from breakfastquay.com; ffmpeg's built-in af_rubberband uses the older R2 engine
  • yuv420p mandatory for hardware decoder compatibility (phones, TVs, embedded); yuv420p10le archival only
  • -movflags +faststart moves moov atom to file start — required for HTTP streaming/seeking before full download

Codec Selection

Codec Encoder Speed (1080p, RTX 4090) Quality/bit Compat 2026 Use when
H.264 libx264 -preset slow ~0.5x realtime Best Universal archival master, small batch
H.264 h264_nvenc -preset p6 ~10x realtime Good (with AQ) Universal bulk batch, distribution
HEVC libx265 -preset slow ~0.3x realtime Excellent Wide (not WMP without codec) archival master, 30-50% smaller than H.264
HEVC hevc_nvenc -preset p6 ~8x realtime V.Good (with tune hq) Wide bulk HEVC encode
AV1 libsvtav1 -preset 6 ~0.5x realtime Best CPU AV1 Limited (new devices) future-proof archival
AV1 av1_nvenc -preset p6 ~8x realtime Good Limited GPU speed when storage critical

CRF/CQ equivalence (low-motion content: slides, screen recording, talking head):

Target libx264 CRF libx265 CRF SVT-AV1 CRF h264_nvenc CQ hevc_nvenc CQ
Visually lossless 18 16 20 18 18
Distribution sweet spot 22-23 22 28-32 24-25 26
Aggressive (watch for text artifacts) 28+ 26+ 36+ 28+ 30+

Low-motion content (lectures, screencasts, slides) tolerates CRF 2-3 higher than general video — static regions benefit more from AQ than extra bitrate.

Scaler choice:

Algorithm Quality Speed Use when
lanczos Best detail, slight ringing on text edges Slow Photographic content, talking head
bicubic Good, less ringing Fast Screen recording, text-heavy slides
bilinear Worst Fastest Never for archival

Command Recipes

Stream copy (container repack, no quality loss)

# MKV → MP4 repack, no re-encode
ffmpeg -i input.mkv -c copy -movflags +faststart output.mp4

libx264 archival master

ffmpeg -i input.mp4 \
  -c:v libx264 -preset slow -crf 20 \
  -profile:v high -level 4.1 -pix_fmt yuv420p \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4

h264_nvenc bulk batch (Turing+)

ffmpeg -i input.mp4 \
  -c:v h264_nvenc -preset p6 -tune hq \
  -rc vbr -cq 24 -b:v 0 -maxrate 6M -bufsize 12M \
  -bf 2 -spatial-aq 1 -temporal-aq 1 -rc-lookahead 32 \
  -profile:v high -pix_fmt yuv420p \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4

-bf 2: B-frames for better compression on low-motion content (NVENC may default to 0).
-rc-lookahead 32: 32-frame buffer for better rate control; higher values give diminishing returns.
-multipass fullres: adds ~+10-15% quality at 2x encode time — worth it for archival, skip for bulk.

HEVC archival (libx265, 10-bit for slide/text content)

ffmpeg -i input.mp4 \
  -c:v libx265 -preset slow -crf 22 \
  -pix_fmt yuv420p10le -tag:v hvc1 \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4

10-bit reduces posterization and banding on gradient slides; hvc1 tag required for Apple compatibility.

hevc_nvenc bulk

ffmpeg -i input.mp4 \
  -c:v hevc_nvenc -preset p6 -tune hq \
  -rc vbr -cq 26 -b:v 0 \
  -spatial-aq 1 -temporal-aq 1 -rc-lookahead 32 \
  -pix_fmt yuv420p \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4
ffmpeg -i input.mp4 \
  -c:v libsvtav1 -preset 6 -crf 28 \
  -svtav1-params tune=0:lookahead=120 \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4

tune=0: VOD quality mode. lookahead=120: large lookahead for quality; reduce to 32 for speed.

av1_nvenc (Ada, RTX 4000+)

ffmpeg -i input.mp4 \
  -c:v av1_nvenc -preset p6 -tune hq \
  -rc vbr -cq 30 -b:v 0 \
  -spatial-aq 1 -temporal-aq 1 \
  -c:a aac -b:a 128k -ac 1 -ar 48000 \
  -movflags +faststart \
  output.mp4

Resolution downscale (4K → 1080p)

# Photographic / talking head — lanczos
ffmpeg -i input.mp4 -vf "scale=1920:-2:flags=lanczos" -c:v h264_nvenc ... output.mp4

# Screen recording / text slides — bicubic (less ringing)
ffmpeg -i input.mp4 -vf "scale=1920:-2:flags=bicubic" -c:v h264_nvenc ... output.mp4

-2: computes height automatically while keeping aspect ratio; ensures even number required by H.264.

Audio: AAC for voice

# Native AAC (universal builds) — adequate at 128k
-c:a aac -b:a 128k -ac 1 -ar 48000

# libfdk_aac (Fraunhofer, better quality especially <128k; requires non-free build)
-c:a libfdk_aac -b:a 96k -ac 1 -ar 48000 -afterburner 1

# Verify libfdk_aac availability
ffmpeg -encoders | grep fdk

AAC bitrate guide for mono voice: 64k = acceptable streaming, 96k = distribution minimum, 128k = archival sweet spot, 160k+ = diminishing returns.
HE-AAC (-profile:a aac_he) better below 64k but has weaker player support; use LC-AAC for compatibility.

Speech tempo change with formant preservation (Rubber Band R3)

Three-pass pipeline — extract audio, R3 stretch, re-mux with sped-up video:

# Pass 1: extract audio as lossless WAV
ffmpeg -y -i src.mp4 -vn -c:a pcm_s16le -ar 44100 -ac 1 audio.wav

# Pass 2: R3 time-stretch with formant preservation (-F mandatory for 3x speech)
rubberband-r3 -T 3 -F -q audio.wav audio_3x.wav
# -T 3   : 3x tempo (duration = original / 3)
# -F     : formant preserve — prevents gender-shift on extreme ratios
# -q     : quiet mode for batch

# Pass 3a: speed video only (no audio, use NVENC + AQ)
ffmpeg -y -i src.mp4 -an \
  -filter:v "setpts=PTS/3.0" \
  -c:v h264_nvenc -preset p6 -tune hq -cq 24 \
  -bf 2 -spatial-aq 1 -temporal-aq 1 \
  -movflags +faststart video_3x.mp4

# Pass 3b: mux speed-up video + R3 audio
ffmpeg -y -i video_3x.mp4 -i audio_3x.wav \
  -map 0:v:0 -map 1:a:0 -c:v copy \
  -c:a aac -b:a 128k -ac 1 \
  -shortest \
  -movflags +faststart final_3x.mp4

-shortest aligns stream durations — required when video and audio lengths differ by milliseconds after stretch.

Concat: fast path (no re-encode)

All clips must share identical codec, resolution, frame rate, pixel format, and audio parameters:

# Build list file
printf "file '%s'\n" part1.mp4 part2.mp4 part3.mp4 > list.txt

# Demuxer concat — ~37x faster than re-encode
ffmpeg -f concat -safe 0 -i list.txt -c copy output.mp4

# With regenerated timestamps (for speed-up outputs or AVI/MKV rips with corrupt PTS)
ffmpeg -fflags +genpts -f concat -safe 0 -i list.txt -c copy output.mp4

Concat: filter path (re-encode, mixed params)

ffmpeg -i part1.mp4 -i part2.mp4 \
  -filter_complex "[0:v][0:a][1:v][1:a]concat=n=2:v=1:a=1[v][a]" \
  -map "[v]" -map "[a]" \
  -c:v h264_nvenc -preset p6 -tune hq -cq 24 \
  -c:a aac -b:a 128k -ac 1 \
  -movflags +faststart output.mp4

Use when clips differ in any parameter; slower but handles any input combination.

NVENC Tuning Reference

Flag Values Effect
-preset p4 (skip), p5 bulk, p6 quality, p7 archival Speed/quality tradeoff; p7 still faster than libx264 medium
-tune hq (use always for quality), ll/ull (latency only) hq mandatory — default gives noticeably lower quality
-spatial-aq 1 0/1 More bits on complex spatial regions, fewer on flat — critical for slides
-temporal-aq 1 0/1 More bits on static frames (face), fewer on motion — critical for talking head
-rc-lookahead 32 (recommended), 64 (diminishing) Frame buffer for rate control decisions
-bf 2-3 B-frames; NVENC may default to 0, set explicitly for lecture/low-motion
-multipass fullres (flag) Two-pass full-res: +10-15% quality, 2x time; NVENC SDK 12+, archival only

Gotchas

  • Issue: af_rubberband=tempo=3.0 (ffmpeg built-in) produces metallic/robotic artifacts at 3x speech. Fix: Use Rubber Band R3 standalone CLI (rubberband-r3 -T 3 -F). Built-in uses R2 engine; even with formant=preserved R2 loses to R3 on speech at high ratios.

  • Issue: -movflags +faststart omitted — video buffers entire file before play begins; seeking fails on HTTP streams. Fix: Always append -movflags +faststart for any distribution MP4. No downside for local files.

  • Issue: NVENC quality disappoints without -tune hq; default tune is equivalent to unoptimized mode. Fix: Always specify -tune hq for quality target; omitting it leaves significant quality headroom unused with no speed benefit.

  • Issue: hevc_nvenc output refuses to play in QuickTime/Safari (hev1 tag not recognized). Fix: Add -tag:v hvc1 to force correct tag; applies to any HEVC in MP4 targeting Apple.

  • Issue: ffmpeg -f concat -c copy produces audio/video sync drift when source files have non-zero start PTS (common in speed-up outputs or MKV rips). Fix: Add -fflags +genpts to regenerate timestamps: ffmpeg -fflags +genpts -f concat -safe 0 -i list.txt -c copy out.mp4.

  • Issue: CQ too low (aggressive) on NVENC produces visible blocking/ringing on text and slide gradients. Fix: For 1080p lecture/slide content use CQ 24-26 for H.264, CQ 26-28 for HEVC. CQ 18 or below = near-lossless but filesize grows exponentially; unnecessary for screen content.

  • Issue: libsvtav1 CRF scale is not the same as x264/x265; direct CRF transplant produces vastly different results. Fix: SVT-AV1 CRF 28-32 ≈ x265 CRF 22 ≈ x264 CRF 16-18 for equivalent perceptual quality on typical content.

  • Issue: atempo=3.0 in an ffmpeg audio filter chain produces intelligible but low-quality speech at 3x; phase-vocoder smearing is audible. Fix: Do not use atempo above 2x for speech; use Rubber Band R3 standalone pipeline instead.

See Also