# Hybrid SSM Models: Falcon-Mamba + Zamba2 — Long Context Without KV-Cache

> Source: https://sukruyusufkaya.com/en/learn/fine-tuning-cookbook/ftc-hybrid-ssm-falcon-mamba-zamba
> Updated: 2026-05-14T14:42:53.035Z
> Category: Fine-Tuning Cookbook (Model-by-Model)
> Module: Part IV — Mid-Large Models (13B-70B+) + Distributed Internals

**TLDR:** State Space Models (SSM, Mamba) are an alternative architecture to the Transformer: there is no KV-cache, and generating a sequence of length N costs O(N) instead of the Transformer's O(N²). Covered here: Falcon-Mamba 7B and Zamba2 (a Mamba + Transformer hybrid). The fine-tuning pattern differs from Transformers in state reset, gradient flow, and learning-rate sensitivity. Includes an RTX 4090 recipe.
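
To make the complexity claim concrete, here is a minimal toy sketch contrasting the two decoding loops. This is not the Falcon-Mamba implementation; the matrices `A`, `B`, `C`, the dimensions, and the simplified "attention" readout are assumptions chosen only to show why an SSM carries a fixed-size state while a Transformer's KV-cache grows with every token.

```python
# Toy comparison: fixed-size SSM state vs. growing KV-cache during generation.
import numpy as np

d_state, d_model = 16, 8                  # toy sizes, not real model dimensions
A = np.random.randn(d_state, d_state) * 0.01
B = np.random.randn(d_state, d_model)
C = np.random.randn(d_model, d_state)

def ssm_generate(tokens):
    """Linear recurrence: one fixed-size state update per token -> O(N) total, O(1) memory."""
    h = np.zeros(d_state)
    outputs = []
    for x in tokens:                      # x: (d_model,) embedding of one token
        h = A @ h + B @ x                 # state update, cost independent of position
        outputs.append(C @ h)             # readout from the current state only
    return outputs

def attention_generate(tokens):
    """Attention-style decoding: the cache grows, so token t costs O(t) -> O(N^2) total."""
    kv_cache = []
    outputs = []
    for x in tokens:
        kv_cache.append(x)                # cache grows with sequence length
        ctx = np.stack(kv_cache)          # token t looks back over all t cached entries
        scores = ctx @ x                  # similarity against the whole cache
        outputs.append(scores.mean() * x) # stand-in for the attention-weighted sum
    return outputs

seq = [np.random.randn(d_model) for _ in range(32)]
ssm_generate(seq)        # memory: one (d_state,) vector, regardless of sequence length
attention_generate(seq)  # memory: 32 cached entries and growing
```

The same property is what removes the KV-cache at inference time: the SSM's entire context is compressed into `h`, so long-context generation does not inflate memory the way a Transformer's cache does.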

