MLX-LM Apple Silicon: FT + Serve on M-Series Mac + Distributed MLX

Apple MLX (released in 2023) is a unified-memory ML framework for Apple Silicon, and MLX-LM builds on it for Llama/Qwen/Gemma fine-tuning and inference. Because the GPU addresses the full system RAM, a 4-bit 70B model (roughly 35 GB of weights) runs on an M3 Max with 128GB, while an 8B model can be LoRA fine-tuned on an M2 Pro with 32GB. This lesson is a cookbook supplement for Mac users.
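A quick back-of-envelope check on those memory claims (the arithmetic below is an added sketch, not from the original): at 4-bit quantization each parameter costs about half a byte, so the weights alone come to roughly params × 0.5 bytes, plus KV-cache overhead on top.

bash
# Approximate 4-bit weight footprints (≈0.5 bytes/param), ignoring KV cache
python3 -c "
for name, p in [('8B', 8e9), ('70B', 70e9), ('405B', 405e9)]:
    print(f'{name}: ~{p * 0.5 / 1e9:.0f} GB')
"
# → 8B: ~4 GB, 70B: ~35 GB, 405B: ~203 GB, consistent with the 32/128/256GB machines above
Why a Q4 70B fits in 128GB of unified memory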

Şükrü Yusuf KAYA
22 min read
Intermediate
bash
# === MLX-LM: Llama 3.1 8B on an M-series Mac ===
pip install mlx-lm

# 1. Convert HF → MLX (4-bit quantization; --quantize is a flag and takes no value)
mlx_lm.convert \
  --hf-path meta-llama/Meta-Llama-3.1-8B-Instruct \
  --mlx-path llama-3.1-8b-mlx \
  --quantize

# 2. Inference
mlx_lm.generate \
  --model llama-3.1-8b-mlx \
  --prompt "What is the population of Istanbul?" \
  --max-tokens 200

# 3. Fine-tune (LoRA); adapters are written to ./adapters by default
mlx_lm.lora \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --train \
  --data /path/to/tr_alpaca \
  --num-layers 16 \
  --batch-size 2 \
  --learning-rate 1e-4

# Inference performance (M-series):
# - M2 Pro 32GB:    Llama 8B Q4  → 28 tok/s
# - M3 Max 128GB:   Llama 70B Q4 → 12 tok/s, Llama 8B Q4 → 65 tok/s
# - M3 Ultra 256GB: Llama 405B Q4 → 4 tok/s
MLX-LM convert + inference + fine-tune
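For the fine-tuning step, mlx_lm.lora expects --data to point at a directory containing train.jsonl (plus valid.jsonl for evaluation), one JSON record per line; plain {"text": ...} records work, and prompt/completion and chat-message formats are also supported. A minimal sketch of the layout, with placeholder rows (tr_alpaca here just echoes the example path above):

bash
# Minimal --data directory for mlx_lm.lora (rows are illustrative placeholders)
mkdir -p tr_alpaca

cat > tr_alpaca/train.jsonl <<'EOF'
{"text": "Question: What is the population of Istanbul? Answer: Roughly 15-16 million."}
{"text": "Question: What is the capital of Turkey? Answer: Ankara."}
EOF

cat > tr_alpaca/valid.jsonl <<'EOF'
{"text": "Question: What is the largest city in Turkey? Answer: Istanbul."}
EOF
LoRA training data layout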
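Once training finishes, the saved LoRA adapters can be applied at generation time or fused into standalone weights. A sketch assuming the default adapter output directory adapters/ (the --save-path name is a made-up example):

bash
# Generate with the LoRA adapters applied on top of the base model
mlx_lm.generate \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --adapter-path adapters \
  --prompt "What is the population of Istanbul?" \
  --max-tokens 200

# Fuse the adapters into a standalone MLX model for distribution
mlx_lm.fuse \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --adapter-path adapters \
  --save-path llama-3.1-8b-tr-ft
Adapter inference + fuse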
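The "Serve" part of the title maps to mlx_lm.server, which exposes an OpenAI-compatible HTTP API. A minimal sketch, reusing the converted model from step 1 (8080 is the default port):

bash
# Start an OpenAI-compatible server (chat endpoint: /v1/chat/completions)
mlx_lm.server --model llama-3.1-8b-mlx --port 8080

# From another terminal: query with curl (any OpenAI client works the same way)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is the population of Istanbul?"}], "max_tokens": 100}'
Local serving with mlx_lm.server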
✅ Deliverables
  1. If you are on Apple Silicon, test Llama 8B inference with MLX-LM.
  2. Next lesson: 15.8, Speculative Decoding Production.

