
LLM-as-Judge

Use a stronger model to score outputs: roughly 10× faster than manual review, with about 80% agreement with human judgments.

3 hours · 1 resource · 1 prereq

Have the judge score each output from 1 to 5 against explicit criteria such as accuracy, tone, and instruction following. Pairwise comparison ("Is A or B better?") is generally more reliable than absolute scores.
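A minimal sketch of 1-to-5 scoring, assuming the judge model is wrapped in a hypothetical callable `judge(prompt) -> str` (swap in whatever client you actually use). The prompt wording, the `Score: N` reply format, and the helper names are illustrative assumptions, not a fixed API. Each criterion gets its own call so one score cannot bleed into another.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical judge interface: a prompt goes in, the model's raw reply text comes out.
JudgeFn = Callable[[str], str]

CRITERIA = ["accuracy", "tone", "instruction following"]


def build_score_prompt(task: str, output: str, criterion: str) -> str:
    """Ask the judge for a single 1-5 score on one criterion."""
    return (
        f"You are grading a model's answer on {criterion}.\n"
        f"Task:\n{task}\n\nAnswer:\n{output}\n\n"
        "Rate the answer from 1 (very poor) to 5 (excellent). "
        "Give a short justification, then a final line 'Score: N'."
    )


def parse_score(reply: str) -> Optional[int]:
    """Return the last 'Score: N' with N in 1..5, or None if the reply is malformed."""
    matches = re.findall(r"Score:\s*([1-5])\b", reply)
    return int(matches[-1]) if matches else None


def score_output(judge: JudgeFn, task: str, output: str) -> Dict[str, Optional[int]]:
    """Score one output on every criterion, one judge call per criterion."""
    return {c: parse_score(judge(build_score_prompt(task, output, c))) for c in CRITERIA}
```

Returning `None` for unparseable replies, rather than guessing a score, keeps malformed judge output visible so it can be retried or dropped.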

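Pitfalls: judges can favor whichever answer appears in a given position, give inconsistent scores when criteria are vague, and carry the quirks of a single model.

→ Randomize answer positions, write criteria clearly, use multiple judges.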
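One way to apply these fixes to pairwise comparison, sketched under assumptions: each judge is again a hypothetical `judge(prompt) -> str` callable, and instead of picking one random order, every pair is asked in both orders (a stricter variant of randomizing positions). A judge whose verdict flips with the order is counted as a tie, and a panel of judges then takes a simple majority with ties abstaining. The prompt text and function names are illustrative.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical judge interface: a prompt goes in, the model's raw reply text comes out.
JudgeFn = Callable[[str], str]


def build_pair_prompt(task: str, first: str, second: str) -> str:
    """Show two answers side by side with explicit criteria."""
    return (
        "Compare two answers to the same task on accuracy, tone, and "
        "instruction following.\n"
        f"Task:\n{task}\n\nAnswer 1:\n{first}\n\nAnswer 2:\n{second}\n\n"
        "Reply with only '1' or '2' for the better answer."
    )


def parse_choice(reply: str) -> Optional[int]:
    """Return 1 or 2, or None if the judge did not answer in the expected format."""
    text = reply.strip()
    return int(text) if text in ("1", "2") else None


def judge_pair(judge: JudgeFn, task: str, a: str, b: str) -> str:
    """Ask one judge in both orders; 'A' or 'B' only if the verdict survives the swap."""
    ab = parse_choice(judge(build_pair_prompt(task, a, b)))  # A shown first
    ba = parse_choice(judge(build_pair_prompt(task, b, a)))  # B shown first
    if ab == 1 and ba == 2:
        return "A"
    if ab == 2 and ba == 1:
        return "B"
    return "tie"  # verdict flipped with position, or reply was unparseable


def panel_verdict(judges: Sequence[JudgeFn], task: str, a: str, b: str) -> str:
    """Majority vote over several judges; ties abstain."""
    votes = Counter(judge_pair(j, task, a, b) for j in judges)
    if votes["A"] > votes["B"]:
        return "A"
    if votes["B"] > votes["A"]:
        return "B"
    return "tie"
```

Asking both orders doubles the judge calls but makes position bias show up directly as a tie rather than as a silent coin flip; if cost matters, a single call with a randomly chosen order is the cheaper option the card describes.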

Prerequisites