A Quick List of LLM Benchmarks
A quick dump of the benchmarks that I personally look at and use. I've dropped a few that no longer appear to be maintained and added a few newer ones.
Code Specific
Coding Agent
General Ability
- https://lmarena.ai/leaderboard
- https://dubesor.de/benchtable
- https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro