This visualization shows benchmark results from Aider's LLM leaderboards, comparing how well different Large Language Models perform on code editing tasks against their cost per run.
The scatter plot visualizes two key metrics: each model's benchmark score on code editing tasks, as a percentage, and its cost per run.
Each point represents a different LLM, with models like Claude-3-Sonnet, GPT-4, and various open source models compared. Some key observations:
The interactive visualization allows you to:
Built with React and D3.js, this visualization automatically scales to fit the browser window and uses force-directed label placement to prevent overlapping text labels.
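To illustrate the idea behind force-directed label placement, here is a minimal sketch in TypeScript. It is not the code used by this visualization and does not use D3's force module; the `Label` shape, the `relaxLabels` function, and the `pull` parameter are hypothetical names chosen for the example. Each label is pulled gently toward its data point and pushed vertically away from any label it overlaps.

```typescript
// Hypothetical label shape: (x, y) is the label centre, anchor is the data point.
interface Label {
  x: number;
  y: number;
  width: number;
  height: number;
  anchorX: number;
  anchorY: number;
}

// Axis-aligned bounding boxes overlap when their centres are closer than
// the sum of the half-sizes on both axes.
function overlaps(a: Label, b: Label): boolean {
  return (
    Math.abs(a.x - b.x) < (a.width + b.width) / 2 &&
    Math.abs(a.y - b.y) < (a.height + b.height) / 2
  );
}

// Simplified relaxation loop (an assumption, not the original implementation):
// 1. attraction: move each label a fraction `pull` of the way to its anchor;
// 2. repulsion: split each overlapping pair apart vertically.
function relaxLabels(labels: Label[], iterations = 200, pull = 0.05): Label[] {
  const out = labels.map((l) => ({ ...l })); // leave the input untouched
  for (let it = 0; it < iterations; it++) {
    for (const l of out) {
      l.x += (l.anchorX - l.x) * pull;
      l.y += (l.anchorY - l.y) * pull;
    }
    for (let i = 0; i < out.length; i++) {
      for (let j = i + 1; j < out.length; j++) {
        const a = out[i];
        const b = out[j];
        if (!overlaps(a, b)) continue;
        const overlapY = (a.height + b.height) / 2 - Math.abs(a.y - b.y);
        const dir = a.y <= b.y ? -1 : 1; // the upper (or tied) label moves up
        a.y += (dir * overlapY) / 2;
        b.y -= (dir * overlapY) / 2;
      }
    }
  }
  return out;
}
```

With only pairwise vertical pushes, a dense cluster of many labels may not fully separate in a fixed number of iterations; a production layout would typically add horizontal movement and bounds checking.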
The cost axis uses a symlog (symmetric log) scale to better display the wide range of costs while handling values near zero. The percentage axis uses a linear scale from 0-100%.
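The two axis mappings can be sketched in plain TypeScript as follows. This is an illustrative sketch rather than the visualization's own code: the function names, the pixel ranges, and the cost domain of 0 to 100 are assumptions, and the transform `sign(x) * log1p(|x| / c)` mirrors the one commonly used for symlog scales, with `c` controlling the width of the near-linear region around zero.

```typescript
// Symmetric log transform: roughly linear near zero, logarithmic far from it,
// and defined at x = 0 (unlike a plain log scale).
function symlog(x: number, c = 1): number {
  return Math.sign(x) * Math.log1p(Math.abs(x) / c);
}

// Build a function mapping a data domain onto a pixel range via symlog.
function makeSymlogScale(
  domain: [number, number],
  range: [number, number],
  c = 1
): (x: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const t0 = symlog(d0, c);
  const t1 = symlog(d1, c);
  return (x) => r0 + ((symlog(x, c) - t0) / (t1 - t0)) * (r1 - r0);
}

// Plain linear interpolation for the percentage axis.
function makeLinearScale(
  domain: [number, number],
  range: [number, number]
): (x: number) => number {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// Hypothetical chart size: 800px wide, 400px tall.
const costScale = makeSymlogScale([0, 100], [0, 800]);
// SVG y grows downward, so 0% maps to the bottom and 100% to the top.
const percentScale = makeLinearScale([0, 100], [400, 0]);
```

Under these assumed ranges, a cost of 1 lands well to the right of where a linear scale would put it, so inexpensive models are spread out instead of being crowded against the origin, while a cost of exactly 0 still maps to a finite position.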