[Figure: DevQualityEval v1.0 (https://github.com/symflower/eval-dev-quality), "Performance vs. Score (> 50%)". X-axis: processing time (seconds), 0 to 300. Y-axis: percentage of total possible score, 50% to 90%.]

Models in the legend: 01.AI: Yi Large; AI21: Jamba 1.5 Mini; AI21: Jamba-Instruct; AionLabs: Aion-1.0; AionLabs: Aion-1.0-Mini; Amazon: Nova Lite 1.0; Amazon: Nova Pro 1.0; Anthropic: Claude 3 Haiku; Anthropic: Claude 3 Opus; Anthropic: Claude 3 Sonnet; Anthropic: Claude 3.5 Haiku (2024-10-22); Anthropic: Claude 3.5 Sonnet (2024-10-22); Anthropic: Claude 3.5 Sonnet (2024-06-20); Anthropic: Claude 3.7 Sonnet (2025-02-19); Anthropic: Claude 3.7 Sonnet (Thinking); Databricks: DBRX 132B (Instruct); DeepSeek: DeepSeek V3; DeepSeek: DeepSeek V2.5; DeepSeek: DeepSeek R1; DeepSeek: DeepSeek R1 Distill Qwen 32B; Google: Gemini Flash 2.0; Google: Gemini 2.0 Flash Lite; Google: Gemini Flash 1.5; Google: Gemma 2 9B; Meta: Llama 3 70B (Instruct); Meta: Llama 3.1 405B (Instruct); Meta: Llama 3.1 70B (Instruct); Meta: Llama 3.3 70B (Instruct); Microsoft: Phi 4; Microsoft: WizardLM-2 8x22B; MiniMax: MiniMax-01; Mistral: Codestral (2501); Mistral: Ministral 3B; Mistral: Ministral 8B; Mistral: Mistral Large 2 (2407); Mistral: Mistral Large 2 (2411); Mistral: Mistral Medium; Mistral: Mistral Small (v24.02); Mistral: Mistral Small 3; Mistral: Mixtral 8x22B (Instruct) (v0.1); Mistral: Pixtral 12B (v2409); Mistral: Pixtral Large (2411); NousResearch: Hermes 3 405B (Instruct); NousResearch: Hermes 3 70B (Instruct); NVIDIA: Llama 3.1 Nemotron 70B (Instruct); OpenAI: GPT-4o (2024-11-20); OpenAI: GPT-4o-mini (2024-07-18); OpenAI: o1-mini (2024-09-12); OpenAI: o1-preview (2024-09-12); OpenAI: o3-mini (2025-01-31) (reasoning_effort=high); OpenAI: o3-mini (2025-01-31) (reasoning_effort=low); OpenAI: o3-mini (2025-01-31) (reasoning_effort=medium); Perplexity: Llama 3 Sonar 70B (Online); Perplexity: Llama 3.1 Sonar 70B; Qwen: Qwen 2 72B (Instruct); Qwen: Qwen2.5 72B (Instruct); Qwen: Qwen2.5 7B (Instruct); Qwen: Qwen2.5 Coder 32B (Instruct); Qwen: Qwen-Max; Qwen: Qwen-Plus; Qwen: Qwen-Turbo (2024-11-01); Qwen: Qwen2.5 32B Instruct; Qwen: QwQ 32B; xAI: Grok-2 (1212)