Tag: SWE-bench
OpenAI | Introducing SWE-bench Verified
OpenAI has released SWE-bench Verified, a human-validated subset of SWE-bench designed to assess more accurately AI models’ ability to solve real-world software problems. It addresses issues in the original benchmark, such as overly specific tests that can reject valid solutions and ambiguous problem statements, making evaluations of AI on software engineering tasks more reliable. Source: OpenAI (August 13, 2024)