Tag: evaluation

  • OpenAI|Introducing SWE-bench Verified

    OpenAI has released SWE-bench Verified, a human-validated subset of SWE-bench, designed to more accurately assess AI models’ ability to solve real-world software problems. SWE-bench Verified addresses issues in the original benchmark, such as overly specific tests and ambiguous problem statements, improving the reliability of AI evaluations in software engineering tasks. Source: OpenAI (August 13, 2024)

  • FAW Group|【Hongqi Carnival】Understanding FAW’s 2023 Sustainability Report in One Picture

    FAW Group has released its 2023 Sustainability Report. The report shows total assets of 670.74 billion yuan, operating income of 633.49 billion yuan, and total industrial output value of 575.51 billion yuan, with year-on-year growth rates of 12.47%, 7.41%, and 7.43%, respectively. Additionally, the company ranked 131st in the… Source: FAW Group (July 19, 2024, 21:16)