Tag: Kaggle
OpenAI | MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
OpenAI has introduced MLE-bench, a benchmark for evaluating how well AI agents perform machine learning engineering tasks. It draws on 75 Kaggle competitions to test real-world ML engineering skills. Among the setups tested, the best-performing one, OpenAI’s “o1-preview with AIDE scaffolding”, reached at least Kaggle bronze-medal level in 16.9% of competitions. Source: Here