Try CUGA in Hugging Face, the #1 Generalist Agent in the AppWorld Leaderboard
Posted by jlaredo 11 hours ago
Comments
Comment by verdverm 9 hours ago
Hey, we're #1 on a benchmark that none of the major players use, or have even been entered in by third parties... Looks like only 2 submissions since the paper came out a year ago.
No Gemini, no Claude, just old models.
Does IBM wonder why no one takes them seriously in the AI space? I would be embarrassed to have this "#1" so prominently on display.