GPT‑Rosalind for life sciences research
Posted by babelfish 20 hours ago
Comments
Comment by Cynddl 17 hours ago
I went back to the BixBench benchmark that they mentioned. I couldn't find official results for Anthropic models, but I found a project taking Opus 4.6 from 65.3% to 92.0% (which would be above GPT-Rosalind) with nearly 200 carefully crafted skills [1]. There also appear to be competing models with scores on par with this tuned GPT.
Comment by jadusm 16 hours ago
Comment by an0malous 16 hours ago
Sam Altman, August 2025
Comment by falcor84 14 hours ago
For me too, it was around that time last year, with GPT-5, Claude Sonnet 4.5 and then Gemini 3, that I started feeling that these models are clearly becoming great at reasoning. I'm not at all opposed to saying that they are around PhD-level in at least some domains.
Comment by kmaitreys 11 hours ago
Comment by falcor84 3 hours ago
Comment by kmaitreys 1 hour ago
Also, what benchmark? How will you design it?
Comment by 0123456789ABCDE 11 hours ago
Comment by furyofantares 17 hours ago
Comment by jszymborski 10 hours ago
Comment by peyton 17 hours ago
It’s kind of gross to make money off her name (if that’s what’s happening) posthumously. It’s a complicated story anyway. IIRC her sister referred to it as “the Cult of Rosalind” when people were cashing in on books about her.
Comment by bombcar 17 hours ago
Comment by Sanzig 16 hours ago
Comment by bombcar 14 hours ago
Comment by ben_w 6 hours ago
Comment by bombcar 3 hours ago
Any name you pick will immediately override anything that comes before - naming a model Socrates would confuse searches, for example (and it's why I hate the rename of iTunes to "Music" which is a generic term!).
Comment by huslage 13 hours ago
Comment by oofbey 10 hours ago
Comment by ben_w 6 hours ago
Earlier this year I tried to do this for a much simpler target than bioscience, a Farnsworth fusor. I started off with ~"which open source physics libraries do you recommend we use for this?" and it gave me a list, but instead of actually using any of the libraries it had suggested, it decided to roll its own simulation code, and the code it wrote very obviously didn't work.
It may *assist* with coding, but I don't think it could code for them yet.
Comment by modeless 16 hours ago
Comment by shwn2989 11 hours ago
Comment by tonfreed 15 hours ago
Comment by falcor84 15 hours ago
I'm absolutely ok with a legitimate lab scientist conducting biochemical research getting suggestions about substances that are generally considered dangerous but might be appropriate for their study, and it'll be up to the scientist to discern whether it is indeed appropriate to use.
Comment by spwa4 6 hours ago
Why? AI's reputation would be greatly improved by saving a few tens of millions of lives (per year, I might add). And either of those advances would do just that.
Oh, and another reason. Do either of these things and you'll have very rich businesses screaming to become your customer coming out of every hole. Guaranteed.
Comment by 34pasKj 18 hours ago
Comment by mrcwinn 18 hours ago
Comment by jostmey 15 hours ago
Comment by Gethsemane 2 hours ago
At the moment, it feels like releases like this overcommit and overpromise on "PhD level reasoning", which I wouldn't say is the absolute bottleneck in clinical research.
Comment by XenophileJKO 13 hours ago