- Training a scientific reasoning model for chemistry
- ChemLit-QA: a human evaluated dataset for chemistry RAG tasks
- BixBench: a comprehensive benchmark for LLM-based agents in computational biology
- Aviary: training language agents on challenging scientific tasks