Job Description

Role Overview
Contribute to a pioneering project focused on evaluating frontier models by designing and validating complex benchmark tasks in data science, machine learning, finance, and coding. This role emphasizes the development of robust, real-world tasks with executable tests, followed by the analysis of model and agent behavior to identify reasoning and problem-solving gaps.
Key Responsibilities
Design challenging, real-world STEM problems.
Implement each task within an agentic development environment using Python.
Core Qualifications
Deep expertise in data science, machine learning, finance, and/or Python-based coding.
Current PhD student at, or recent PhD graduate of, a top U.S.-based school.
Strong research background in frontier STEM topics.
Ability to engage reliably for 30+ hours per week, primarily on weekdays.
Demonstrated technical output, such as high-quality open-source contributions, especially in agentic or LLM tooling ecosystems.
Comfort reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors.
More About the Opportunity
Initial focus area: agentic workflows for STEM tasks.
Familiarity with agentic frameworks and OSS ecosystems is beneficial (e.g., LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, Dify).
Deliverables are expected to be reproducible and testable, with clear specifications, deterministic tests where possible, and documented environments.
About
is a talent marketplace connecting top experts with leading AI labs and research organizations. Our investors include Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey. Thousands of professionals across various domains, including law, creative fields, engineering, and research, have joined to work on groundbreaking projects shaping the future of AI.

Job Tags

Remote job, Summer work, Weekday work

PhD Researcher in STEM Model Evaluation Job at SaidGig, Remote
