PhD Researcher in STEM Model Evaluation Job at SaidGig, Remote

  • SaidGig
  • Remote

Job Description

Role Overview

Contribute to a pioneering project focused on evaluating frontier models by designing and validating complex benchmark tasks in data science, machine learning, finance, and coding. This role emphasizes the development of robust, real-world tasks with executable tests, followed by the analysis of model and agent behavior to identify reasoning and problem-solving gaps.

Key Responsibilities
  • Design challenging, real-world STEM problems.
  • Implement each task within an agentic development environment using Python.

Core Qualifications
  • Deep expertise in data science, machine learning, finance, and/or Python-based coding.
  • Current PhD candidate or recent PhD graduate from a top U.S.-based school.
  • Strong research background in frontier STEM topics.
  • Ability to engage reliably for 30+ hours per week, primarily on weekdays.
  • Demonstrated technical output, such as high-quality open-source contributions, especially in agentic or LLM tooling ecosystems.
  • Comfort with reading and reasoning about agent behavior traces to diagnose failure modes beyond surface-level errors.

More About the Opportunity
  • Initial focus area: agentic workflows for STEM tasks.
  • Familiarity with agentic frameworks and OSS ecosystems is beneficial (e.g., LangChain, MetaGPT, AutoGen, AutoGPT, CrewAI, LlamaIndex, BabyAGI, SuperAGI, CAMEL, AgentGPT, Dify).
  • Deliverables are expected to be reproducible and testable, with clear specifications, deterministic tests where possible, and documented environments.

About

is a talent marketplace connecting top experts with leading AI labs and research organizations. Our investors include Benchmark, General Catalyst, Adam D’Angelo, Larry Summers, and Jack Dorsey. Thousands of professionals across various domains, including law, creative fields, engineering, and research, have joined to work on groundbreaking projects shaping the future of AI.

Job Tags

Remote job, Summer work, Weekday work
