Machine Learning Researcher, RL & Agentic Systems

Protege

Posted 20 Days Ago

Remote

Hiring Remotely in USA

Senior level

The Machine Learning Researcher will design datasets and evaluate environments for agentic systems, focusing on data quality and benchmarking model performance in RL settings. They will collaborate with teams to translate workflows into evaluative frameworks and improve the understanding of high-quality datasets.

The summary above was generated by AI

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already powering partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI — and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

About DataLab

DataLab exists because truly useful data is rare — and the frontier of AI development only moves forward when high-quality data makes it possible.

We believe data is one of the most underdeveloped layers of the AI stack. Our work focuses on building and evaluating high-value datasets grounded in real-world workflows and economically meaningful tasks.

We work across multiple domains to create safe, high-fidelity datasets that preserve the structure and context needed to train advanced AI systems.

Our research spans data quality, evaluation design, privacy-preserving transformation, workflow reconstruction, and task-grounded AI training data.

At DataLab, applied research is tightly connected to real-world deployment. Researchers work directly with large-scale datasets, production systems, and frontier AI training problems.

Role Overview

Data is the foundation of AI performance, and we believe model quality starts with data quality. As AI systems become more agentic, a critical challenge is understanding which real-world datasets, tasks, and environments actually lead to better model behavior.

We’re seeking a Machine Learning Researcher focused on RL and agentic systems to help define, design, and evaluate the datasets, tasks, environments, and benchmarks used to assess advanced AI systems. In this role, you’ll work closely with research and engineering teams to translate real-world workflows into high-value datasets and evaluation assets: structured tasks, interactive environments, benchmark suites, and quality scorecards that help us understand how models perform in realistic settings.

You’ll help define what “high-quality agentic data” means in practice, using statistical, computational, and ML-driven methods to evaluate dataset quality, task design, environment fidelity, and downstream model performance. You’ll work on the core problems of benchmarking real-world data, measuring how well models perform on that data, and designing RL-style or agentic environments that capture the structure of meaningful work.

This is an ideal role for someone with a strong machine learning background who is excited by reinforcement learning, agentic systems, evaluation, and the role of data in shaping model behavior. You should be excited by the opportunity to build the datasets and benchmarks that help define what high-quality real-world data looks like for frontier AI systems.

What You’ll Do

Design and build datasets, tasks, and environments

Design and build datasets, tasks, environments, and evaluation assets for benchmarking agentic systems and multi-step model behavior.

Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.

Develop frameworks for evaluating real-world data quality

Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.

Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.

Benchmark model behavior in RL and agentic settings

Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.

Connect model failures back to concrete dataset, environment, or task-design gaps and recommend improvements grounded in empirical evidence.

Build scalable evaluation and validation tooling

Contribute to tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark construction, and evaluation workflows.

Improve internal infrastructure for reproducible experimentation, benchmark management, and evaluation quality.

Partner across research, engineering, and product

Collaborate closely with research and engineering teams to identify data bottlenecks, improve evaluation methodology, and shape internal best practices around task-grounded AI training data.

Represent DataLab’s perspective in cross-functional discussions around dataset quality, benchmark design, and frontier agentic-system evaluation.

What Success Looks Like

Near-term: establish a strong evaluation baseline

Create clear benchmark frameworks, evaluation assets, and dataset-quality scorecards that help Protege reason about how real-world data impacts advanced agentic systems.

Use rigorous evaluation methods to identify meaningful dataset improvements, improve benchmark fidelity, and sharpen the company’s understanding of what high-impact agentic data actually looks like in practice.

What You Bring

PhD, or a Master’s degree with 4+ years of industry experience, in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.
Strong understanding of AI model training pipelines, evaluation methodology, and the role of data in shaping model performance.
Experience working with large, unstructured, or semi-structured datasets used to train or evaluate ML systems.
Experience with reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.
Experience designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.
Strong intuition for realism, coverage, difficulty, fidelity, and meaningful outcome structure in datasets.
Strong experimental design, evaluation, benchmarking, and data-validation skills.
High ownership and ability to independently identify and solve high-impact problems.

Nice to have

Experience developing evaluation frameworks or performance metrics for datasets, agentic systems, or training data.
Experience translating real-world workflows into structured tasks or environments for model evaluation.
Experience with RLHF, RLAIF, imitation learning, reward modeling, online or offline RL, or related methods.
Experience with Harbor or other agent evaluation frameworks.
Publications or open-source contributions in reinforcement learning, agents, evaluation, or data-centric AI.
Experience collaborating cross-functionally with product, infrastructure, or partnership teams.
Experience with synthetic data generation, trajectory generation, or simulation-based environments.

Protege's Values

Pass the Loved Ones' Test

We act with integrity and do the right thing, especially when it's hard and no one is watching.

Always Find a Way

We are resourceful, resilient builders who solve hard problems and push through obstacles.

Go Fast and Grow Fast

Velocity matters. We move with urgency, learn quickly, and continuously improve as individuals and as a company.

Practice Kindness and Candor

We communicate directly and respectfully, building trust through honest feedback and genuine care for one another.

Deliver Together

We win as one team. Collaboration, accountability, and shared ownership drive our success.

Own the Outcome. Hone the Craft.

We take pride in our work, sweat the details, and continuously raise the bar for excellence.

New York, New York, United States
