Senior Data Scientist

Microsoft • Hybrid • Redmond • Full-time
M365 Copilot Cadets (Customer & Analytics‑Driven Eval Team) turns real‑world customer feedback into evaluation datasets, rubrics, and insights that measurably improve Microsoft 365 Copilot quality. We connect customer scenarios, analytics, and rigorous evaluation frameworks to power a continuous feedback flywheel that accelerates product improvements across Microsoft 365 Copilot.


As a Senior Data Scientist on the Cadets team, you will own evaluation analytics end‑to‑end: curate datasets from customer and production signals; author binary‑first rubrics; build LLM (Large Language Model)‑as‑judge graders and track their human‑match rates; and generate high‑quality synthetic data to scale evaluations. You’ll partner with PM/Eng/Design and VIP customers to ship quality gains and AI features with confidence.

You’ll Thrive Here If You Have:

Evaluation proficiency for LLM/agent systems: dataset curation, rubric design, human‑in‑the‑loop grading, and judge prompts with quantitative agreement goals.

Analytics & experimentation skills (statistical inference, A/B testing), plus Python/SQL for large‑scale trace analysis.

LLM fundamentals: prompt engineering, few‑shot design, retrieval metrics, multi‑turn/agent trace evaluation.

Data quality mindset: trace hygiene, metadata design, policy/PII awareness, and principled guardrails.

 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.