Research Intern - Systems For Efficient AI

Microsoft • Hybrid • Redmond • Internship

Research Internships at Microsoft provide a dynamic environment for research careers, with a network of world-class research labs led by globally recognized scientists and engineers. These teams pursue innovation across a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

AI workloads are growing at an unprecedented pace, and inference has become one of the most critical challenges in modern computing. Large-scale models demand massive compute resources, and the diversity of hardware across cloud and edge adds complexity. Achieving low latency and high throughput while controlling cost requires rethinking the entire inference stack—from algorithms to infrastructure. 

Within our Systems Innovation research group, we pursue a full-stack approach to AI inference, collaborating closely with multiple research teams and product groups across the globe. Research problems we are currently working on include request scheduling and batching mechanisms, KV cache optimizations, LLM inference optimizations, and GPU fleet orchestration.
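To give a flavor of one of the research areas named above, the sketch below is a deliberately simplified, hypothetical illustration of KV caching: during autoregressive decoding, each new token's key/value projections are computed once and appended to a per-request cache, so attention at step t reads t cached entries instead of recomputing all prior tokens. The class and function names are invented for this example; production inference engines cache per-layer key/value tensors in GPU memory.

```python
class KVCache:
    """Append-only cache of per-token key/value pairs for one request.

    Toy stand-in: real systems store per-layer tensors on the GPU and
    manage them in fixed-size blocks to reduce fragmentation.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)


def decode_step(cache: KVCache, new_token: str) -> int:
    """Process one decoding step with caching.

    Without a cache, attention at step t would re-project all t prior
    tokens; with the cache, only the new token's key/value is computed
    and appended, and attention reads the cached history.
    """
    # Stand-ins for the key/value projections of the new token.
    k, v = ("k", new_token), ("v", new_token)
    cache.append(k, v)
    return len(cache)  # number of cached entries attention now reads


cache = KVCache()
for token in ["The", "cat", "sat"]:
    context_len = decode_step(cache, token)
print(context_len)  # 3: three tokens cached, each projected exactly once
```

The point of the sketch is the asymmetry it makes visible: per-step work stays constant while the cache grows linearly with sequence length, which is why cache memory management and eviction become first-order systems problems at scale.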

We are looking for Research Interns to help advance the state of the art in systems for efficient AI. The ideal candidate will have a background in systems for AI, including end-to-end AI inference pipelines, request scheduling and batching mechanisms, performance optimizations for AI inference, and KV caching mechanisms.