Description
The AI Platform team at Datadog builds the infrastructure that powers the next generation of generative AI features across our products. As a Senior Software Engineer on the Evaluation and Annotation team, you will design and evolve the systems that define and measure AI quality at scale. This includes building evaluation pipelines, model performance monitoring, and annotation workflows that assess correctness, safety, bias, and reliability across production use cases.
Your work will directly shape how Datadog ships and maintains trustworthy AI capabilities. You will partner closely with product, ML, and infrastructure teams to define quality standards, integrate evaluation systems with our observability platform, and build human-in-the-loop feedback mechanisms that continuously improve model behavior. At Datadog, we place value on our office culture: the relationships it builds, the creativity it brings to the table, and the collaboration of being together.
We operate as a hybrid workplace to ensure our employees can create a work-life harmony that best fits them.

What You’ll Do:
- Design and scale robust evaluation systems to measure the performance and reliability of LLMs and AI agents across Datadog’s product ecosystem
- Lead efforts to build human-in-the-loop and automated annotation pipelines for model assessment, ensuring high-quality training and feedback data
- Define and implement continuous evaluation workflows in CI/CD and production environments to monitor model behavior in real time
- Analyze model outputs for correctness, bias, safety, and reliability, and translate insights into actionable improvements
- Collaborate cross-functionally with Applied Scientists, Researchers, product