Research Scientist, AI Evaluation Science, Seattle

47.6704 -122.378
Seattle 98127, USA
Last edited: less than a week ago

Description

Research Scientist, AI Evaluation Science

AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function; it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.

What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers, bringing together ML research, statistical rigor, and production engineering.

We are looking for a Research Scientist who treats evaluation methodology itself as a first-class research problem: someone with deep technical fluency in preference learning, reward modeling, or calibration theory, and the drive to advance the field while solving real problems at scale.

We're hiring at multiple levels (early-career to senior researchers). What unites all candidates is depth of thinking about evaluation as a research problem.

Responsibilities

Advance evaluation methodology through original research in one or more of the following areas: preference learning and reward modeling (RLHF, DPO, reward hacking mitigation); LLM-as-judge calibration, rubric design, and bias detection; intelligent evaluation strategies including active learning for test selection and automated failure discovery; or validity frameworks for evaluators (construct validity, transfer learning). You are not expected to cover all of these; depth matters more than breadth.
Publish at top-tier venues (NeurIPS, ICML, ICLR, ACL, EMNLP), contributing to evaluation science as a recognized research area and representing Apple in the research community.
Translate research into production-ready tools by partnering with platform engineers to productionize your methods into evaluation SDKs and APIs used across Apple.
Collaborate with measurement scientists to integrate psychometric methods and validity frameworks into evaluation systems, ensuring evaluators measure what they claim to measure.
Define the team's research agenda for evaluation science by identifying high-leverage open problems, validating that they address real-world challenges faced by ML engineers across Apple, and designing rigorous experimental programs to solve them.

Minimum Qualifications

Ph.D. in Computer Science, Machine Learning, or a closely related field, with a research focus in evaluation-adjacent areas (preference learning, RLHF, human feedback, calibration, automated assessment)
Strong publication record at top-tier conferences (NeurIPS, ICML, ICLR, ACL, EMNLP), including first-author publications demonstrating independent research contributions
Deep technical expertise in at least one evaluation-adjacent ML area, with strong mathematical foundations: preference learning and reward modeling (RLHF, DPO, reward hacking, specification gaming); OR calibration theory, proper scoring rules, and statistical reliability; OR human-AI interaction methodology (active learning, annotation quality, preference elicitation)
Demonstrated ability to implement complex methods from recent papers and run large-scale experiments
Track record of translating research into practical systems: prototypes, tools, or methods adopted by others
Excellent written and verbal communication skills, including the ability to write clear research papers and explain complex concepts to diverse audiences

Pay & Benefits

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $201,300 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple's discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple's Employee Stock Purchase Plan. You'll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

At Apple, we believe accessibility is a fundamental human right. You'll find that idea reflected in everything here: in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong. Learn about accessibility in Apple's workplace. Learn about reasonable accommodations for job applicants.

Apple accepts applications to this posting on an ongoing basis.

Highlights

Company name: Apple
Job position: Research Scientist, AI Evaluation Science
Ad ID: 8791732321