Requirements
• Bachelor’s degree in Computer Science, Software Engineering, or a related field
• 5+ years of experience in software engineering, with significant ownership of backend or distributed systems
• Strong proficiency in Python, with experience building production services
• Hands-on experience with AI/ML model serving, inference pipelines, or ML systems engineering
• Experience designing reliable, scalable systems for production environments
• Familiarity with cloud platforms (AWS, GCP) and containerized environments (Docker, Kubernetes)
• Strong debugging skills across system, data, and model-facing failures
• Excellent communication skills and the ability to collaborate across research and engineering teams
• (Desirable) Experience with fine-tuning techniques such as LoRA or PEFT
• (Desirable) Familiarity with model evaluation frameworks and regression testing
• (Desirable) Experience with GPU-based workloads or ML infrastructure
• (Desirable) Knowledge of data formats and pipelines commonly used in ML systems
• (Desirable) Prior experience working closely with AI research or incubation teams
What the job involves
• We are seeking a Lead AI Engineer, ML Systems to join the Salesforce AI Research Incubation Team
• In this role, you will own the engineering systems that power model inference, fine-tuning, and evaluation, enabling research models to be reliably deployed and evolved in production environments
• You will work closely with AI researchers, agent engineers, and platform teams to support model serving, LoRA-based fine-tuning workflows, and model lifecycle management. This role focuses on production ML systems, not on inventing new model architectures
• This is a lead-level individual contributor role with deep ownership of model-facing systems and strong cross-team influence
• Design, build, and maintain model inference and serving systems, including integration with AI gateways
• Own and evolve fine-tuning pipelines (e.g., LoRA / PEFT) using internal model tooling
• Develop and maintain model evaluation, regression detection, and rollout workflows
• Collaborate with AI researchers to transition research models into production-ready assets
• Optimize inference systems for latency, throughput, stability, and cost efficiency
• Implement best practices for model versioning, deployment, rollback, and monitoring
• Partner with agent and platform engineers to ensure smooth integration between model systems and agent runtimes
• Provide technical leadership and mentorship on ML system design and operational excellence