Job Description:
• Lead, mentor, and scale a high-performing engineering team focused on deep learning inference and GPU-accelerated software
• Drive the strategy, roadmap, and execution of NVIDIA’s inference frameworks engineering
• Partner with internal compiler, libraries, and research teams to deliver end-to-end optimized inference pipelines
• Oversee performance tuning, profiling, and optimization of large-scale models
• Guide engineers in adopting best practices for CUDA, Triton, CUTLASS, and multi-GPU communications
• Represent the team in roadmap and planning discussions
• Foster a culture of technical excellence, open collaboration, and continuous innovation
Requirements:
• MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a related field
• 6+ years of software development experience
• 3+ years in technical leadership or engineering management
• Strong background in C/C++ software design and development
• Proficiency in Python is a plus
• Hands-on experience with GPU programming (CUDA, Triton, CUTLASS)
• Proven record of deploying or optimizing deep learning models in production environments
• Experience leading teams using Agile or collaborative software development practices
Benefits:
• Health insurance
• Comprehensive benefits package