NVIDIA is seeking experienced Software Engineers to design, build, and manage cloud infrastructure services at scale. You’ll ensure the reliability of internal and external cloud services running on accelerated computing hardware.
Key Responsibilities:
- Design and manage cloud infrastructure services, including integrations, migrations, updates, and decommissions.
- Define service level objectives and error budgets.
- Automate processes to reduce toil.
- Provide consultation on system design best practices.
Requirements:
- BS in Computer Science or related field, or equivalent experience.
- 5+ years of relevant experience.
- Strong experience with infrastructure automation and distributed systems.
- Proficiency in Python, Go, or C++.
- In-depth knowledge of Linux, Slurm, Kubernetes, networking, storage, and containers.
Preferred:
- Experience with bare metal as a service (BMaaS), multi-cloud infrastructure, or NVIDIA NCCL.
- Experience with SRE practices and running cloud systems (Kubernetes, OpenStack, Docker, Slurm).
- Strong communication and problem-solving skills.
Compensation: Base salary range $148,000 – $276,000, plus equity and benefits.
Location:
- Santa Clara, CA, US
- Remote, WA, US
- Remote, CA, US
- Remote, OR, US