xPU Specialist - Zaspar Technologies
Palo Alto, CA
Job Details
Job Title: xPU Specialist
Location: Palo Alto, CA
Type: Full Time
Our client, 8bit.AI, is a dynamic Bay Area startup hiring full-time employees. The company is developing a high-performance, multi-technology, vendor-independent, xPU-based accelerated cloud computing platform. It assembles massive clusters purpose-built for high-performance parallel computing and aims to launch a global accelerated cloud solution. The firm will also focus on broader Artificial General Intelligence (AGI) products, supercomputing services, and end-to-end AI engineering services.
RESPONSIBILITIES:
- Design and implement innovative hardware solutions for highly scalable and efficient xPU PODs.
- Collaborate with architects, software engineers, and system engineers to ensure optimal integration of hardware and software components within the PODs
- Deeply understand leading xPU architectures from NVIDIA, AMD, and/or Intel and leverage their capabilities for performance optimization within PODs
- Conduct performance and power analysis to identify and implement strategies for optimizing resource utilization and power consumption within the PODs
- Participate in the development and execution of hardware verification and validation plans
- Stay up to date on the latest advancements in xPU technology and related hardware trends
- Contribute to technical documentation and maintain clear communication within the team
QUALIFICATIONS:
- Master's degree in Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
- Minimum of 8 years of experience in designing and developing hardware solutions, preferably for data center or high-performance computing environments
- Proven experience with virtualization platforms, preferably including VMware (vSphere, ESXi, etc.) or Nutanix (AHV, AOS, etc.)
- Strong understanding of hypervisor technologies and their functionalities
- Ability to integrate and manage both internal and external virtualization platforms
- Must have experience developing and running applications on the ROCm platform, with a strong understanding of ROCm components such as HIP and OpenCL and of AMD GPU architecture
- In-depth knowledge of xPU architectures, particularly from NVIDIA, AMD, or Intel.
- Completed certifications in NVIDIA AI in Datacenter and InfiniBand, or a C-DAC certification.
- Strong understanding of computer architecture, memory systems, and interfacing techniques.
- Solid understanding of OpenStack concepts and experience managing cloud infrastructure
- Prior experience in building and operating Cloud POD infrastructure
- Proficiency in hardware description languages (HDL) like Verilog or VHDL
- Experience with hardware simulation and verification tools
- Excellent communication, collaboration, and problem-solving skills
- A passion for innovation and a drive to contribute to cutting-edge technology development
GOOD TO HAVE:
- Experience with hardware design for data center infrastructure or high-performance computing systems
- Experience with thermal and power management solutions for high-density computing environments
- Familiarity with xPU programming frameworks like CUDA or OpenCL.
Source: Zaspar Technologies