Principal Architect - GPU Optimization - REMOTE Startup - Living Talent
Miami, FL
About the Job
- Company: Revenue-Generating Startup (Series A)
- Company Size: 30 Employees
- Culture: REMOTE-First, Smart, Fun, Low-Ego Team
- Compensation: Base Salary $250k++, Equity
Position Overview:
- Serve as chief architect, leading the development of a Kubernetes-based AI/ML infrastructure optimization platform.
- You will collaborate closely with leadership, engineering, and product teams to create and deliver enterprise-class solutions that push the boundaries of performance, scalability, and cost efficiency.
- You’ll also stay engaged with the open-source community, particularly CNCF projects, while driving innovation in cloud-native architectures and FinOps practices.
Key Responsibilities:
- Architecture & Development: Design and develop a Kubernetes-based AI/ML optimization platform focused on cloud cost optimization and resource utilization.
- Leadership & Collaboration: Partner with C-level staff, product management, engineering, and design teams to align on strategic goals and ensure cross-functional execution.
- Communication: Create detailed architecture diagrams, technical documentation, and presentations that clearly convey your designs and strategy.
- User Experience Focus: Ensure the platform delivers an exceptional experience for Infrastructure Admins and MLOps professionals.
- Open Source Engagement: Actively participate in the CNCF and other relevant open-source communities, contributing to and learning from key projects.
- Enterprise-Class Solutions: Deliver high-performance, scalable solutions for enterprise-level AI, ML, and data-driven applications.
- FinOps & SRE Best Practices: Incorporate modern cloud financial management (FinOps) and SRE methodologies into infrastructure design and operation.
Qualifications:
- Entrepreneurial Mindset: Startup experience and a proven track record (10+ years) in infrastructure-level software architecture and development.
- Extensive Experience: Strong hands-on expertise with Linux and virtualization platforms.
- Cloud Expertise: Deep experience with major cloud platforms (AWS, GCP, or Azure).
- Kubernetes-Based ML/AI Systems: Experience with tools like Kubeflow, Kueue, KServe, GPU Operators, DRA, and Karpenter.
- Deep Knowledge: Thorough understanding of ML/AI use cases, including model development, training, inference, and compute hardware usage (CPUs and accelerators such as GPUs and TPUs).
- Cloud-Native Architectures: Proven expertise in designing and delivering scalable, available, reliable, and secure cloud-native systems with strong observability practices.
- Open Source Contribution: Active involvement in CNCF or other relevant open-source communities.
- Leadership Skills: Demonstrated ability to lead and collaborate effectively with cross-functional teams.
- Excellent Communication: Strong written and verbal communication skills, with the ability to explain complex technical concepts to a range of stakeholders.
Preferred Qualifications:
- Experience with additional ML/AI frameworks and tools.
- Familiarity with DevOps practices and tools.
- Certifications in Kubernetes or related technologies.
- Knowledge of FinOps and SRE best practices.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Source: Living Talent