VP of AI Infrastructure (The Backbone Builder) - Unreal Gigs
San Francisco, CA
About the Job
Are you passionate about building robust, scalable infrastructure that powers the next generation of AI-driven products and services? Do you have the strategic insight and technical expertise to lead the development of AI infrastructure that supports cutting-edge research, large-scale deployment, and high-performance computing? If you’re ready to design and implement the foundation of AI innovation, our client has the ideal role for you. We’re seeking a VP of AI Infrastructure (aka The Backbone Builder) to architect and oversee the infrastructure that accelerates the development and deployment of AI solutions across the company and keeps them performing at scale.
As our client’s VP of AI Infrastructure, you’ll lead a team of infrastructure engineers, data architects, and platform experts. You’ll collaborate closely with data scientists, machine learning engineers, and product teams to build secure, flexible, high-performing infrastructure that supports the company’s AI initiatives. From optimizing data pipelines to managing GPU clusters and ensuring scalable model deployment, your work will be vital in enabling the AI team to build, test, and deploy with confidence.
Key Responsibilities:
- Set the Vision and Strategy for AI Infrastructure:
  - Develop and execute a strategic roadmap for AI infrastructure, aligning it with the company’s AI goals and business objectives. You’ll prioritize areas that enhance scalability, reduce latency, and optimize resource usage to support AI-driven innovation.
- Lead the Development of Scalable and Secure Data Pipelines:
  - Oversee the design, development, and maintenance of data pipelines that enable real-time data processing and batch analytics. You’ll ensure that data is available, secure, and easy for AI teams to access for training, testing, and deployment.
- Optimize Compute Resources and High-Performance Environments:
  - Manage the infrastructure required for high-performance computing, including GPU clusters, cloud resources, and storage systems. You’ll work to maximize cost-efficiency, minimize latency, and optimize compute utilization for AI workloads.
- Enable Scalable AI Model Deployment and Monitoring:
  - Develop frameworks for model deployment, monitoring, and versioning across production environments. You’ll ensure that deployed models are robust, scalable, and highly available to end users.
- Collaborate with Security and Compliance Teams:
  - Work closely with cybersecurity and compliance teams to enforce data privacy, security protocols, and regulatory compliance. You’ll implement best practices to safeguard data integrity and meet industry standards for AI systems.
- Drive Automation and Continuous Integration for AI Workflows:
  - Implement CI/CD pipelines and automation for model testing, deployment, and updates. You’ll streamline processes to enable rapid iteration, reduce time to market for AI solutions, and boost the productivity of AI teams.
- Stay Updated on Emerging Infrastructure Technologies and Best Practices:
  - Keep up with advancements in infrastructure technology, including containerization, distributed computing, and serverless architectures. You’ll adopt new solutions and best practices to keep the infrastructure competitive and forward-looking.
Requirements
Required Skills:
- AI Infrastructure Expertise: Extensive experience designing, managing, and scaling AI infrastructure, including data pipelines, compute resources, and cloud services. You’re proficient at building infrastructure that supports data processing, high-performance computing, and large-scale deployments.
- High-Performance Computing and Cloud Platforms: Expertise in high-performance environments, including experience with cloud platforms (AWS, GCP, Azure), GPU clusters, and distributed computing. You understand how to maximize computational efficiency for AI workloads.
- Data Architecture and Security Knowledge: Strong understanding of data architecture, ETL processes, and data governance. You’re skilled at creating secure, compliant data pipelines that support the demands of AI research and applications.
- Automation and CI/CD for AI: Experience implementing CI/CD pipelines, automation tools, and containerization technologies (Docker, Kubernetes). You know how to streamline workflows for rapid testing, deployment, and scaling of AI models.
- Strategic Vision and Collaboration: Proven ability to develop a strategic vision for AI infrastructure and work collaboratively with cross-functional teams, including AI researchers, data engineers, and security professionals. You understand how to align infrastructure with organizational goals.
Educational Requirements:
- Master’s or Ph.D. in Computer Science, Data Engineering, Cloud Computing, or a related field. Equivalent experience in AI infrastructure and platform engineering may be considered.
- Certifications in cloud computing, data engineering, or infrastructure management (e.g., AWS Certified Solutions Architect, Google Cloud Professional Data Engineer) are advantageous.
Experience Requirements:
- 10+ years of experience in infrastructure engineering, cloud computing, or a similar field, with a proven track record of building infrastructure for AI and machine learning applications.
- 5+ years of experience in a leadership role, managing teams in AI infrastructure, data architecture, or high-performance computing environments.
- Familiarity with regulatory requirements and compliance standards in data security and privacy is highly desirable.
Benefits
- Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
- Paid Time Off: Competitive vacation, sick leave, and 20 paid holidays per year.
- Work-Life Balance: Flexible work schedules and telecommuting options.
- Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
- Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
- Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
- Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
- Tuition Reimbursement: Financial assistance for continuing education and professional development.
- Community Engagement: Opportunities to participate in community service and volunteer activities.
- Recognition Programs: Employee recognition programs to celebrate achievements and milestones.