Senior MLOps Engineer at Recruiting From Scratch
San Francisco, CA 94199
About the Job
We are looking for a Senior MLOps Engineer.
Compensation: $215,000 - $270,000 salary, employee equity plan grant, and world-class benefits.
As a Machine Learning Engineer (MLE) focusing on MLOps, you will maintain and scale our TensorFlow- and Kubeflow-based machine learning pipeline, enhancing its reliability and efficiency. In this role, you will optimize deployment processes to reduce setup time and boost pipeline scalability to support higher throughput as we grow. Working directly with ML engineers, software engineers, and infrastructure teams, you will form cross-functional units to automate workflows, monitor model performance, and address real-time issues. You will leverage your expert software engineering skills and comprehensive understanding of the machine learning lifecycle to drive the seamless integration and ongoing enhancement of our AI solutions, ensuring robust performance in production environments.
About the Role:
- Enhance ML Pipeline Efficiency: Directly improve the robustness, scalability, and performance of our TensorFlow and Kubeflow pipelines. Focus on optimizing model training and serving processes, minimizing downtime, and automating routine tasks to increase operational efficiency.
- Drive Platform Scaling & Innovation: Take a leadership role in expanding our ML platform's capacity to manage larger data volumes and more complex models. Research and integrate cutting-edge technologies, develop scalable architectures, and elevate system performance and efficiency through continuous enhancements.
- Establish MLOps Excellence: Design and implement robust MLOps frameworks that streamline the integration, continuous deployment, and monitoring of ML models. Set up comprehensive CI/CD pipelines, automate testing, and create monitoring tools to proactively track model performance and detect issues.
- Foster Cross-Functional Collaboration: Partner with data scientists, software engineers, and product teams to transform business requirements into scalable and dependable machine learning solutions. Bridge the gap between model development and deployment, ensuring models are production-ready and align with performance standards.
- Overcome Production Challenges: Proactively monitor, troubleshoot, and resolve issues affecting model performance, data pipeline integrity, and system efficiency. Identify root causes and implement strategic solutions to ensure the ongoing stability and performance of our ML infrastructure.
Requirements
About You:
- Educational and Technical Foundation: Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience. You should have solid experience in maintaining and scaling machine learning pipelines using TensorFlow and Kubeflow.
- Advanced MLOps Proficiency: At least 3 years of experience in ML Engineering, with expertise in deploying models and managing ML workflows, and familiarity with MLflow, TFX, or Airflow.
- Strategic Problem Solver with Collaborative Spirit: You excel at solving complex problems at scale and have a proven ability to work effectively within collaborative, fast-paced, cross-functional teams. You're adept at communicating technical concepts across various stakeholder groups, ensuring alignment and understanding.
- Innovative Tech Enthusiast with Cloud Expertise: Your proactive approach drives you to continually seek improvements in ML development and deployment processes. You have strong knowledge of cloud platforms, particularly AWS, and experience with containerization and orchestration tools like Docker and Kubernetes.