Data Engineer (The Data Pipeline Architect) - Unreal Gigs
San Francisco, CA
About the Job
Are you passionate about building data infrastructure that powers advanced analytics and machine learning? Do you thrive on transforming raw data into well-organized, accessible, and reliable datasets that fuel data-driven decision-making? If you’re excited about working with cutting-edge data technologies and architecting scalable pipelines, our client has an opportunity for you. We’re looking for a Data Engineer (aka The Data Pipeline Architect) to design, develop, and optimize the data systems that form the backbone of our client’s products.
As a Data Engineer with our client, you’ll construct efficient, scalable data pipelines that make data accessible and usable for analysts, data scientists, and business stakeholders. You’ll work with large datasets, implement ETL processes, and build the infrastructure that powers analytics and AI-driven insights.
Key Responsibilities:
- Design and Develop Data Pipelines:
- Build and maintain robust, scalable, and efficient data pipelines to ingest, process, and store data from a variety of sources. You’ll design ETL (Extract, Transform, Load) processes to move and transform data, ensuring data integrity and accuracy.
- Data Warehouse Management:
- Architect and maintain data warehouses or data lakes using cloud data platforms (e.g., Amazon Redshift, Google BigQuery, or Snowflake) to organize and store large-scale datasets. You’ll ensure the infrastructure is optimized for fast querying and scalability.
- Collaborate with Data Scientists and Analysts:
- Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver datasets that meet business needs. You’ll provide clean, well-structured data to enable advanced analytics and machine learning projects.
- Data Quality and Governance:
- Implement data quality checks and monitoring systems to ensure the accuracy, completeness, and consistency of data across the pipeline. You’ll help establish data governance standards and policies to ensure compliance and security.
- Performance Optimization:
- Optimize the performance of data systems, ensuring fast and reliable data access. You’ll tune queries, design efficient storage architectures, and implement best practices for data retrieval and processing.
- Automation and Monitoring:
- Automate data workflows, pipeline deployments, and data quality checks to minimize manual intervention. You’ll set up monitoring and alerting systems to detect issues early and ensure smooth operation of data pipelines.
- Data Security and Compliance:
- Implement security protocols to protect sensitive data, ensuring compliance with relevant regulations and standards such as GDPR, HIPAA, and SOC 2. You’ll work with security teams to enforce access controls, encryption, and data privacy best practices.
Requirements
Required Skills:
- Data Engineering Expertise: Strong experience building and maintaining data pipelines, ETL processes, and data warehouses using cloud platforms (AWS, GCP, Azure). You’re skilled at handling large, complex datasets efficiently.
- Programming and Scripting: Proficiency in languages such as Python, SQL, or Scala, and experience with data engineering tools like Apache Spark, Airflow, or Kafka. You can write efficient code to process and transform large datasets.
- Data Warehousing and Storage: Expertise in managing and optimizing data warehouses or data lakes (e.g., Redshift, BigQuery, Snowflake). You understand partitioning, indexing, and storage optimization techniques.
- Database and Query Optimization: Strong knowledge of database design and query optimization for performance. You can fine-tune SQL queries and structure databases for fast, reliable access to large volumes of data.
- Data Governance and Security: Solid understanding of data governance practices, security protocols, and compliance regulations. You can enforce data privacy and implement measures to safeguard sensitive information.
Educational Requirements:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field. Equivalent experience in data engineering is also highly valued.
- Certifications in cloud platforms (AWS, GCP, Azure) or data engineering technologies (e.g., Apache Hadoop, Apache Spark) are a plus.
Experience Requirements:
- 3+ years of experience in data engineering, with hands-on experience building and managing data pipelines, data warehouses, and cloud-based storage solutions.
- Proven experience working with big data technologies and distributed systems, optimizing data flows and processes to handle large datasets.
- Familiarity with data quality, data governance, and data security best practices is highly desirable.
Benefits
- Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
- Paid Time Off: Competitive vacation and sick leave, plus 20 paid holidays per year.
- Work-Life Balance: Flexible work schedules and telecommuting options.
- Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
- Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
- Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
- Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
- Tuition Reimbursement: Financial assistance for continuing education and professional development.
- Community Engagement: Opportunities to participate in community service and volunteer activities.
- Recognition Programs: Employee recognition programs to celebrate achievements and milestones.