Data Engineer (The Data Pipeline Architect) - Unreal Gigs
San Francisco, CA
About the Job
Are you passionate about designing, building, and maintaining data pipelines that support robust data architectures and facilitate seamless data flow? Do you excel in creating scalable solutions that empower data-driven decision-making? If you’re ready to develop and optimize data systems that drive impactful analytics, our client has the perfect role for you. We’re seeking a Data Engineer (aka The Data Pipeline Architect) to build and manage cloud-based data infrastructures that support analytical needs and operational efficiencies.
As a Data Engineer, you’ll collaborate with data scientists, analysts, and software engineers to build data pipelines and storage solutions that are both efficient and secure. Your work will be critical to ensuring data systems are optimized for performance, reliability, and scalability.
Key Responsibilities:
- Design and Implement Scalable Data Pipelines: Develop and maintain data pipelines that support data ingestion, transformation, and integration using cloud technologies. You’ll automate data workflows and ensure the seamless movement of data between systems.
- Manage and Optimize Data Storage Solutions: Architect and maintain data lakes and data warehouses using platforms like BigQuery, Redshift, Snowflake, or similar cloud-based solutions. You’ll ensure data structures are built for performance and scalability.
- Collaborate with Data Teams on Strategy Development: Work closely with data scientists, analysts, and business stakeholders to understand data requirements and align data solutions with business goals. You’ll provide input on data models and storage strategies.
- Ensure Data Quality and Reliability: Implement and manage processes for data validation, error handling, and consistency checks. You’ll maintain data quality through robust testing and monitoring practices.
- Develop and Automate ETL Processes: Build ETL (Extract, Transform, Load) workflows to handle complex data transformations. You’ll automate data extraction and transformation to support efficient data integration and reporting.
- Monitor and Maintain Data Infrastructure: Use monitoring tools to track the performance and reliability of data systems. You’ll proactively identify and resolve potential issues to maintain system health and performance.
- Optimize Data Processing and Resource Management: Implement strategies for efficient resource allocation and cost-effective data processing. You’ll leverage parallel processing and cloud capabilities to enhance performance.
Requirements
Required Skills:
- Cloud Data Platform Expertise: Experience with cloud data platforms such as AWS (Redshift, S3, Glue), GCP (BigQuery, Dataflow), or Azure (Azure Data Lake, Synapse). You’re proficient in handling cloud-based data solutions.
- Programming and Scripting Knowledge: Proficiency in Python, Java, or Scala for building data pipelines and data processing tasks. You can write clean, efficient code for automation.
- ETL and Data Pipeline Management: Proven ability to develop, maintain, and optimize ETL processes that handle large volumes of data. You’re experienced with orchestration tools like Apache Airflow or Luigi.
- SQL and Database Management: Strong ability to write complex SQL queries and work with relational and NoSQL databases.
- Problem-Solving and Critical Thinking: Excellent problem-solving skills with a proactive approach to identifying and resolving data-related challenges.
Educational Requirements:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field. Equivalent experience in data engineering and cloud technologies may be considered.
- Certifications in cloud data engineering (e.g., Google Professional Data Engineer, AWS Certified Big Data – Specialty, Microsoft Certified: Azure Data Engineer Associate) are a plus.
Experience Requirements:
- 3+ years of experience in data engineering, with a proven track record of building and managing cloud-based data systems.
- Experience with real-time data processing frameworks such as Apache Kafka or Amazon Kinesis is advantageous.
- Familiarity with containerization and microservices architecture is a plus.
Benefits
- Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
- Paid Time Off: Competitive vacation, sick leave, and 20 paid holidays per year.
- Work-Life Balance: Flexible work schedules and telecommuting options.
- Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
- Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
- Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
- Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
- Tuition Reimbursement: Financial assistance for continuing education and professional development.
- Community Engagement: Opportunities to participate in community service and volunteer activities.
- Recognition Programs: Employee recognition programs to celebrate achievements and milestones.