Data Architect - Palo Alto, CA (Hybrid) - Georgia Tek Systems
About the Job
Data Architect
Location: Palo Alto, CA (Hybrid)
Duration: Long Term
Rate: DOE
Key Responsibilities:
Data Orchestration:
- Design, implement, and manage data workflows using Airflow to automate and orchestrate data processing tasks.
- Optimize Airflow DAGs (Directed Acyclic Graphs) for performance and scalability.
- Develop and maintain distributed task processing using Celery, with Redis or RabbitMQ as the message broker for robust task queue management.
- Design and manage databases using Cosmos DB, MongoDB, and PostgreSQL.
- Develop and maintain efficient data models and ensure data consistency and integrity.
- Implement and manage FastAPI webhooks to handle data ingestion and integration tasks.
- Develop and maintain Azure Functions to support webhook operations and integrate with cloud services.
- Implement and manage Kafka Streams to handle real-time data processing and streaming requirements.
- Work with Iceberg to manage and optimize large-scale data lake storage and querying.
- Collaborate with data scientists, engineers, and business analysts to understand data requirements and provide technical solutions.
- Document processes, architectures, and configurations to ensure knowledge sharing and compliance with best practices.
Experience and Knowledge:
- Proven experience with Airflow for data orchestration and workflow management.
- Hands-on experience with Celery for task management and Redis or RabbitMQ for messaging.
- Proficiency with Cosmos DB, MongoDB, and PostgreSQL for data storage and management.
- Experience developing and managing webhooks using FastAPI and integrating with Azure Functions.
- Knowledge of Kafka Streams for real-time data processing.
- Familiarity with Iceberg for data lake management and optimization.
- Healthcare domain experience is a plus.
- Strong understanding of data pipelines, ETL processes, and data integration.
- Proficient in Python, with experience in building and maintaining data-oriented applications.
- Ability to work with large datasets and optimize performance across distributed systems.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration skills.
- Ability to work independently and manage multiple priorities in a fast-paced environment.
Source: Georgia Tek Systems