Big Data Architect / Lead - Columbus, Ohio - Georgia IT Inc.
Columbus, OH
About the Job
Big Data Architect / Lead
Columbus, Ohio
Contract
Open to US Citizens, Green Card (GC) holders, EAD, and TN visa holders
Qualifications
- 11-15 years of total IT experience, including 4+ years of Big Data experience (Hadoop, Spark with Java, Scala, or Python, HBase, Hive, Impala, Kafka, etc.)
- Hands-on experience designing and programming with Big Data tools and technologies is mandatory
- Experience with the Hortonworks distribution is mandatory
- Must have hands-on experience with PySpark, Kafka, and Spark Streaming for ETL on a Big Data lake
- Must have data architecture and data modeling skills, including use of Erwin as a data modeling tool
- Strong hands-on UNIX shell scripting and Python scripting experience
- Knowledge of developer productivity tools and other productivity management tools is preferred
- Experience in Agile methodology is a must
- Knowledge of standard methodologies, concepts, best practices, and procedures within Big Data environment
- Bachelor's degree in Engineering, Computer Science, or Information Technology; Master's degree in Finance, Computer Science, or Information Technology is a plus
- Exposure to infrastructure-as-a-service (IaaS) providers such as Google Compute Engine, Microsoft Azure, or Amazon AWS is a plus
- Self-starter, able to implement solutions independently
- Strong problem-solving and communication skills
- Strong Big Data architect with hands-on data modeling experience in the Big Data lake space (schema-based and schema-less data model designs and implementations)
Responsibilities
- Develop data pipelines using Big Data technologies that deliver value to the customer; understand customer use cases and workflows and translate them into engineering deliverables
- Actively participate in scrum calls, story pointing, and estimation, and own the development work
- Analyze user stories, understand the requirements, and develop code per the design
- Develop test cases, perform unit testing and integration testing
- Support QA testing, UAT, and production deployment
- Develop batch and real-time data load jobs from a broad variety of data sources into Hadoop (a minimal streaming sketch follows this list)
- Design ETL jobs that read data from Hadoop and pass it to a variety of consumers / downstream applications
- Perform analysis of vast data stores and uncover insights
- Analyze long-running queries and jobs and performance-tune them using query optimization techniques and Spark code optimization (see the tuning sketch below)
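
As a rough illustration of the streaming ETL work described above, here is a minimal PySpark Structured Streaming sketch that reads from Kafka and lands Parquet files on HDFS. The broker address, topic name, schema, and paths are placeholders for illustration, not details from this posting.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-to-hdfs-etl").getOrCreate()

    # Hypothetical event schema; a real job would derive this from the data model.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
           .option("subscribe", "events")                      # placeholder topic
           .load())

    # Kafka values arrive as binary; cast to string and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(F.from_json("json", schema).alias("evt"))
              .select("evt.*"))

    # Write micro-batches as Parquet; the checkpoint enables restart and recovery.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/lake/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()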
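
And a small sketch of the kind of performance-tuning pass mentioned in the last item, using hypothetical table and column names: broadcasting the small side of a join, filtering on partition columns early, and inspecting the physical plan are standard first steps when tuning long-running Spark jobs.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

    facts = spark.table("lake.facts")  # large fact table (hypothetical name)
    dims = spark.table("lake.dims")    # small dimension table (hypothetical name)

    # Broadcast the small side to avoid a shuffle-heavy sort-merge join.
    joined = facts.join(F.broadcast(dims), "dim_id")

    # Filter on the partition column early so the scan can prune partitions.
    recent = joined.where(F.col("load_date") >= "2020-01-01")

    # Cache only if the result is reused downstream; otherwise skip it.
    recent.cache()

    # explain() prints the physical plan, the first place to look when tuning.
    recent.explain()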
Source: Georgia IT Inc.