Data Engineer - Whitehouse Station, NJ - Georgia IT Inc.
Whitehouse Station, NJ
About the Job
Data Engineer
Location: Whitehouse Station, NJ
Duration: 12 Months
Visa: GC or USC
The ideal candidate for this role has a strong background in computer programming, statistics, and data science and is eager to tackle problems with large, complex datasets using the latest Python, R, and/or PySpark. You are a self-starter who will take ownership of your projects and deliver high-quality, data-driven analytics solutions. You are adept at solving diverse business problems using a variety of tools, strategies, algorithms, and programming languages.
Specific responsibilities are as follows:
- Apply data engineering skills within and outside the developing Client information ecosystem for discovery, analytics, and data management
- Work with data science team to deploy Machine Learning Models
- Use data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training a statistical model, etc.
- Work with various relational and non-relational data sources with the target being Azure based SQL Data Warehouse & Cosmos DB repositories
- Clean, unify and organize messy and complex data sets for easy access and analysis
- Create different levels of abstractions of data depending on analytics needs
- Hands-on data preparation experience using the Azure technology stack, especially Azure Databricks, is highly desired
- Implement discovery solutions for high speed data ingestion
- Work closely with the Data Science team to perform complex analytics and data preparation tasks
- Work with the Sr. Data Engineers on the team to develop APIs
- Source data from multiple applications, then profile, cleanse, and conform it to create master data sets for analytics use
- Utilize state-of-the-art methods for data mining, especially of unstructured data
- Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) Transforms on Azure a plus
- Design solutions for managing highly complex business rules within the Azure ecosystem
- Performance tune data loads
Skills Required
- Mid- to advanced-level knowledge of Python and PySpark is an absolute must
- Knowledge of Azure, Hadoop 2.0 ecosystems, HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark, etc. a must
- Experience with Web Scraping frameworks (Scrapy or Beautiful Soup or similar)
- Extensive experience working with Data APIs (Working with RESTful endpoints and/or SOAP)
- Significant programming experience (with above technologies as well as Java, R and Python on Linux) a must
- Knowledge of any commercial distribution such as Hortonworks, Cloudera, MapR, etc. a must
- Excellent working knowledge of relational databases, MySQL, Oracle etc.
- Experience with Complex Data Parsing (Big Data Parser) a must. Should have worked on XML, JSON and other custom Complex Data Parsing formats
- Natural Language Processing (NLP) skills with experience in Apache Solr, Python a plus
- Knowledge of High-Speed Data Ingestion, Real-Time Data Collection and Streaming is a plus
- Bachelor's degree in Computer Science or related educational background
- 3-5 years of solid experience in Big Data technologies a must
- Microsoft Azure certifications a huge plus
- Data visualization tool experience a plus
Source: Georgia IT Inc.