Data Engineer-Whitehouse Station, NJ - Georgia IT Inc.

Whitehouse Station, NJ

About the Job

Data Engineer
Location: Whitehouse Station, NJ
Duration: 12 Months
Visa: GC or USC

The ideal candidate for this role has a strong background in computer programming, statistics, and data science and is eager to tackle problems with large, complex datasets using the latest Python, R, and/or PySpark. You are a self-starter who will take ownership of your projects and deliver high-quality, data-driven analytics solutions. You are adept at solving diverse business problems using a variety of tools, strategies, algorithms and programming languages.

Specific responsibilities are as follows:

Apply data engineering skills within and outside the Client's developing information ecosystem for discovery, analytics and data management
Work with the data science team to deploy machine learning models
Use data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training a statistical model, etc.
Work with various relational and non-relational data sources, with the target being Azure-based SQL Data Warehouse and Cosmos DB repositories
Clean, unify and organize messy and complex data sets for easy access and analysis
Create different levels of abstractions of data depending on analytics needs
Hands-on experience with data preparation using the Azure technology stack, especially Azure Databricks, is highly desired
Implement discovery solutions for high-speed data ingestion
Work closely with the Data Science team to perform complex analytics and data preparation tasks
Work with the Sr. Data Engineers on the team to develop APIs
Source data from multiple applications, then profile, cleanse and conform it to create master data sets for analytics use
Utilize state-of-the-art methods for data mining, especially of unstructured data
Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) Transforms on Azure is a plus
Design solutions for managing highly complex business rules within the Azure ecosystem
Performance-tune data loads

Skills Required

Mid- to advanced-level knowledge of Python and PySpark is an absolute must

Knowledge of Azure, Hadoop 2.0 ecosystems, HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark, etc. a must
Experience with Web Scraping frameworks (Scrapy or Beautiful Soup or similar)
Extensive experience working with Data APIs (RESTful endpoints and/or SOAP)
Significant programming experience (with above technologies as well as Java, R and Python on Linux) a must
Knowledge of any commercial distribution such as Hortonworks, Cloudera, MapR, etc. a must
Excellent working knowledge of relational databases (MySQL, Oracle, etc.)
Experience with Complex Data Parsing (Big Data Parser) a must. Should have worked on XML, JSON and other custom Complex Data Parsing formats
Natural Language Processing (NLP) skills with experience in Apache Solr and Python a plus
Knowledge of High-Speed Data Ingestion, Real-Time Data Collection and Streaming is a plus

Qualifications/Experience

Bachelor's degree in Computer Science or a related field
3-5 years of solid experience in Big Data technologies a must
Microsoft Azure certifications a huge plus
Data visualization tool experience a plus

Source: Georgia IT Inc.
