Site Reliability Engineer - Datum Software, Inc
Atlanta, GA 30354
About the Job
Site Reliability Engineer
Long Term Contract
Atlanta, GA
Responsibilities:
- Establish yourself as a technical leader by working with a wide range of industry-leading technologies to drive innovation.
- Contribute expert design and development skills to a growing set of services and features within the ecosystem.
- Support highly available, business-critical applications and serve as the escalation point for complex, hard-to-define issues in both on-premises and AWS environments.
- Leverage DevOps technologies, automation, infrastructure orchestration, and configuration management to troubleshoot and resolve complex issues creatively.
- Manage and optimize data streaming and API components in OpenShift (on-premises) and AWS.
- Proactively review APIs and processes to improve response times across application components.
- Automate testing (including data quality checks), production delivery, and deployment.
- Develop integrations between on-premises applications, AWS, and third-party tools (ServiceNow, VersionOne, Sumo).
- Collaborate with teams to define Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Monitor and troubleshoot performance issues, conduct root cause analyses, and document findings.
- Evolve the cloud infrastructure by experimenting with emerging technologies and developing prototypes.
- Design and implement CI/CD pipelines for deploying APIs and data processing jobs.
- Configure monitoring metrics and alerts to support proactive issue detection and resolution.
- Ensure data integrity and access control using AWS security tools (HSM, IAM, etc.).
- Monitor AWS billing, generate cost reports, and implement cost optimization strategies.
- Collaborate with enterprise security architects to design and implement data security measures.
- Regularly analyze platform capacity and performance, and design elastic infrastructure to handle traffic spikes.
- Develop backup strategies and solutions for critical data and application components.
- Work with architecture, infrastructure, and application teams to continuously improve design, performance, and security.
Qualifications:
- Deep understanding of AWS cloud platform operations.
- Proficiency with automation, scripting, and monitoring tools (OpenShift, CloudFormation, Terraform, Ansible, Shell, Python).
- Strong technical knowledge of infrastructure layers (Linux OS, virtualization, networking, API tools, monitoring tools).
- Experience in end-to-end operations of enterprise systems and applications.
- Experience automating and improving operations with CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus).
- Hands-on experience with software release management.
- BS degree in Computer Science or a related field, or equivalent practical experience.
- 3+ years of DevOps/SysOps engineering experience focused on major cloud platforms (AWS preferred).
- 2+ years of application development experience, including data streaming and high-availability deployment and monitoring.
- 1+ year in a Site Reliability Engineering role (preferred).
- 4-6 years of relevant experience in total.
"All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran."
Source: Datum Software, Inc