SRE Engineer - Diverse Lynx
Chicago, IL
About the Job
Skills: DevOps, Spring Boot, Kubernetes, Site Reliability Engineering (SRE)
Job Description:
SRE Engineer with real interest and experience in troubleshooting Linux systems, networking, monitoring, databases, containers/Kubernetes, and cloud technologies, and a proven interest and experience in using software engineering to solve operational problems.
Comfortable writing code to automate API-driven tasks at scale. Python preferred.
Architect and implement automations to auto-remediate/self-heal issues in production.
Participate in SRE software engineering, writing code to continually reduce human intervention in operational tasks and to automate processes.
Monitor the application ecosystem, join incident bridges, and resolve issues.
Have a good understanding of core DevOps and SRE practices and technologies.
Be ready to participate in 24x7x365 on-call schedules and resolve incidents within 30 minutes.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Skills & Qualifications:
Overall 10+ years of experience with DevOps and SRE practices, technologies, and industry standards to make production reliable and resilient.
Experience with core DevOps and SRE technologies such as:
Chaos engineering
Ansible
Docker
Kubernetes, Helm
Jenkins
Terraform (Infrastructure as Code)
Prometheus, Grafana
ELK stack
Azure cloud stack
Azure DevOps
Expert hands-on experience provisioning and deploying infrastructure in the Azure public cloud in a large-scale enterprise environment with mission-critical applications.
Expert hands-on experience using the Azure DevOps stack to build automated CI/CD pipelines for deploying applications and infrastructure.
Very good understanding of application logs, Kubernetes events, and application and infrastructure metrics (Prometheus/Grafana/Fluentd).
You have experience in troubleshooting and understand the challenges of deploying applications in distributed systems and running them at scale.
Experience with the Azure public cloud is required. Experience with other clouds such as AWS, GCP, or OCI is a great plus. Experience working with applications in the financial services industry is also a plus.
Good understanding of Linux systems and Bash scripting.
You have a passion for collaborating cross-functionally and cross-product on outage bridges to resolve issues within 30 minutes, and for owning the RCA for those bridges.
Review recurring incidents, identify improvement and automation opportunities, and collaborate with product feature development teams.
Knowledge of BMC products like ITSM and ADE would be a great plus.
Willing to mentor team members and help them grow.
Ability to explain technical concepts to business and technology stakeholders.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.
Source: Diverse Lynx