Site Reliability Engineer at InfoVision
Atlanta, GA
About the Job
Job Title: Systems SRE (GCP Cloud)
Location: Atlanta, GA
Duration: Long-term
Work Mode: Hybrid
Main Skills: GCP, Terraform, Kubernetes, CI/CD pipelines and deployments, Docker/containerization
Technical Knowledge:
- 10+ years of overall IT experience
1. Google Cloud Platform (any cloud platform is acceptable; GCP preferred)
2. Strong Terraform Knowledge
3. Understanding of microservice architecture, infrastructure, and networking
Responsibilities:
Monitoring: Application and Infrastructure Monitoring
Automating: Automating the deployment process and building automation that reduces toil
Improving: Improving the software development lifecycle by holding post-incident reviews and documenting all software problems and solutions in a shared knowledge base
Developing and maintaining: Developing and maintaining the system and its services, ensuring system scaling, and identifying and implementing preventive measures
Collaborating: Collaborating with key stakeholders
Reducing toil: Reducing the amount of repetitive work a team must do
On-call duties: Diagnosing, mitigating, fixing, or escalating incidents as needed, and regularly handling non-urgent production duties