Site Reliability Engineer - Software Guidance & Assistance
Newton, MA
About the Job
Software Guidance & Assistance, Inc. (SGA) is searching for a Site Reliability Engineer for a contract assignment with one of our premier SaaS clients. The role can be based in Newton, MA, or fully remote.
SGA is an Equal Opportunity Employer and does not discriminate on the basis of Race, Color, Sex, Sexual Orientation, Gender Identity, Religion, National Origin, Disability, Veteran Status, Age, Marital Status, Pregnancy, Genetic Information, or Other Legally Protected Status. We are committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities in employment, and our services, programs, and activities. Please visit our company EEO page to request an accommodation or assistance regarding our policy.
Responsibilities:
- System Monitoring and Incident Response: Monitor system health, performance metrics, and availability. Respond promptly to incidents and outages, ensuring minimal downtime.
- Infrastructure Management: Manage and optimize both cloud and on-premise infrastructure using Infrastructure as Code (IaC) tools.
- Automation: Develop and maintain automation scripts and tools to enhance operational efficiency and reduce manual tasks.
- Collaboration: Work closely with development teams to implement CI/CD practices and improve deployment processes.
- Capacity Planning: Analyze usage patterns and forecast capacity needs to ensure system scalability and reliability.
- Documentation: Create and maintain comprehensive documentation for systems, processes, and incident response protocols.
- Security Best Practices: Implement and enforce security measures to protect infrastructure and data.
- Post-Incident Reviews: Conduct post-mortems on incidents to identify root causes and implement corrective actions.
The ideal candidate will have a strong technical background, excellent problem-solving skills, and a passion for enhancing system reliability and performance. You will play a crucial role in monitoring, automating, and optimizing our infrastructure to ensure the seamless operation of our services.
Qualifications:
- 1-4 years of experience in Site Reliability Engineering or a similar role.
- Strong knowledge of Linux/Unix systems and proficiency in scripting languages (e.g., Python, Bash).
- Familiarity with cloud platforms (e.g., AWS) and their services.
- Experience with containers and container orchestration (e.g., Docker, Kubernetes).
- Proficiency in using monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios).
- Experience with version control systems (e.g., Git).
- Strong troubleshooting skills with the ability to diagnose complex system issues.
- Excellent verbal and written communication skills for collaboration with cross-functional teams.
- Understanding of Agile development practices and methodologies.
Source: Software Guidance & Assistance