DevOps / Site Recovery Engineer - PRIMUS Global Services, Inc
Naperville, IL
About the Job
Title: DevOps / Site Recovery Engineer
Experience: 10-12 years
Location: Naperville, IL (Hybrid Onsite 2-3 days a week in office)
Rate: DOE; the range is in the 70s or 80s per hour on C2C.
Requirements: The Site Reliability Engineer (SRE) will play a pivotal role in ensuring the reliability, performance, and scalability of our applications hosted on Amazon Web Services (AWS). This position requires a strong blend of software engineering, systems administration, and operational expertise to proactively identify and resolve potential issues, automate routine tasks, and optimize our AWS environment.
Key Responsibilities
Infrastructure Management:
Manage and maintain AWS infrastructure, including EC2 instances, S3 buckets, VPCs, and other relevant services.
Implement and optimize cloud-native architectures, leveraging technologies like Kubernetes and Docker.
Ensure compliance with security best practices and industry standards.
Application Deployment and Management:
Collaborate with development teams to automate deployment and configuration processes through CI/CD pipelines.
Monitor application performance and troubleshoot issues related to infrastructure or application code.
Incident Response:
Develop and maintain incident response plans to handle system failures and outages effectively.
Coordinate with relevant teams to identify root causes and implement corrective actions.
Capacity Planning:
Forecast resource requirements and scale infrastructure accordingly to meet demand.
Optimize resource utilization to minimize costs.
Automation:
Develop and implement automation scripts and tools to improve operational efficiency and reduce manual tasks.
Automate routine tasks like backups, patching, and monitoring.
Monitoring and Alerting:
Implement comprehensive monitoring solutions to track system health and performance.
Configure alerts to notify teams of critical issues.
Performance Optimization:
Identify and address performance bottlenecks in applications and infrastructure.
Conduct load testing and capacity planning to ensure optimal performance.
Collaboration:
Work closely with development, operations, and security teams to keep priorities and practices aligned.
Contribute to knowledge sharing and best practices within the organization.
Required Skills and Experience:
Strong understanding of AWS services and architecture.
Proficiency in scripting languages (e.g., Python, Bash).
Experience with configuration management tools (e.g., Ansible, Puppet, Chef).
Knowledge of containerization technologies (e.g., Docker, Kubernetes).
Familiarity with CI/CD pipelines and DevOps practices.
Experience with monitoring and alerting tools (e.g., CloudWatch, Prometheus).
Strong problem-solving and troubleshooting skills.
Excellent communication and collaboration skills.
Desired Skills and Experience:
Knowledge of security best practices for cloud environments.
Certifications related to AWS or cloud technologies.