Sr. Systems Reliability Engineer at CareerBuilder Premium Subscription
Burbank, CA
About the Job
Software Resources has an immediate job opportunity for a Sr. Systems Reliability Engineer with a major corporation in Burbank, CA. The role is hybrid, reporting to work 3 to 4 days per week.
Duration: 12 months
Pay Rate: $90 - $100/hr DOE
Must Haves:
1) Experience with a public cloud: AWS, Azure, or Google
2) Experience with Terraform for script automation
3) Experience with GitLab, Ansible, or other automation products
Experience with a coding language is a plus
- It is important that the candidate can implement new ideas
- Must have: Cloud (AWS or Azure).
- Must have the ability to collaborate in a team environment and articulate the work and progress being done.
- Must be willing to handle coverage amongst the team for off hours. It will be a rotating shift as the products used are very specific and unique to our company.
- Must have cloud aptitude and know cloud well
Description:
• Collaborate and provide technical leadership within and across teams
• Code and deploy systems, and define and establish best practices in cloud hosting environments using self-healing,
infrastructure-as-code, security, and automation patterns
• Develop useful telemetry, alerts, and response to identify and address reliability risks
• Participate in on-call rotation with other engineering teams
• Identify, experiment with, and evangelize new technologies, ideas, and best practices across the broader engineering
community
Basic Qualifications:
• Configuration management and orchestration (e.g. Chef, Terraform, CloudFormation)
• One or more languages in your skillset (e.g. Go, Python, Java, Ruby)
• Containerization (e.g. Docker, Kubernetes, Mesos, Elastic Container Service)
• Skilled in Cloud/PaaS Environments (e.g. AWS, Google Cloud Compute)
• Thorough knowledge of continuous integration tools (e.g. Jenkins)
• UNIX/Linux administration, troubleshooting, performance tuning, & security
• 5 years of experience in technical operations or systems reliability engineering
Preferred Qualifications:
• Equivalent experience in technical operations or software engineering
• Bachelor's degree in computer science or related field preferred
• Minimum of 3 years operating complex, large-scale enterprise guest-facing applications or websites
• Prefer 5+ years operating complex, large-scale enterprise guest-facing applications or websites
• Experienced with Distributed Data Platforms
• Experience with AWS, Google or similar cloud computing environments.
• Experience working in an Agile development environment
• Experience working in a high-capacity, highly scalable, mission-critical web serving environment
• Excellent judgment, problem resolution, team building, negotiation, and decision-making skills as well as the ability to work under continual deadline pressure
• Experience with F5 load balancing is helpful
• UNIX/Linux and some Windows server experience, including expertise in system installation, configuration, administration, troubleshooting, performance tuning, preventative maintenance, capacity planning, monitoring, and security procedures
• Web (IIS, Apache) and Java application (Tomcat, JBoss, etc.) server expertise, including installation, administration, configuration, troubleshooting, performance tuning, preventative maintenance, capacity planning, monitoring, and security procedures
Job duties will include spending 50% of the time performing the following functions:
• Code and deploy systems, new technologies, and best practices in the cloud using self-healing, infrastructure-as-code, security, and automation patterns
• Develop useful telemetry, alerts, and response to identify and address reliability risks
Required Education:
Bachelor's degree or equivalent work experience