Site Reliability Engineer - Talent Space
Coral Gables, FL 33146
About the Job
Talent Space, Inc. is seeking a Site Reliability Engineer for a remote, full-time opportunity!
You will be responsible for ensuring the stability, reliability, and scalability of our production systems. Design and implement solutions that improve system performance, reduce downtime, and automate repetitive tasks. Combining systems engineering and operations engineering, you'll enhance operational processes, monitoring systems, and tooling to provide a seamless experience for our customers. The ideal background is in system administration and network management.
Responsibilities
- Monitor all systems and infrastructure for the highest level of availability. Proactively identify and resolve incidents before they impact operations. Perform routine maintenance tasks, including monitoring, patching, and backups.
- Respond to incidents and outages in a timely and effective manner. Collaborate with other teams to diagnose and resolve complex issues. Document incident details and implement corrective actions to prevent recurrence. Document processes, configurations, and troubleshooting procedures.
- Diagnose and resolve application performance problems or system outages. Play the role of Incident Manager during outages.
- Resolve complex hardware and software issues, and work with vendors when necessary.
- Optimize system performance and resource utilization on-prem and in the cloud.
- Develop and maintain automation scripts to streamline repetitive tasks. Use scripting languages (e.g., PowerShell, Python) to automate system administration.
- Implement configuration management tools to ensure consistency and repeatability.
- Create and maintain comprehensive documentation of IT processes and procedures.
Qualifications
- Strong understanding of IT infrastructure components, including servers, networks, and storage.
- Knowledge of scripting languages (e.g., PowerShell, Python).
- Knowledge of networking concepts and protocols (e.g., TCP/IP, DNS, DHCP).
- Experience with IT service management frameworks.
- Experience with cloud platforms such as AWS and Azure.
- Experience with virtualization technologies such as Azure VDI and AWS WorkSpaces.
- Experience with monitoring and alerting tools (e.g., New Relic, Datadog).
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills.
- Extensive expertise in the Windows operating system.
Source: Talent Space