System Monitoring and IT Observability Engineer - SURESTAFF
Memphis, TN 38105
About the Job
Seasoned professional with extensive technical experience to monitor, support, and enhance system monitoring, reporting activities, and IT observability for enterprise computing systems. Responsible for configuring sophisticated monitoring tools, monitoring the performance of critical hosts and services, and developing automation scripts for process optimization where applicable. The role also includes implementing and managing observability practices to ensure comprehensive visibility into system performance and health.
Note: This is NOT a REMOTE position.
ESSENTIAL JOB FUNCTIONS
- Monitor enterprise computing systems, networks, and applications in production and quality assurance environments, applying deep technical expertise.
- Conduct rigorous application and infrastructure availability checks, delivering comprehensive status reports.
- Configure advanced monitoring tools, including the establishment of notifications for alerts, tickets, and logs.
- Implement and manage observability practices, including metrics, logs, and traces, to provide comprehensive visibility into system performance and health.
- Leverage scripting and software tools to automate processes, showcasing high technical proficiency.
- Handle Incident Reports (IR) methodically, from opening through updating and closing, adhering to established standards.
- Contribute actively to the investigation and resolution of problems causing incidents within the scope of responsibilities.
- Maintain meticulous documentation for monitoring systems, including standard operating procedures (SOP).
- Execute the documented handover process effectively, ensuring a smooth transition for the next monitoring shift by highlighting priority items and issues.
- Five (5) + years of experience in system monitoring, reporting activities, and IT observability, showcasing technical proficiency.
- In-depth knowledge acquired through a bachelor’s degree in Business, Computer Information Systems, Engineering, or a related field, coupled with significant hands-on experience in computer systems.
- Proven ability to decipher requirements from business users and communicate effectively through various channels, including verbal, written, and face-to-face interactions.
- Proven ability to administer application and infrastructure monitoring platforms.
- Solid understanding of fundamental software and hardware concepts, coupled with expertise in tools such as ScienceLogic, ServiceNow, or other monitoring platforms, in addition to Word, Excel, Visio, and others for meticulous documentation.
- Demonstrated ability to excel both independently and collaboratively within a team environment, contributing decisively to decision-making processes.
- Strong meeting facilitation skills and the ability to cultivate relationships with the user community and IT personnel through robust interpersonal skills.
- Establish achievable Service-Level Objectives (SLOs) by tying health and performance to business value and digital experience.
- Prioritize support for open standards instrumentation, collection, and processing, with a focus on metrics ingestion, trace correlation, and logging standards.
- Collect system and application logs and centralize the storage of log data for analysis and transformation.
- Embed a log collection agent during instance configuration to ensure seamless and automatic deployment.
- Enable achievable SLOs by using log analytics to augment, accelerate, and automate incident identification and resolution.
- Establish control over the increasing amount of observability data by using telemetry pipelines and preprocessing telemetry at the edge.
- Firm understanding of ITIL principles and terminology.
Source: SURESTAFF