Infrastructure SME (Azure, Linux, Windows) - ONLY LOCALS - American Technology Consulting
Harrisburg, PA
About the Job
POSITION PURPOSE:
The incumbent will be a subject matter expert on monitoring tools and processes used by the Commonwealth and is responsible for collaborating with technical specialists, agency teams, and vendors to implement actionable monitoring and reporting. The position’s responsibilities also include coordinating efforts to transform person-centric processes into structured, repeatable, and highly documented automated workflows. Additionally, this position is responsible for the management and continuous improvement of key enterprise monitoring processes, including changes, incident reporting, and problem resolution. The incumbent will also be responsible for evaluating, preparing, and implementing technical solutions for on-prem and cloud-based applications and technology resources. The position will develop and maintain standard operating procedures (SOPs) and ensure consistent communication strategies to enhance operational efficiency and service delivery.

DESCRIPTION OF DUTIES:
Responsible for functioning as the Technical SME on enterprise-wide systems.
Responsible for implementations of products/services that involve significant Commonwealth oversight.
Interpret, process, and report data to create meaningful business and operational dashboards.
Maintain (patch, troubleshoot) existing and future monitoring tools including System Center Operations Manager, SolarWinds, SightLine, and SquaredUp.
Identifies improvements to existing processes and tools to achieve high quality services/products.
Create Azure Monitor resources and Log Analytics queries.
Create, document, and maintain on-prem and cloud automations.
Create, document, and maintain SOAP/REST/JSON/API calls using PowerShell or other compatible languages.
Maintain and troubleshoot monitoring tool connectivity to endpoints.
Creates documentation for new processes.
Updates documentation for existing processes.
Documents incidents and problems impacting monitoring services.
Collaborate with the Enterprise Change Manager to ensure processes are standardized and documented workflows are followed.
Collaborate with the Enterprise Incident Manager to ensure that standardized SOPs and processes are consistently applied across incident and problem management.
Monitor incident and problem resolution processes to ensure timely and effective service restoration and root cause analysis.
Manage and document the operational procedures and responses of NOC teams to service delivery and incident management.
Ensure all processes and workflows are documented in an accessible, organized, and secure manner for future reference.
Establish and maintain Standard Operating Procedures (SOPs) for all relevant operational processes.
Emphasize the transition from informal, person-dependent workflows to formal, role-driven processes.
Develop and document a process documentation workflow that ensures all operational procedures are captured and updated regularly.
Ensure that consistent and clear communication processes are in place for changes, incidents, and problem management across the NOC.
Create and manage distribution lists for technical and non-technical stakeholders (ETSO) to ensure relevant parties are informed of NOC updates.
Enable self-management of distribution lists via subscription options to streamline communication across the organization.
Work closely with NOC staff to ensure effective communication regarding change, incident, and problem management on behalf of NUTSO.
Ensure collaboration between different departments to harmonize efforts in incident, problem, and change communication.
Complies with and develops recommendations for executive public and enterprise policy objectives as they relate to the delivery of Commonwealth IT services.
Uses the ServiceNow change management tool to submit requests for change.
Directs the development of policies and procedures consistent with Commonwealth standards and direction.
Participates in enterprise change management meetings for enterprise-level service configuration and access changes, to ensure that supported locations are not impacted.
Provides ongoing data submissions regarding network availability, problem resolution, and infrastructure enhancements for use in compiling the monthly/quarterly customer Service Level Agreement (SLA) reports.
Designs agency disaster recovery plans for the network infrastructure and participates in periodic plan updates and testing exercises.
Reviews technical manuals and other literature and attends seminars, conferences, and training classes to stay current with new information services, products, and information technology developments in network technology.
Performs other related duties as assigned, including those outlined in the CoG Plan when the Plan is activated. Responds to the designated alternate or secondary location when directed in response to a catastrophic incident.
This position is expected to adhere to established organizational service management processes and procedures.

Qualifications
5+ years of SolarWinds admin/deployment experience
5+ years of Ansible admin/deployment experience
3+ years of Azure Log Analytics experience
8+ years of MS Windows Server admin/deployment experience
3+ years of Linux Server admin/deployment experience
5+ years of PowerShell scripting experience
3+ years of incident management experience
Source: American Technology Consulting