Lead Monitoring Systems Engineer - Lumen Solutions Group, Inc.
Washington, DC
About the Job
Job Title: Lead Systems Engineer
Location: Washington, DC (100% Remote)
Job Category/Level: Systems/Monitoring/Lead
Reports to: Manager, Systems Monitoring Team, Infrastructure and Production Operations
Background:
The FEP Operations Center is seeking a Lead Systems Engineer to support multiple Systems Monitoring initiatives for upcoming projects in 2024 and beyond. The role involves overseeing software tool administration for systems and applications monitoring, with a focus on tools like DataDog.
Responsibilities:
- Monitoring Tool Administration: Administer and manage DataDog on the Linux platform, including instrumentation of Java-based applications on Tomcat Application Servers and configuration for infrastructure, network, and centralized logging.
- Centralized Logging Configuration: Configure centralized logging for WebSphere, Tomcat, and other web servers on AIX platforms, routing logs using F5 Load Balancers. Handle various log formats across systems.
- Automation and Scripting: Automate tasks using scripting languages and tools such as Python, Shell, and Ansible. Additionally, write Selenium scripts to monitor business transactions using CloudBeat's Synthetic Monitoring tool.
- Dashboard Creation and Monitoring: Create data visualization dashboards and configure alerts within DataDog to provide comprehensive real-time monitoring for applications, networks, and servers.
- Support and Troubleshooting: Provide support during major production incidents by gathering and analyzing data from multiple sources, troubleshooting issues, and recommending solutions. Communicate findings and next steps through detailed reports.
- Collaboration and Training: Collaborate with Systems and Application Architecture teams to ensure monitoring requirements are met in early project stages. Provide documentation and training on monitoring tools and processes, facilitating effective tool use across the organization.
- End User Monitoring: Implement end-user monitoring and real user monitoring (RUM) for applications using JavaScript injection within DataDog. Ensure health checks and rules are optimized for application performance.
Competencies:
- Organizational Skills: Proven ability to organize, prioritize, and manage tasks under tight deadlines.
- Technical Proficiency: Strong technical expertise and hands-on experience with various platforms and monitoring tools.
- Problem-Solving Initiative: Ability to proactively identify issues and develop innovative solutions.
- Adaptability and Learning: Self-motivated and adaptable to change, with a passion for continuous learning and improvement.
- Communication Skills: Strong interpersonal, analytical, and communication skills.
Required Skills:
- IT Experience: 5-8 years of experience across distributed technology environments, with expertise in platforms such as Microsoft systems (Windows Server, Active Directory), Linux/Unix, VMware, SQL Server, and networking technologies.
- Monitoring Tool Expertise: At least 3 years of experience managing monitoring tools like DataDog or similar tools (ELK Stack), including integration, configuration, and administration.
- Scripting and Automation: Experience with Python, Shell, Selenium, and VuGen scripts for automation and monitoring optimization.
- SSL and Encryption: Knowledge of SSL setup, including certificate management and encryption, on Linux platforms.
- Systems Monitoring Strategy: Demonstrated experience in developing and implementing monitoring strategies for large-scale environments.
- Dashboard and Reporting Skills: Strong ability to configure monitoring dashboards, alerts, and reports, with experience in service level management (SLAs, SLRs).
- Troubleshooting Expertise: Proficiency in troubleshooting systems and Java applications using tools like DataDog.
- Agile/SDLC Understanding: Familiarity with both waterfall and agile methodologies, including SAFe agile frameworks.
Preferred Qualifications:
- Certifications:
  - ITIL Foundations v3 (within 180 days of hiring preferred).
  - SAFe Certification preferred.
- Education: Bachelor’s degree in Computer Science, Engineering, or related fields (or equivalent experience).