Operations Tech Lead/Engineer - Whitehouse Station, NJ or Jersey City, NJ - Georgia Tek Systems
Whitehouse Station, NJ or Jersey City, NJ
About the Job
Operations Tech Lead/Engineer
Location: Whitehouse Station, NJ or Jersey City, NJ - Hybrid
Duration: 6 months, contract-to-hire (CTH)
Job description:
The Advanced Engineering department needs a Platforms Operations Tech Lead who will work closely with our strategic partners and support all of the department's platforms, including Client.IO and legacy platforms such as App Works and IBM BPM, among others. As these platforms are consolidated across IT areas, dedicated focus is needed to ensure the highest level of service for all application areas that use them.
Responsibilities
- Act as Operations Lead for the strategic partners who will provide support for our platforms
- Engage with Engineering and Architecture on platform usage to determine appropriate support-level activities for each platform
- Engage with Delivery teams on specific monitoring and alerts needed for pre-production and post-production support and work closely with the APM team to ensure proper monitoring is in place and current for each new initiative
- Establish dashboards for platform operational health and specific dashboards for critical Tier 1 application/platform users
- Provide ongoing monitoring of all platforms and establish alerts routed to the L1/L2 support team where needed
- Work with infrastructure teams on platform configuration, optimization, and troubleshooting
- Monitor system performance and research solutions to potential bottlenecks
- Provide expert technical insight for Sev 1/2 infrastructure (unplanned outages) and application (job performance or stability) issues
- Troubleshoot and support escalated Sev 3 application (coding, design) issues
- Assist the application teams with any design, code refactoring, and performance analysis requests
- Partner with agile development and application teams to influence application modernization and migration to cloud platforms
- Collaborate with architects, engineering, client managers, project managers, applications and infrastructure teams to plan and coordinate changes
- When incident contact procedures do not work, act as primary escalation for problems and incidents
- Be accountable for root cause analysis of Severity 1, Severity 2, and chronic recurring incidents
- Create and follow processes to implement planned changes to production and non-production systems
- Perform incident response and break-fix triage for serviceability issues
- Partner with the business to fulfill requests for service
- Execute and improve vulnerability and resiliency management programs including patch management, infrastructure testing and other proactive maintenance tasks
- Identify and document opportunities and solutions that enhance service delivery efficiency and improve the customer experience
- Be available when scheduled (rotational with others on team) to provide on-call support after hours and on weekends
- Provide thought leadership to promote the continuous improvement of service delivery and operational practices such as:
- Propose, design, and implement enhancements to current environment
- Develop actionable plans to improve procedures and systems, mitigate risk, and ensure compliance with established industry rules, regulations, and best practices
- Assist in developing engineering and operational service metrics
- Create and maintain system run-books documenting day-to-day support, maintenance, and troubleshooting, forming a knowledge base for the infrastructure
- Conduct peer review analysis and acceptance for new or modified processes to ensure sustainability and repeatability
- Analyze performance metrics to ensure timely and accurate delivery of services
- Perform implementation, maintenance, troubleshooting, and remediation services for supported platforms
- Collaborate with application owners to install, configure, and deliver third-party products
- Negotiate dependencies and priorities with stakeholders and internal customers
- Provide subject matter expertise in cross-functional strategic and tactical efforts
- Partner with key vendors to escalate and remediate business-impacting issues
- Provide leadership and mentoring to emerging talent on the team
- Analyze and execute configuration and audit strategies to ensure compliance with policies, standards, and security direction
- Experience in troubleshooting technology problems and active participation in Sev1/Sev2 calls
- Minimum of 7 to 10 years' experience in technology engineering or operations, with expertise in working with technology platforms
- Understanding of security and risk as it relates to patch and configuration management, logging, alerting and monitoring
- Extensive troubleshooting, triage, root cause analysis and performance monitoring skills.
- Experience with ITIL and/or other similar IT best practice frameworks
- Experience with LEAN methodologies to drive problem resolution and service improvements
- Experience using the following tools/technologies: Dynatrace, AppDynamics, OpenSearch, Azure Portal
- Experience working with vendors as key strategic partners, drawing on their experience and knowledge to meet our ongoing and growing needs.
- Experience with Agile and/or Kanban methodologies
- Background working in organizations that provide 24x7x365 support
- Demonstrated ability to achieve successful outcomes in difficult situations and work with application teams, business customers, and various levels of management
- Must be able to communicate effectively with technical and non-technical audiences.
- Must be a self-starter with the ability to work independently and in a collaborative team environment.
- Java programming experience is a strong plus, as Client.IO will be largely Java-based.
- Ability to learn and apply new technologies to solve critical business problems.
- Experience in technology operations, particularly monitoring and support.
- Experience working with infrastructure on platform configuration, optimization, and troubleshooting.
- Insurance domain experience is a plus.
Source: Georgia Tek Systems