Sr. Performance Engineering Lead at TCA Consulting Group, Inc.
Hartford, CT 06112
About the Job
Our client is seeking an experienced and highly motivated Performance & Observability Engineer who will assess, strategize, and build solutions for complex application and infrastructure observability needs to ensure production stability and visibility. The position will drive performance and stability improvements across critical business applications, their architecture, and their integrations to ensure an optimal end-user experience, working closely with the application development, infrastructure, database, and middleware teams.
Responsibilities:
Configure and maintain observability capabilities for our applications and infrastructure in partnership with the SRE and AIOps teams
Assess the production readiness of code, configuration, and infrastructure changes
Develop and monitor SLOs and SLIs based on customer user journeys; advise on SLAs; establish error budgets
Strategize, analyze, and tune applications for performance and availability, applying DevSecOps principles and technologies as needed to continuously validate that the application meets load, stability, scalability, and reliability standards
Define, create, and maintain monitoring dashboards. Implement application performance management, log analytics, error analytics, and business analytics solutions using APM and related tools, including but not limited to Dynatrace, Splunk, and Akamai
Monitor application stability trends (cloud and on-premises) and identify opportunities to improve performance and/or availability
Automate system scalability and continually work to improve system resiliency, performance, and efficiency; make recommendations for design changes that improve reliability
Work with Incident and Problem Managers to help drive blameless RCAs, postmortems, and RCRs by providing engineering and tooling expertise
Define, configure, review, assess, and use AI-based analytics, fine-tuning them to strike a good balance between faster automatic root cause analysis and false positives
Develop next-generation, intelligent, automated monitoring and alerting solutions for software delivery and production environments. Implement an observability and monitoring framework using tools and frameworks such as Dynatrace, ServiceNow, Ansible, Splunk, Akamai, OpenTelemetry, and AWS monitoring capabilities (e.g., CloudWatch)
Independently contribute to finding game-changing solutions using cutting-edge technologies, and enable next-generation alerting, problem-solving, and self-healing solutions for IT assets
Qualifications and Skillset:
Position requires a Bachelor's degree (or foreign equivalent) or an advanced degree in Information Technology, Computer Science Engineering, or a related field, plus eight or more (8+) years of experience in the job offered or in a related occupation
Experienced in using Dynatrace, LoadRunner, and Splunk for performance analysis and for implementing alerting and monitoring solutions, and proficient in performance testing processes and automation
Expertise in cloud-native technologies, including containerization, microservices architecture, and serverless computing
Demonstrated strong analytical, problem-solving, debugging, and troubleshooting skills, including identifying the root cause of issues
Proficient in utilizing and supporting APM tools in a large enterprise environment
Proficient with application server instrumentation (Java), Real User Monitoring, and Synthetic Monitoring
Experience with Agile, DevOps, and Site Reliability Engineering practices
Strong knowledge of ServiceNow Event Management preferred
Exceptional communication and collaboration skills, enabling effective engagement with stakeholders at all levels of the organization
Ability to coach team members and engineering teams
Certifications preferred:
Dynatrace Certified Associate
AWS Certified Solutions Architect - Associate
Splunk Core Certified Power User
Splunk Core Certified Advanced Power User
Secondary AWS certification (optional)