Lead Site Reliability Engineer - Allvue Systems
Coral Gables, FL 33134
About the Job
We are Allvue Systems, the leading provider of software solutions for the Private Capital and Credit markets. Whether a client wants an end-to-end technology suite, or independently focused modules, Allvue helps eliminate the boundaries between systems, information, and people. We’re looking for ambitious, smart, and creative individuals to join our team and help our clients achieve their goals. Working at Allvue Systems means working with pioneers in the fintech industry. Our efforts are powered by innovative thinking and a desire to build adaptable financial software solutions that help our clients achieve even more. With our common goals of growth and innovation, whether you’re collaborating on a cutting-edge project or connecting over shared interests at an office happy hour, the passion is contagious. We want all of our team members to be open, accessible, curious and always learning. As a team, we take initiative, own outcomes, and have passion for what we do. With these pillars at the center of what we do, we strive for continuous improvement, excellent partnership and exceptional results. Come be a part of the team that’s revolutionizing the alternative investment industry. Define your own future with Allvue Systems!
Responsibilities:
- Develop and implement strategies for the monitoring and alerting of systems health, performance, and security
- Develop and implement strategies for incident management, problem management, and change management
- Create and maintain automation tools and code for configuration management, deployment, and maintenance of cloud-based infrastructure
- Collaborate with development and operations teams to ensure that application and infrastructure changes are properly tested, deployed, and maintained
- Develop and maintain documentation of system configurations, processes, and procedures.
- Champion an atmosphere of continuous improvement by serving as a coach, mentor, and technical advisor.
- Be a thoughtful technical voice within the team, aiding in diligent architectural decisions and fostering a culture of high-quality code and engineering processes.
- Collaborate with Product and Engineering teams to ensure successful delivery and operation of diverse systems at scale.
- Identify opportunities to improve current technology and individual systems. Avoid creating technical debt, identify it quickly where it exists, and prioritize its remediation.
Requirements:
- Strong understanding of DevOps methodologies and SRE best practices.
- Solid understanding of DevOps practices, including CI/CD pipelines, configuration management, and Infrastructure as Code (IaC).
- Proficiency in scripting or programming languages (PowerShell, Python, or similar) for automation and infrastructure management in AWS and Azure, as well as IaC tools such as Terraform and CloudFormation
- Deep understanding of networking, security, and identity and access management (IAM) in cloud environments.
- In-depth knowledge of cloud computing concepts, including expertise in designing cloud-based solutions using IaaS, PaaS, and SaaS models.
- Experience with monitoring, observability and logging tools (Datadog, Splunk, Prometheus, Grafana, etc.).
- Familiarity with cloud architecture patterns, microservices, containers, and serverless computing.
- Proficient in performing in-depth analysis, complex technical troubleshooting, and problem resolution
- Strong time management skills, with the ability to multitask and perform well under pressure. Ability to adapt to changing priorities and meet deadlines.
- Experience working within geographically distributed organizations.
- Professional written and interpersonal skills.
- AWS or Azure certifications (AWS/Azure Solutions Architect, Developer, etc.) are a plus, but not required.
- Bachelor’s degree in Information Systems/Technology, Computer Science, or a related technical engineering field, or equivalent experience
- 5-7 years of Platform, Cloud, Software Engineering or relevant technical experience, with a focus on AWS and/or Azure.
Benefits:
- Health Coverage options along with other voluntary benefits
- Enterprise Udemy membership with access to thousands of personal and professional development courses
- 401K with Company match up to 4% or Employee Pension plan
- Competitive pay and year-end bonus potential
- Flexible PTO
- Charitable Donation matching, along with Volunteer and Voting PTO
- Numerous team building activities to promote collaboration in a fun and fast-paced work environment
Allvue Systems provides equal employment opportunities (EEO) for all employees and applicants for employment. We recognize the real value of bringing people together from diverse backgrounds, experiences and perspectives - we don’t just accept difference, we celebrate and support it. We are committed to advancing these efforts through our strategies to hire, promote, create and support a diverse and inclusive environment throughout our workforce and workplace. It is our policy to prohibit discrimination and harassment of any type without regard to race, color, religion, marital status, age, national origin, ancestry, physical or mental disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law. In addition, Allvue will provide reasonable accommodations for qualified individuals with disabilities.
Job Summary:
As Lead Site Reliability Engineer, you will help ensure the reliability, availability, and performance of our applications and services for our customers. This role combines software and systems engineering to maintain and deliver robust services. You will work alongside our development, platform, and architecture teams to enhance service delivery, building monitoring systems and deployment automation while working to remove toil from all that we do. You will help manage incidents and conduct post-mortem reviews to facilitate a culture of continuous improvement. The successful candidate will possess strong technical and mentoring skills, with a history of delivering on complex technology initiatives.