Staff Data Center Engineer / Linux Administration - Sirius XM
Ashburn, VA 20147
About the Job
Who We Are:
SiriusXM and its brands (Pandora, SiriusXM Media, AdsWizz, Simplecast, and SiriusXM Connect) are leading a new era of audio entertainment and services by delivering the most compelling subscription and ad-supported audio entertainment experience for listeners -- in the car, at home, and anywhere on the go with connected devices. Our vision is to shape the future of audio, where everyone can be effortlessly connected to the voices, stories and music they love wherever they are.
This is the place where a diverse group of emerging talent and legends alike come to share authentic and purposeful songs, stories, sounds and insights through some of the best programming and technology in the world. Our critically-acclaimed, industry-leading audio entertainment encompasses music, sports, comedy, news, talk, live events, and podcasting. No matter their individual role, each of our employees plays a vital part in bringing SiriusXM’s vision to life every day.
SiriusXM is the leading audio entertainment company in North America, and the premier programmer and platform for subscription and digital advertising-supported audio products. SiriusXM’s platforms collectively reach approximately 150 million listeners, the largest digital audio audience across paid and free tiers in North America, and deliver music, sports, talk, news, comedy, entertainment and podcasts. Pandora, a subsidiary of SiriusXM, is the largest ad-supported audio entertainment streaming service in the U.S. SiriusXM's subsidiaries Simplecast and AdsWizz make it a leader in podcast hosting, production, distribution, analytics and monetization. The Company’s advertising sales organization, which operates as SiriusXM Media, leverages its scale, cross-platform sales organization and ad tech capabilities to deliver results for audio creators and advertisers. SiriusXM, through SiriusXM Canada Holdings, Inc., also offers satellite radio and audio entertainment in Canada. In addition to its audio entertainment businesses, SiriusXM offers connected vehicle services to automakers.
Pandora provides consumers with a uniquely-personalized music and podcast listening experience with its proprietary Music Genome Project® and Podcast Genome Project® technology. Pandora is available through its mobile app, the web, and integrations with more than 2,000 connected products.
How you’ll make an impact
We have a terrific opportunity in our Systems Engineering team for an intelligent and motivated Staff Datacenter Engineer who is enthusiastic about datacenter deployment and site reliability for large-scale consumer online services. As a member of our Data Center Engineering and Operations team, you will be responsible for the day-to-day operations of our co-location data centers and assume a critical role in maintaining the overall uptime, performance, and capacity of the SiriusXM+Pandora service. You will be able to bring your solid experience to bear in supporting the various services we manage and take on interesting and mission-critical projects as part of a fast-paced, highly collaborative team. We hold ourselves to high standards and take pride in our work.
What you’ll do:
- Plan and facilitate datacenter expansions and build-outs for new and existing footprints.
- Monitor datacenter power consumption and environmental conditions across existing footprints.
- Lead a team of datacenter operations engineers in accomplishing various day-to-day tasks.
- Facilitate weekly planning meetings with the datacenter team.
- Collaborate with internal teams to define project requirements for hardware deployments.
- Collaborate with our sysadmin team to keep our PXE infrastructure up to date and help create new boot methods within the environment.
- Create Debian live images for the Data Center Engineering team to use for troubleshooting, disk wiping, performance testing, etc.
- Create zero-touch provisioning methods and processes in Python to streamline hardware deployments.
- Review current automated processes, update as needed, and look for opportunities for greater efficiencies.
- Determine whether automated processes can be containerized and, if so, develop a migration plan, refactor, and deploy the process to our internal private cloud.
- Maintain existing monitoring processes and implement new functionality/metrics.
- Create (or update) documentation on all automated processes and monitoring infrastructure.
- Plan, schedule and perform upgrades/maintenance on infrastructure hardware.
- Manage vendor relations with manufacturers and VARs.
- Develop methodologies for hardware stress testing, performance reports, and ways to compare various architectures and configurations.
- Work hands-on with datacenter infrastructure provisioning and server/network equipment deployments.
- Rack, cable, and provision a large inventory of servers, switches, PDUs and consoles alongside a team of engineers.
- Perform initial configuration of systems as defined by our standard operating procedures (BIOS configuration, PXE OS installs, DNS updates, etc.).
- Diagnose complex technical problems, provide detailed analysis/root cause as well as remediation/mitigation recommendations.
- Plan and assist with hardware life-cycle management, from provisioning through retirement and decommissioning.
- Manage RMA processes with various vendors.
- Maintain an up-to-date inventory list of all hardware equipment across our datacenters.
- Implement best-practice methodology for maintaining a datacenter environment.
- Document and track all assigned datacenter-related issues and tasks in our internal ticketing system in a timely fashion.
What you’ll need
- BA/BS in Information Technology, Computer Science, or a related field (or equivalent experience).
- IT Certifications such as RHCE or similar are a plus.
- Minimum 8 years of combined data center and Linux administration experience, with at least 4 years of day-to-day hands-on experience in an enterprise-scale datacenter environment.
- Self-motivated, continuous learner, appreciates challenge, comfortable and effective working in new areas that require experimentation and rapid problem solving.
- Excellent time management skills, with the ability to prioritize and multitask, and work under shifting deadlines in a fast-paced environment.
- Strong understanding of x86 server hardware architecture and subsystems as they relate to configuration, triage, and certification in a large-scale server environment.
- Knowledgeable in datacenter best practices including but not limited to cabling, power balancing, cooling and airflow optimization, inventory tracking, capacity planning and host/service diversity.
- Strong interpersonal skills with the ability to lead as well as work in a team environment.
- Meticulous attention to detail and strong organization skills.
- Past experience as a team lead or as a people manager is a plus.
- Experience mentoring datacenter engineers.
- Take pride in keeping a clean and tidy work environment within the datacenter co-location.
- Ability to lift and carry equipment up to 75 pounds safely and reliably on a regular basis.
- Excellent written and verbal communication skills.
- Participate in a 24x7x365 on-call rotation.
- Up to 15% travel.
- Must have the legal right to work in the U.S.
Technical Skills:
- Demonstrated proficiency in monitoring stacks such as Prometheus, Alertmanager, and Grafana.
- Hands-on experience with PXE boot, UEFI, AMI BIOS distributions, BMC/iDRAC implementation.
- Experience creating and executing Ansible playbooks.
- Experience with Docker containers.
- Basic understanding of HashiCorp Nomad/Consul/Vault.
- Practical professional knowledge of Linux and full network stack from NIC firmware to TCP/IP.
- Expertise with SAN and NAS arrays such as NetApp, Isilon, Pure Storage, and Brocade.
- Familiarity with Bitbucket and Git.
- Familiarity with performance testing and reporting tools, such as Phoronix, FIO, Stream and others.
- Experience with ISC DHCP and BIND DNS operations.
- Intermediate scripting skills in Python and familiarity with OOP concepts.
- Significant knowledge of Linux kernel drivers, kernel tuning, and debugging hardware compatibility issues.
- Basic understanding of subnetting, DHCP relays, network load balancing, and ARP.
- Working knowledge of package management tools such as APT and RPM.
Our goal at SiriusXM is to provide and maintain a work environment that fosters mutual respect, professionalism and cooperation. SiriusXM is an equal opportunity employer that does not discriminate on the basis of actual or perceived race, creed, color, religion, national origin, ancestry, alienage or citizenship status, age, disability or handicap, sex, gender identity, marital status, familial status, veteran status, sexual orientation or any other characteristic protected by applicable federal, state or local laws.
The requirements and duties described above may be modified or waived by the Company in its sole discretion without notice.