Data Centre Operations Lead - Athreya Inc.
ROCKVILLE, MD
About the Job
Data Centre Operations Lead
Rockville, MD – (on site)
6+ month contract
Job Description:
• Lead the data center operations team, providing guidance, training, and support to ensure high performance and operational excellence. Act as the primary point of contact for all data center-related issues and escalations.
• Oversee the daily operations of data center facilities, ensuring high availability and reliability of all systems.
• Manage the data center infrastructure technology stack end to end: VMware, VxRail, Citrix, LogicMonitor, Moogsoft, AD, Azure AD SSO, Azure Security Policy, PKI, Windows and Linux servers, vulnerability management, BeyondTrust Password Safe and AD Bridge, storage and backup tools, etc.
• Ensure adherence to operational standards and best practices.
• Drive major and potential incidents end to end, providing periodic updates to client stakeholders for approvals and recommendations.
• Lead, mentor, and manage a team of data center operation engineers.
• Provide guidance and support for professional development and performance improvement.
• Coordinate and manage the team's daily activities, ensuring alignment with organizational goals and priorities.
• Lead the response to data center incidents, ensuring timely resolution and minimal impact on business operations.
• Perform root cause analysis and implement preventive measures to avoid recurrence of issues.
• Develop and maintain incident management processes and procedures.
• Plan and oversee scheduled maintenance and upgrades of data center infrastructure.
• Ensure that all hardware and software components are up to date and functioning optimally.
• Coordinate with vendors and service providers for maintenance and support activities.
• Monitor and analyze data center resource usage, ensuring efficient utilization and avoiding over-provisioning.
• Conduct capacity planning to support future growth and demand.
• Implement optimization strategies to enhance performance and reduce operational costs.
• Ensure data center infrastructure adheres to security policies, standards, and best practices.
• Implement and maintain security controls to protect data and systems.
• Ensure compliance with regulatory requirements and industry standards (e.g., ISO 27001, HIPAA).
• Develop and implement disaster recovery and business continuity plans for data center operations.
• Ensure regular testing and validation of disaster recovery procedures.
• Ensure data center infrastructure is resilient and can recover quickly from failures or disruptions.
• Work closely with other IT teams, business units, and stakeholders to understand requirements and deliver solutions that meet their needs.
• Collaborate with vendors and service providers to evaluate and integrate new technologies and services.
• Communicate effectively with stakeholders, providing regular updates on data center operations and performance.
• Maintain comprehensive documentation of data center infrastructure, configurations, processes, and procedures.
• Generate regular reports on data center performance, incidents, and operational metrics.
• Ensure documentation is up to date and accessible to relevant stakeholders.
Technical responsibilities in detail:
Active Directory and Cloud Services
• Administer Azure AD, manage security groups, GPO, SSO, and application configurations.
• Handle public cloud directory services, Oracle IDCS, network/file shares, SCP policies, privileged user management, and service account passwords.
• Conduct AD audits, schema updates, backup/restore services, and assist with JSOX, FDA, and GQS audits.
• Manage ticket queues and follow up on aging tickets.
• Provide end-to-end support for Active Directory domains (Azure AD, AD security groups, GPO, SSO, application configurations, etc.).
IT Environment Monitoring
• 24x7 ITSM queue-based monitoring.
• Triage and first-level troubleshooting based on alert severity.
• Incident resolution using Standard Operating Procedures.
Vendor Coordination
• Coordinate with vendors for infrastructure on public/private Cloud.
• Provide vendor contact details and escalation matrix.
Citrix Architecture and Optimization
• Maintain Citrix architecture and seek continuous optimization.
• Participate in architecture design and planning with the steering committee.
• Recommend system and end-user performance improvements.
• Implement approved performance improvements.
Citrix Environment Support
• Support Citrix environment and integrate with Otsuka-specific technologies.
• Order, install, update, and maintain Citrix servers and tools.
• Assess, consolidate, upgrade, and manage Citrix infrastructure, including SDX appliances.
• Manage NetScaler infrastructure and upgrades.
IT Service Continuity and Disaster Recovery (DR) Services
• Strategy and Policy Definition
• Coordination and Execution
• Data Management
• Testing and Reporting
• DR Activation and Coordination
• Review and Enhancement
Onsite and Remote Support
• Onsite server support, IMAC services, and remote software installation.
• Decommissioning, proactive evaluation, and datacenter assessment.
Windows Server Management & Projects
• Administer and monitor Windows servers, including health checks and problem management.
• Manage local users, groups, shares, and server disk/storage.
• Handle event logs, vendor coordination, and performance issues.
• Install and manage IIS, apply security patches, and troubleshoot clusters.
• Oversee DNS, SCOM, certificate management, migrations, and server deployments.
Linux Server Administration and Projects
• User Administration - Manage user accounts, environments, and home directories.
• OS Package Administration - Add/remove OS packages and troubleshoot issues.
• Storage Management - Create/manage file systems, logical volumes, and clean up disk space.
• NIS and NFS Management - Administer NIS tables and services, install/configure NFS servers.
• Network and Security - Configure/manage NTP, DNS, and implement security standards.
• OS Upgrade and Patching - Upgrade/patch Linux OS, configure SSSD and AD, manage disk and security.
• High Availability and Compliance - Build/configure HA environments, enforce security, and ensure regulatory compliance.
• Server Builds and Management - Install/configure NIS, mail, DNS servers, and centralized syslog servers.
DC Power Tools
• Tool stack management and support: LogicMonitor, Moogsoft, ManageEngine, BeyondTrust Password Safe, BeyondTrust AD Bridge, Commvault Compliance Search, Veritas HubStor, etc.
LogicMonitor Administration
• Installation and Configuration - Install and configure LogicMonitor Collectors and group servers for monitoring.
• Monitoring and Reporting - Configure monitoring settings, create HLD/Templates/SOPs, and integrate with Moogsoft.
• Maintenance and Troubleshooting - Backup/restore LogicMonitor Collectors, troubleshoot devices, and modify LogicModules.
• Consultancy and Coordination - Provide consultancy, manage stakeholders, oversee platform support, and monitor infrastructure services.
Moogsoft Administration and Issues
• Integration and Event Management - Resolve Element Layer Tool integration issues and missing events/alarms at the Moogsoft layer.
• Ticketing and Situation Formulation - Address ticketing problems with ITSM tools and inconsistencies in situation formulation/Cookbook.
• Maintenance and Upgrades - Fix maintenance window malfunctions and perform Moogsoft module upgrades.
• Configuration Management - Manage Moogsoft Recipe additions/deletions/modifications and Cookbook enablement/disablement.
• TeamRooms and API Integration - Create/modify/delete Moogsoft TeamRooms and integrate Moogsoft AI Operations with vendor APIs to automate ticketing.
• Updates and Enhancements - Manage Moogsoft updates and enhancements.
Storage Backup & Data Management
• Define performance, data segregation, backup, restore, archival, retention, reliability, encryption, security, scheduling, and access control needs.
• Recommend hierarchical storage solutions (shared/dedicated, tiered storage, platforms) and procedures to meet requirements and SLRs.
• Review and approve storage and backup solutions and procedures.
• Procure and manage data storage infrastructure (SAN, NAS, tape, optical).
• Provide and manage backup and archival consumables for Otsuka facilities.
• Maintain data set placement, manage data catalogs, and configure Nimble SAN and NAS switches.
• Notify Otsuka of any data losses or risks.
• Perform data and file backups/restores per procedures and SLRs.
• Manage file transfers, data movement, and input processing for third-party media.
• Decommission storage and backup environments per policies.
• Develop and maintain backup schedules, manage backup media, and ensure data retention.
• Work with third-party vendors to archive data at secure offsite locations.
• Conduct media testing to ensure data recovery capability and integrity.
• Test end-to-end system recovery, remediate flaws, and coordinate with vendors.
• Recover files/data as required, provide recovery updates, and manage data replication to DR sites.
Qualifications we seek in you!
Minimum Qualifications / Skills
• Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Advanced degrees or relevant professional training are a plus.
• Minimum 10 years of experience in data center operations, with at least 5 years in a leadership or senior technical role.
• Extensive experience in data center operations, with a proven track record of managing large-scale data center environments.
• Strong leadership and team management skills, with the ability to motivate and develop a high-performing operations team.
• In-depth knowledge of data center infrastructure, including servers, storage, networking, power, and cooling systems.
• Excellent problem-solving and analytical skills, with the ability to diagnose and resolve complex technical issues.
• Experience with incident and problem management, change management, and capacity planning.
• Strong understanding of compliance, security, and regulatory requirements related to data center operations.
• Effective communication and interpersonal skills, with the ability to interact with stakeholders at all levels.
• Experience in vendor management and contract negotiations.
• A proactive approach to continuous improvement and innovation in data center operations.
Preferred Qualifications / Skills
• Relevant certifications from Microsoft, VMware, Citrix, and storage vendors are highly desirable.
• Experience with ITIL or other IT service management frameworks.
• Familiarity with cloud computing and hybrid data center environments.
• Excellent communication and collaboration skills, with the ability to effectively interact with technical and non-technical stakeholders at all levels of the organization.
• Strong analytical and problem-solving skills, with the ability to identify root causes of issues and implement effective solutions in a timely manner.
• Proven ability to work independently as well as part of a team, with a proactive and self-motivated attitude towards achieving project goals.