Senior AI Infrastructure Operations Manager – Data Centers

Engineering / Technical Trades
Posted on: 2025/12/01 | Job ID: 178149
Toronto, Ontario
Permanent
The Senior Infrastructure Operations Manager – Data Centers oversees the design, deployment, and lifecycle management of GPU-based AI infrastructure across on-premises and cloud environments. This role ensures reliability, scalability, and efficiency for compute-intensive workloads while leading a high-performing technical team in a fast-paced, data center–driven environment.
Job Benefits:
  • Permanent position
  • Competitive salary between $160,000 and $190,000, plus bonuses
  • Comprehensive benefits package
  • Dynamic and collaborative work environment
  • Exposure to cutting-edge AI and data center technologies
  • International collaboration opportunities
Responsibilities of the Senior Infrastructure Operations Manager – Data Centers:
  • Lead, mentor, and grow a high-performing infrastructure and DevOps team.
  • Define objectives, measure performance, and foster a culture of accountability and innovation.
  • Oversee deployment, scaling, and lifecycle management of GPU-based AI clusters across data centers and hybrid cloud environments.
  • Ensure optimal performance, resilience, and cost efficiency for large-scale compute workloads.
  • Collaborate with Networking and System Architecture teams to ensure high-bandwidth, low-latency infrastructure design and execution.
  • Drive continuous integration and deployment (CI/CD) pipelines for infrastructure and AI workloads.
  • Ensure system availability, security, and performance through proactive monitoring and incident management.
  • Manage capacity planning and long-term infrastructure roadmaps to support growing AI demands.
  • Oversee budgets for hardware procurement, cloud services, and licensing, ensuring financial efficiency.
  • Collaborate with Security and Networking teams to enforce access controls, monitoring, and compliance standards.

Qualifications for the Senior Infrastructure Operations Manager – Data Centers:
  • Minimum 7 years of experience in infrastructure or IT operations, including 3+ years in leadership
  • Mandatory experience in data centers or mission-critical environments
  • Proven expertise managing high-performance computing or GPU/TPU infrastructures
  • Strong knowledge of Linux systems, distributed architectures, and automation frameworks
  • Proficiency with Terraform, Ansible, Kubernetes, Docker, and CI/CD pipelines
  • Experience with cloud and hybrid infrastructures (AWS, GCP, Azure)
  • Experience collaborating with Networking and Architecture teams to ensure scalability and reliability
  • Strong leadership, communication, and organizational abilities
  • Experience ensuring performance, compliance, and cost efficiency in complex environments

Interested in the Senior Infrastructure Operations Manager – Data Centers position?
Apply now to start the conversation!

