
M2 - Infra Sr Lead - SRE

Remote, USA Full-time Posted 2025-11-03

Objective of the Role

 

Responsible for driving the strategic direction, operational excellence, and continuous evolution of site reliability engineering practices across critical systems and services. This role leads a team of SRE engineers and complex initiatives, ensuring high availability, scalability, and performance. The Senior Lead of SRE fosters cross-functional collaboration, anticipates future infrastructure needs, and aligns SRE practices with business and product priorities, while cultivating a culture of ownership, automation, and resilience and driving operational excellence with engineering teams.

 

Main Responsibilities

 

  • Build, lead, and inspire high-performing SRE teams, fostering a culture of operational ownership, engineering excellence, and continuous learning.

     

  • Define and execute the strategic roadmap for SRE, integrating best practices in reliability, incident management, observability, and infrastructure automation in alignment with business and product goals.

     

  • Elevate observability across the stack by designing and enforcing standards for telemetry, structured logging, distributed tracing, and service-level dashboards. Partner with the engineering teams to ensure 100% coverage of business-critical systems with actionable metrics and alerting.

     

  • Act as the technical escalation point for the most complex production issues, leading hands-on incident response and deep root cause analysis in large-scale, low-latency, event-driven architectures.

     

  • Champion automation-first infrastructure practices, enforcing IaC, immutable deployments, and auto-remediation patterns that reduce manual intervention and accelerate delivery.

     

  • Drive architectural and operational improvements through close partnership with Product Engineering, Platform, Security, and Architecture teams. Proactively identify and mitigate systemic reliability risks and performance bottlenecks.

     

  • Lead the definition, adoption, and review of SLIs, SLOs, and Error Budgets, ensuring they are embedded into engineering and product decision-making processes.

     

  • Operationalize change management, chaos engineering, and DR strategies, validating readiness through frequent simulations and failover exercises.

     

  • Mentor and develop SRE leads and senior engineers, scaling internal capabilities and reinforcing technical depth across the organization.

     

  • Represent SRE in architecture boards and business reviews, aligning engineering reliability strategies with company-wide objectives.

     

  • Promote a culture of autonomy and proactive engineering, encouraging teams to own their services end-to-end with accountability and resilience thinking.

     

  • Serve as a cultural leader within Spin, fostering psychological safety, ownership, and a sense of mission to serve millions of people across LATAM with secure, reliable financial technology.

     

 

Required Knowledge and Experience

 

  • Bachelor’s degree in Computer Science, Software Engineering, or related field (or equivalent experience).

     

  • 10+ years of experience in SRE, DevOps, or Software Engineering roles, including at least 4 years in leadership roles.

     

  • Strong experience leading distributed SRE or platform teams in complex, production-scale environments.

     

  • Deep understanding of reliability engineering principles, cloud-native infrastructure on AWS, observability, and incident response.

     

  • Hands-on experience with infrastructure as code, CI/CD pipelines, containers, and orchestration tools.

     

  • Strong architectural and performance optimization skills across cloud and hybrid infrastructure.

     

  • Demonstrated ability to influence and collaborate across engineering, product, and business teams.

     

  • Familiarity with regulatory and security frameworks relevant to infrastructure reliability.

     

  • Excellent communication and leadership skills, with experience presenting to senior stakeholders.

     

  • Strategic thinking, systems-level problem solving, and a proactive approach to continuous improvement.

     

Spin is committed to a diverse and inclusive workplace. We are an equal opportunity employer and do not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, disability, age, or any other legally protected status. If you would like to request an accommodation, please notify your Recruiter.
