Site Reliability Engineering Manager, GWCP
Brazil - Curitiba
Product Development and Operations/Full time/Hybrid
Site Reliability Engineering (SRE) brings together software and systems engineering to design and operate large-scale, highly distributed, and fault-tolerant systems. As an SRE Manager at Guidewire, you will lead a team responsible for the reliability, scalability, and operational excellence of the Guidewire Cloud Platform (GWCP) and InsuranceSuite products.
This role combines hands-on technical leadership with people management, ensuring that systems operate reliably at scale while building a high-performing SRE team. You will collaborate across engineering, product, and platform teams to drive reliability standards, automation, and continuous improvement.
To learn more about GWCP and its tenancy model, see:
https://medium.com/guidewire-engineering-blog/guidewire-cloud-why-hybrid-tenancy-is-the-right-choice-part-2-of-2-ba22c9888bb8
Job Description
What You'll Do
Technical Leadership & Execution
Provide technical direction and oversight for SRE initiatives, ensuring best practices in reliability, scalability, and performance.
Remain hands-on where needed, contributing to system design, automation, and incident resolution.
Guide the design and development of tools supporting 24x7 follow-the-sun operations.
Drive automation across infrastructure provisioning, deployments, and operational workflows.
Ensure effective observability strategies (metrics, logging, tracing) and promote self-healing systems.
Partner with engineering teams to influence system design for reliability and operability.
Reliability, Process Engineering & Continuous Improvement
Design, evolve, and simplify SRE processes (incident management, production readiness, capacity planning, change management) with a focus on effectiveness over overhead.
Apply process engineering principles—ensuring processes are lightweight, scalable, and enable teams rather than slow them down.
Prioritize people over process: use processes as guardrails, not rigid workflows, and empower engineers to make sound decisions.
Proactively identify gaps, inefficiencies, and risks—and drive them through to resolution with a bias for action.
Establish and enforce SLOs, SLIs, and error budgets across services.
Lead major incident response and ensure blameless postmortems result in real, implemented improvements, not just documentation.
Continuously reduce operational toil through automation and simplification.
Ensure follow-the-sun operations are practical, sustainable, and optimized for real-world execution.
People Leadership
Hire, onboard, and develop SRE engineers.
Lead your team by setting clear expectations, providing guidance, and removing obstacles to execution.
Foster a culture of ownership, accountability, and service orientation.
Support engineers in making decisions and taking action, rather than relying on rigid processes or escalation.
Encourage critical thinking and problem-solving over checklist-driven execution.
Balance workload across the team, ensuring sustainable on-call participation and operational responsibilities.
Set clear priorities and ensure the team is focused on high-impact work that improves reliability and customer outcomes.
Cross-Team Collaboration & Service Mentality
Act as a key stakeholder across SRE Platform, Product Development, and Cloud Engineering teams.
Demonstrate a strong service mentality—ensuring platform capabilities meet the needs of internal teams and customers.
Balance platform standards with pragmatism, enabling teams while maintaining reliability and guardrails.
Partner with teams to solve problems collaboratively, rather than acting as a gatekeeper.
Drive adoption of best practices through influence, not enforcement alone.
Operational Strategy & Execution
Define and track metrics that reflect real outcomes (reliability, customer impact, team efficiency), not just process adherence.
Ensure work is prioritized toward meaningful improvements in reliability, scalability, and developer experience.
Continuously evaluate whether processes, tools, and practices are delivering value—and adjust when they are not.
Avoid unnecessary process overhead; focus on enabling teams to move faster safely.
Advocate for and drive investments in platform improvements and reliability initiatives.
Documentation & Knowledge Sharing
Ensure high-quality documentation, runbooks, and operational guidance.
Promote knowledge sharing across teams and regions.
Enable teams to operate independently through clear documentation and tooling.
Who You Are
Technical Expertise
Strong programming skills in Python or Go; experience with Java/Spring Boot is a plus.
Deep experience with Kubernetes (EKS), including networking, ingress, and operator patterns.
Expertise in Terraform and infrastructure as code at scale.
Advanced knowledge of AWS services and distributed systems architecture.
Strong background in observability tools such as Prometheus, OpenTelemetry, or Datadog.
Experience supporting production systems at scale in a microservices environment.
Familiarity with CI/CD systems such as TeamCity, GitHub Actions, or Jenkins.
Understanding of SSO, SAML, OAuth; experience with Okta is a plus.
Leadership & Ownership
Proven experience managing engineers while remaining technically credible.
Demonstrated ability to build and evolve processes that serve people and outcomes, not bureaucracy.
Strong sense of ownership with a track record of driving issues through to resolution.
Demonstrated ability to identify problems, take initiative, and implement solutions without waiting for direction.
Ability to balance short-term operational needs with long-term improvements.
Comfortable making decisions and taking accountability in high-pressure situations.
Collaboration & Communication
Excellent communication skills with the ability to influence across teams.
Ability to translate complex technical concepts into clear, actionable insights.
Experience working in agile environments (Scrum, Kanban).
Mindset
Strong service-oriented mindset with a focus on enabling others to succeed.
Bias toward action and problem-solving over coordination and escalation.
Focus on outcomes, not process overhead.
Passion for reliability, automation, and continuous improvement.
Curiosity and willingness to explore emerging technologies, including AI, to improve productivity and outcomes.
Bonus Points
Kubernetes or AWS certifications.
Experience leading SRE or platform teams.
Contributions to open source projects.
Familiarity with tools like KubeVela (OAM) or Crossplane.
Experience implementing SLO/error budget frameworks at scale.
About Guidewire
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.
As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.
For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.
Guidewire Software, Inc. is proud to be an equal opportunity and affirmative action employer. We are committed to an inclusive workplace and believe that a diversity of perspectives, abilities, and cultures is key to our success. Qualified applicants will receive consideration without regard to race, color, ancestry, religion, sex, national origin, citizenship, marital status, age, sexual orientation, gender identity, gender expression, veteran status, or disability. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.

