Site Reliability Engineer III - Guadalajara, México - F5

Verified company

Guadalajara, México

3 weeks ago

Posted by:

Rodrigo Fernández

Talent recruiter for beBee

Description

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

Why do you want to join our team?

F5 has innovated a consistent, cloud-native environment that can be deployed across multiple public clouds and edge sites — a distributed cloud platform.

Within this SaaS-based offering, F5 integrates a broad range of services that have normally been siloed across many point products and network or cloud providers.

The solution is designed to provide a single way to view security, operations and management components.

Cloud-native technologies and distributed architectures introduce new challenges around speed, scale, and data complexity, challenges that traditional operating models simply weren't designed to handle.

Systems must be able to operate effectively and reliably through web-scale builds and deployments, frequent releases, and complex architectures that encompass technologies such as microservices, cluster management, containers, and cloud.

Every issue is an opportunity; a single discovery can be the key to resolving a problem impacting thousands.

Position Summary

Primary Responsibilities

Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement.
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless postmortems.

Knowledge, Skills and Abilities

Strong programming skills and experience using engineering and automation to solve operational issues and improve processes.
Strong troubleshooting and problem-solving skills, and the ability to manage production issues while working on improvements and automations.
Solid ability to work independently or as part of a team to deliver features on agreed-upon timelines.
Application build and deployment processes focusing on Infrastructure as Code (IaC).
Service operation (defining, instrumenting, measuring and managing service level objectives).
Incident management (service restoration, root cause analysis, postmortem authorship, etc.).

Qualifications:

Bachelor's Degree in computer science or related field and 5+ years of experience or equivalent combination of education and experience
Minimum 4 years in an enterprise-level site/system engineering or reliability engineering role.
Strong base knowledge of operating systems, networking basics, and security best practices.
Solid knowledge of both Agile delivery and DevOps principles.
Working knowledge of at least one Public Cloud provider (Azure, AWS, GCP, etc.), with a preference for Azure.
Proficiency in one or more programming or scripting languages (TypeScript, Python, Bash, PowerShell, etc.).
Strong understanding of DevOps tools like Jenkins or Azure DevOps, with Azure DevOps being preferred.
Experience with IaC management using tools like Terraform, Ansible, Bicep, etc.
Familiarity with IT governance methodologies (ITIL, ISMS, etc.)
Working knowledge of security best practices
Technical certifications a plus

Remote

Hybrid

Job may be performed on-site at a customer facility or data center, or in an office environment sitting at a desk or computer table.

Equal Employment Opportunity