This job posting is no longer accepting applications

Principal Site Reliability Engineer - Mexico City, México - Oracle

Oracle Mexico City, México

1 week ago

Regular Employee

Description

Responsibilities

Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure

Act as the escalation point for critical partner issues that may lack a detailed procedure, and provide Root Cause Analysis (RCA)

Understand the end-to-end configuration, technical dependencies, and characteristics of production infrastructure and services

Quickly grasp and analyze new technologies that are sophisticated and constantly evolving and integrate those into automation and infrastructure support

Design and deliver mission-critical automation, with a focus on security, resiliency, scale, and performance.

Identify opportunities for automation and drive their implementation to improve service health, availability, and reliability

Author functional and technical documentation and standard operating procedures (SOPs)

Collaborate with development teams in defining and implementing improvements in service architecture.

Articulate technical characteristics of services and technology areas and guide multi-functional teams to engineer and add capabilities to internal tools.

Partner with DevOps teams, Oracle Cloud Infrastructure deployment, and development teams to identify and resolve issues.

Knowledge and Skills

Proven experience in Site Reliability Engineering and automation.

Experience in Linux administration with good knowledge of kernel-level debugging

Experience in debugging operating system performance issues and performance tuning

Experience working with fault-tolerant, highly available, high-efficiency, distributed and scalable systems

Expertise in developing scripts, utilities, and tools to automate routine or manually intensive tasks

Experience in application, compute, storage, and database troubleshooting to improve application reliability, scalability, and availability

Experience in cloud infrastructure technologies

Experience in operations and problem management

Development experience using Python and building infrastructure using Terraform

Experience in handling high-availability production applications

Experience working with global teams across different time zones.

Demonstrates strong logical-thinking skills, intellectual curiosity, and a strong drive for self-development.

Ability to be a good teammate and the desire to learn and implement new Cloud technologies as needed

Good understanding of Agile software development principles, including the use of common tools such as JIRA

Good understanding of cloud security and compliance management, including patching

Excellent interpersonal, verbal, and written communication skills

Qualifications required

Proven experience working in an IT Operations/Infrastructure team

Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, or a related area is helpful
