Principal Site Reliability Engineer - Mexico City, México - Oracle

    Oracle
    Regular Employee
    Description

    Responsibilities

  • Solve complex problems related to Linux infrastructure and Oracle Cloud Infrastructure
  • Act as the escalation point for partners on critical issues that may lack a detailed procedure, and provide Root Cause Analysis (RCA)
  • Understand the end-to-end configuration, technical dependencies, and characteristics of production infrastructure and services
  • Quickly grasp and analyze new technologies that are sophisticated and constantly evolving, and integrate them into automation and infrastructure support
  • Design and deliver mission-critical automation, with a focus on security, resiliency, scale, and performance
  • Identify opportunities for automation and drive its implementation to improve service health, availability, and reliability
  • Author functional and technical documentation and standard operating procedures (SOPs)
  • Collaborate with development teams in defining and implementing improvements in service architecture.
  • Articulate technical characteristics of services and technology areas and guide multi-functional teams to engineer and add capabilities to internal tools.
  • Partner with DevOps teams, Oracle Cloud Infrastructure deployment, and development teams to identify and resolve issues.

    Knowledge Skills

  • Proven experience in Site Reliability Engineering and automation.
  • Experience in Linux administration with good knowledge of kernel-level debugging
  • Experience in debugging operating system performance issues and performance tuning
  • Experience working with fault-tolerant, highly available, high-efficiency, distributed and scalable systems
  • Expertise in developing scripts, utilities, and tools to automate routine or manually intensive tasks
  • Experience in application, compute, storage, and database troubleshooting to improve application reliability, scalability, and availability
  • Experience in cloud infrastructure technologies
  • Experience in operations and problem management
  • Development experience using Python and building infrastructure with Terraform
  • Experience in handling high-availability production applications
  • Experience working with global teams across different time zones.
  • Strong logical-thinking skills, intellectual curiosity, and a high drive for self-development
  • Ability to be a good teammate and a desire to learn and implement new cloud technologies as needed
  • Good understanding of Agile software development principles, including the use of common tools such as JIRA
  • Good understanding of cloud security and compliance management, including patching
  • Excellent interpersonal, verbal, and written communication skills

    Qualifications required

  • Proven experience working in an IT Operations/Infrastructure team
  • Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, or a related area is helpful