    Associate AIOps Engineer - San Pedro Garza García, México - SAP

    Regular, full-time
    Description

    Bring out your best

    SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.

    Global Cloud Infrastructure & Delivery (GCID) develops and delivers services for cloud infrastructure and cloud operations to SAP Lines of Business (LoB) and through them, our external customers. We support LoBs and their customers' cloud adoption journey through four hyperscaler public clouds and SAP's Infrastructure-as-a-Service.

    Service Reliability Engineering (SRE) is a team within the GCID organization. It helps ensure the reliability and availability of SAP cloud services (internal or external) by developing and enhancing observability tools that help prevent or isolate incidents. SREs proactively help automate and optimize processes. The SRE team operates globally in a follow-the-sun model.

    We are looking for an Associate AIOps Engineer (SRE) focusing on both soft and physical layers of our global operations.

    About the Role:
    You will join a global, multidisciplinary SRE team of DevOps engineers, contributing to the development of AI solutions that power a stack of diverse observability services using Machine Learning and Large Language Models. This role involves reshaping how we manage alerts, metrics, and logs by introducing deep learning and NLP to enhance reliability services. You will also support troubleshooting during major incidents related to our global cloud infrastructure, ensuring excellence in triage and resolution. Using these methods, you will help the team improve critical KPIs such as MTTD/MTTR, signal-to-noise ratio, and other relevant metrics.

    Expectations and Tasks:

  • Collaborate with engineering and product management following Agile methodologies such as Scrum.
  • Prioritize and deliver high-quality work under time constraints.
  • Ensure smooth operations and maximize uptime of the services we are responsible for.
  • Participate in on-call rotations, including weekends and holidays, with compensation per local policies. Coverage follows a global follow-the-sun model with local daytime shifts.
  • Share knowledge across the team.
  • Work on data analysis & generation.
  • Support AI research & development projects.
  • Train and fine-tune AI Models.

    Required Skills:

  • Fast adoption of cutting-edge technologies.
  • Advanced analytical and problem-solving mindset.
  • Strong team player with excellent communication skills.
  • Self-starter who acts with a sense of urgency to move issues forward efficiently and effectively.
  • Fluent in spoken & written English.

    Required Experience:

  • Development: 1+ years of professional development experience or equivalent education, with demonstrable knowledge. Proficiency in Python and JavaScript. Knowledge of REST APIs using Flask, FastAPI, or equivalent.
  • DevOps: Basic understanding of CI/CD pipelines. Hands-on practice with Docker containers and Kubernetes. Experience working with public cloud environments such as GCP, AWS, or Azure. Familiarity with JSON, YAML, and GitHub. Knowledge of monitoring and performance management tools.
  • Artificial Intelligence: Knowledge of at least one ML framework such as PyTorch, TensorFlow, or similar. Basic understanding of Large Language Models and GenAI. Basic understanding of supervised and unsupervised machine learning models. Good understanding of data structures and data patterns.
  • Education: Bachelor's or equivalent education in Software Engineering, Computer Science, or a related field.

    Preferred Experience:

  • Elasticsearch, Splunk, or similar.
  • Some experience in web development frameworks.
  • Terraform, Helm charts, Ansible, or similar tools.
  • Kubeflow, MLflow, Dataflow, or similar technologies.
  • Understanding of Enterprise/Service Provider Data Center Architecture.