Data Engineer Mid - Zapopan, México - Derevo
Descripción
We are looking for your talent: Data Engineer Mid
The desired profile has at least 3 years of hands-on experience in designing, establishing, and maintaining data management and storage systems. Skilled in collecting, processing, cleaning, and deploying large datasets, understanding ER data models, and integrating with multiple data sources. Efficient in analyzing, communicating, and proposing different ways of building Data Warehouses, Data Lakes, end-to-end pipelines, and Big Data solutions for clients, using either batch or streaming strategies.
It is very important that you have the following skills and experience:
English B2 or higher
Technical Proficiencies:
- SQL:
- Python:
- Databricks / Pyspark:
Understanding of narrow and wide transformations, actions, and lazy evaluations
How DataFrames are transformed, executed, and optimized in Spark
Use DataFrame API to explore, preprocess, join, and ingest data in Spark
Use Delta Lake to improve the quality and performance of data pipelines
Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse
Understand the most common performance problems associated with data ingestion and how to mitigate them
Monitor the Spark UI: Jobs, Stages, Tasks, Storage, Environment, Executors, and Execution Plans
Configure a Spark cluster for maximum performance given specific job requirements
Configure Databricks to access Blob, ADL, SAS, user tokens, Secret Scopes and Azure Key Vault
Configure governance solutions through Unity Catalog and Delta Sharing
Use Delta Live Tables to manage an end-to-end pipeline with unit and integration tests
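The lazy-evaluation point above is central to Spark: transformations only build an execution plan, and nothing runs until an action forces it. As a rough conceptual analogy in plain Python (not Spark code; Python generators behave similarly, which is the only claim made here):

```python
# Conceptual analogy for Spark's lazy evaluation, using Python generators.
# Like Spark transformations, generators do no work until an "action"
# (here, list()) consumes them.

def narrow_transformations(rows):
    # map/filter-style steps are "narrow": each output row depends on
    # exactly one input row, so no shuffle is needed.
    doubled = (r * 2 for r in rows)            # lazy: nothing runs yet
    positives = (r for r in doubled if r > 0)  # still lazy
    return positives

plan = narrow_transformations([-3, 1, 4, -1, 5])  # only builds the "plan"
result = list(plan)  # the "action": execution happens here
print(result)  # [2, 8, 10]
```

In real PySpark, the same shape appears as `df.withColumn(...)`, `df.filter(...)` (lazy transformations) followed by `df.count()` or `df.collect()` (actions); wide transformations such as `groupBy` additionally trigger a shuffle.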
- Azure:
Azure Storage Account:
Provision Azure Blob Storage or Azure Data Lake instances
Build efficient file systems for storing data into folders with static or parametrized names, considering possible security rules and risks
Experience identifying use cases for open-source file formats like parquet, AVRO, ORC
Understanding optimized column-oriented file formats vs optimized row-oriented file formats
Implementing security configurations through Access Keys, SAS, AAD, RBAC, ACLs
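A common way to meet the "static or parametrized folder names" requirement is date-partitioned paths. A minimal sketch, assuming a hypothetical `container/dataset/year=/month=/day=` layout (the names are illustrative, not a Derevo convention):

```python
from datetime import date

# Minimal sketch: build a date-partitioned data-lake folder path.
# Layout is hypothetical: container/dataset/year=YYYY/month=MM/day=DD.
def lake_path(container: str, dataset: str, d: date) -> str:
    # Static root plus parametrized partitions; avoid interpolating
    # unvalidated user input into path segments (path-injection risk).
    return (f"{container}/{dataset}/"
            f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}")

print(lake_path("raw", "sales", date(2024, 1, 5)))
# raw/sales/year=2024/month=01/day=05
```

Zero-padded month/day keeps lexicographic and chronological order aligned, which makes prefix-based listing and pruning in Blob Storage or ADLS straightforward.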
Azure Data Factory:
Provision Azure Data Factory instances
Use Azure IR, Self-Hosted IR, Azure-SSIS to establish connections to distinct data sources
Use Copy or PolyBase activities to load data
Build efficient and optimized ADF Pipelines using linked services, datasets, parameters, triggers, data movement activities, data transformation activities, control flow activities and mapping data flows
Build Incremental and Re-Processing Loads
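Incremental loads are typically driven by a watermark: each run pulls only rows modified since the last stored watermark, then advances it. A minimal in-memory sketch of the idea (table, column, and values are hypothetical; in ADF this is usually a Lookup of the old watermark plus a filtered Copy activity):

```python
# Minimal sketch of a watermark-based incremental load, using an
# in-memory list as a stand-in for the source table (schema hypothetical).
source = [
    {"id": 1, "modified": "2024-01-01"},
    {"id": 2, "modified": "2024-01-03"},
    {"id": 3, "modified": "2024-01-05"},
]

def incremental_load(rows, watermark):
    # Select only rows changed after the last run's watermark,
    # then advance the watermark to the newest value seen.
    new_rows = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

batch, wm = incremental_load(source, "2024-01-02")
print([r["id"] for r in batch], wm)  # [2, 3] 2024-01-05
```

A re-processing load is the degenerate case: reset the watermark to the epoch (or truncate the sink) and run the same pipeline.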
What benefits will you have?
WELLNESS:
We promote your overall wellbeing through a balance of personal, professional, and economic life. Our statutory and additional benefits will help you achieve it.
LET'S RELEASE YOUR POWER:
You will have the opportunity to specialize across different areas and technologies, achieving interdisciplinary development. We will push you to take on new challenges and surpass yourself.
WE CREATE NEW THINGS:
We like to think outside the box. You will have the space, confidence and freedom to create and the training required to achieve it.
WE GROW TOGETHER:
You will participate in cutting-edge, multinational technology projects with foreign teams.
Where will you do it?
We are a great team working remotely. We are flexible yet structured, and we provide the equipment you need to work, along with internal communication tools that streamline our operations and those of our clients.
Become derevian & develop your superpower