The Data Engineering Intern supports the Global Data Analytics organization by helping design, build, and maintain scalable data pipelines and cloud-based data solutions. This role contributes to data ingestion, transformation, validation, and documentation activities across on‑prem SQL Server and Azure platforms. The intern will work with senior data engineers to develop production-ready data workflows, strengthen data quality, and enable analytics and reporting use cases across the enterprise.
What will be my duties and responsibilities in this job?
- Assist in the development of data pipelines and ETL workflows using SQL, Azure Data Factory, and Azure Databricks. (30%)
- Support data ingestion from on‑prem SQL Server and Azure Blob Storage into enterprise data environments. (20%)
- Perform data profiling, validation, and quality checks to ensure accurate and reliable data outputs. (15%)
- Participate in unit testing, code review, and documentation creation for developed pipelines and data models. (15%)
- Collaborate with data engineers and business partners to clarify requirements and support analytics use cases. (10%)
- Assist with cloud environment enhancements, workflow scheduling, and monitoring activities. (10%)
What are the requirements needed for this position?
- Actively pursuing an undergraduate or graduate degree in Computer Science, Data Engineering, Information Systems, or a related field.
- Foundational understanding of relational databases and SQL.
- Basic experience with Python for scripting or data processing.
- Familiarity with cloud concepts; exposure to Microsoft Azure preferred.
- Understanding of core data engineering concepts such as ETL/ELT, data ingestion, and data modeling.
- Strong analytical, problem‑solving, and communication skills.
- Ability to work in an Agile environment and use tools such as Azure DevOps.
What are the preferred requirements needed for this position?
- Exposure to Azure Data Factory, Azure Blob Storage, and Azure Databricks.
- Experience with PySpark or distributed data processing frameworks.
- Familiarity with Delta Lake, data lakes, or big data platforms.
- Understanding of CI/CD concepts and version control (Git).
- Knowledge of data validation, data profiling, or data quality processes.
- Awareness of cloud identity/access concepts (IAM) and resource management.
- Interest in data engineering best practices, pipeline orchestration, and scalable cloud architecture.