Position: Lead Data Engineer (AWS Cloud)
Location: Remote
Type: Contract to Hire
Job Description
• Design, develop, and maintain ETL/ELT pipelines using PySpark on Databricks.
• Build and optimize batch and streaming data pipelines.
• Implement Delta Lake solutions (Delta tables, time travel, ACID transactions).
• Collaborate with data scientists, analysts, and architects to deliver analytics-ready datasets.
• Optimize Spark jobs for performance, scalability, and cost.
• Integrate data from multiple sources (RDBMS, APIs, files, cloud storage).
• Implement data quality checks, validation, and monitoring.
• Manage Databricks notebooks, jobs, clusters, and workflows.
• Follow data governance, security, and compliance standards.
• Participate in code reviews and contribute to best practices.
Qualifications
• Hands-on experience with PySpark DataFrames, RDDs, joins, transformations, and actions.
• Proven experience leading teams and mentoring engineers.
• Experience with job optimization, cluster configuration, repartitioning, and shuffle mechanics in Databricks.
• Working knowledge of AWS services (S3, IAM, CloudWatch) and their integration with Databricks.
• Strong SQL skills for analytics and ETL.
• Performance tuning: partitioning, caching, broadcast joins, and skew handling.
• Experience with Delta Lake, the Medallion Architecture, Spark Streaming, Spark ML, and CI/CD pipelines.
• ETL/ELT design patterns.
• Handling large-scale structured and semi-structured data.
• Understanding of data warehousing concepts.
• Excellent communication and stakeholder management skills.
• Ability to work in Agile delivery environments.
• Ownership mindset and delivery-focused approach.
• Strong technical decision-making and problem-solving skills.
Apply Now