Explain how you would design an ETL pipeline to move and transform data from multiple sources.
Data Engineer Interview Questions
21,096 data engineer interview questions shared by candidates
What is the difference between shallow and deep copy in Python?
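A short illustration using the standard-library `copy` module: a shallow copy creates a new outer container but shares the nested objects, while a deep copy recursively copies them. The variable names are only for illustration.

```python
import copy

# A list containing a nested (mutable) list.
original = [1, [2, 3]]

shallow = copy.copy(original)   # new outer list, same inner list object
deep = copy.deepcopy(original)  # new outer list and a new inner list

original[1].append(4)  # mutate the nested list through the original

print(shallow)                    # [1, [2, 3, 4]]  shares the inner list
print(deep)                       # [1, [2, 3]]     fully independent
print(shallow[1] is original[1])  # True
print(deep[1] is original[1])     # False
```

The difference only shows up with nested mutable objects; for a flat list of integers, both copies behave the same.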
What are the general-purpose Azure Storage types?
What is your knowledge of cloud computing, and why do you want to work in this area?
I was asked the difference between a DBMS and an RDBMS, and also when denormalised forms can be better than normalised forms.
Suppose I have records like this: ("a-b", "data1", 1) ("a-c", "data2", 1) ("a-b", "data3", 1). How can I group them by the first field, collecting the second field into a list and summing the third, so that I get the following results when the input is a DataStream? ("a-b", ["data1", "data3"], 2) ("a-c", ["data2"], 1)
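A minimal sketch of the aggregation logic in plain Python, not the DataStream API the question mentions. The idea is to lift each record into a `(key, [value], count)` shape and merge records with the same key using an associative `combine` function; in a streaming engine, the same kind of function would typically be applied per key after keying the stream by the first field, with partial results emitted as records arrive. The helper names (`to_aggregate`, `combine`, `group_and_sum`) are hypothetical.

```python
from functools import reduce
from itertools import groupby
from typing import List, Tuple

Record = Tuple[str, str, int]
Aggregate = Tuple[str, List[str], int]

def to_aggregate(record: Record) -> Aggregate:
    """Lift a raw record into the (key, [values], count) shape."""
    key, value, count = record
    return (key, [value], count)

def combine(a: Aggregate, b: Aggregate) -> Aggregate:
    """Merge two aggregates that share a key: concatenate values, add counts."""
    return (a[0], a[1] + b[1], a[2] + b[2])

def group_and_sum(records: List[Record]) -> List[Aggregate]:
    # Sort by key (stable, so input order is kept within a key)
    # so that groupby sees each key as one contiguous run.
    ordered = sorted(records, key=lambda r: r[0])
    return [
        reduce(combine, (to_aggregate(r) for r in group))
        for _, group in groupby(ordered, key=lambda r: r[0])
    ]

records = [("a-b", "data1", 1), ("a-c", "data2", 1), ("a-b", "data3", 1)]
print(group_and_sum(records))
# [('a-b', ['data1', 'data3'], 2), ('a-c', ['data2'], 1)]
```

Because `combine` is associative, it can be applied incrementally to an unbounded stream, which is what makes this shape suitable for a keyed streaming reduction.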
First round:
Q1 Delete duplicates from a table.
Q2 Find duplicate rows in a table.
Q3 Repartition vs. coalesce.
Q4 How are stages created in Spark?
Q5 Word count program with Spark.
Q6 Fibonacci sequence using Python.
Q7 What are generators?
Q8 Spark architecture.
Q9 Questions related to the project.
Techno-managerial round:
Q1 Discussion about the project and experience.
Q2 Query to create a table and partition it in Hive.
Q3 Directory structure for a partitioned table.
Q4 If you add a new directory with the correct schema to HDFS, will the data be shown in HQL?
Q5 Find the max temperature for a given date range.
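The Fibonacci and generator questions from the first round fit together naturally. A minimal sketch: a generator function uses `yield` to produce values lazily, pausing between values, so it can describe an infinite sequence without storing it. The function name `fibonacci` is illustrative.

```python
from itertools import islice
from typing import Iterator

def fibonacci() -> Iterator[int]:
    """Yield Fibonacci numbers lazily: 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    while True:
        yield a          # pause here and hand a value to the caller
        a, b = b, a + b  # resume from this point on the next request

# The generator is infinite, so take only the first 10 values.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Memory use stays constant regardless of how many values are consumed, since only `a` and `b` are kept between steps.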
What were the complexities in your previous project?
Find the middle element of a linked list in one iteration.
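One common approach is the slow/fast pointer technique: advance one pointer a single node per step and another two nodes per step; when the fast pointer reaches the end, the slow pointer is at the middle. The `Node` class and `build` helper below are hypothetical scaffolding for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    value: int
    next: Optional["Node"] = None

def build(values: List[int]) -> Optional[Node]:
    """Build a singly linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def middle(head: Optional[Node]) -> Optional[Node]:
    """Return the middle node in one pass using slow/fast pointers.

    For an even number of nodes this returns the second of the two middles.
    """
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next       # advance one step
        fast = fast.next.next  # advance two steps
    return slow

print(middle(build([1, 2, 3, 4, 5])).value)     # 3
print(middle(build([1, 2, 3, 4, 5, 6])).value)  # 4
```

Changing the loop condition to `fast.next is not None and fast.next.next is not None` would return the first of the two middles for even-length lists instead.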
Can you walk us through the process of designing an end-to-end data pipeline, including data ingestion, transformation, and loading (ETL), and how you would ensure its scalability and reliability?
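A minimal single-process sketch of the extract, transform and load stages, assuming two hypothetical inline sources (a CSV export and a JSON feed) and an in-memory SQLite target; the `customers` table and its columns are illustrative. It shows two reliability habits that carry over to larger systems: a transactional load and an idempotent upsert keyed on a primary key, so a failed or repeated batch can be re-run safely. Scalability concerns such as partitioning, a distributed engine like Spark, and orchestration are not shown.

```python
import csv
import io
import json
import sqlite3
from typing import Dict, Iterable, List, Tuple

# Hypothetical sources, inlined for the sketch.
CSV_SOURCE = "id,name,amount\n1,alice,10.5\n2,bob,7\n"
JSON_SOURCE = '[{"id": 2, "name": "Bob", "amount": 7}, {"id": 3, "name": "carol", "amount": "3.25"}]'

def extract() -> Iterable[Dict]:
    """Extract: read raw rows from each source into plain dicts."""
    yield from csv.DictReader(io.StringIO(CSV_SOURCE))
    yield from json.loads(JSON_SOURCE)

def transform(rows: Iterable[Dict]) -> List[Tuple[int, str, float]]:
    """Transform: coerce types, normalise names, and drop duplicate ids."""
    seen: Dict[int, Tuple[int, str, float]] = {}
    for row in rows:
        record = (int(row["id"]), str(row["name"]).strip().lower(), float(row["amount"]))
        seen[record[0]] = record  # last write wins for a repeated id
    return list(seen.values())

def load(conn: sqlite3.Connection, records: List[Tuple[int, str, float]]) -> None:
    """Load: idempotent upsert, so re-running a batch is safe."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, name, amount) VALUES (?, ?, ?)",
            records,
        )

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
load(conn, transform(extract()))  # second run leaves the table unchanged
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'alice', 10.5), (2, 'bob', 7.0), (3, 'carol', 3.25)]
```

Keeping each stage a separate function makes it easier to swap a stage out later, for example replacing the inline sources with real connectors or the SQLite target with a warehouse, without touching the others.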