Spark optimizations: what optimizations can be made to the code snippet below?

shoppers_df (customer description DF): 250 MB, 15M records. Schema:
    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("retailer_id", StringType, nullable = True),
        StructField("shopper_group_id", StringType, nullable = True),
        StructField("join_date", DateType, nullable = True),
        StructField("shopper_type", StringType, nullable = True),
        StructField("gender", StringType, nullable = True)))

sku_df (dimension DF): 15 MB, 90K records.

purchase_df (transactions DF): 50 GB of compressed Parquet files, 5,000,000,000 records. Schema:
    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("product_id", LongType, nullable = True),
        StructField("pos_id", IntegerType, nullable = True),
        StructField("purchase_date", DateType, nullable = True),
        StructField("units", DoubleType, nullable = True),
        StructField("total_spent", DoubleType, nullable = True)))

Current code:
    products_purchased_df = purchase_df.alias("purchase").join(shoppers_df, on="shopper_id", how="left_outer").join(sku_df.alias("sku"), on="product_id").select(col("purchase.*"), col("sku.*"))

Usage:
    status_df = products_purchased_df.groupBy(["shopper_id", "product_id"]).agg(...)

Task: optimize the join statement.
Data Engineer Interview Questions
21,066 data engineer interview questions shared by candidates
We will give you a take-home project; you will have to research it and come up with an architecture around it.
How might the things you learned in university be useful for Tessella?
Two rounds: an online technical test in multiple-choice format (skip questions that are not relevant), then technical questions on current problems the company faced and how you would solve them.
Talk about a project that involved databases.
What are your career goals for the next 5 years?
Job experience: what model did I use? What are the pros and cons of the model? What can you do to further improve the performance?
How does a lithium ion cell work?
How do you work in a team?
Have you worked with AWS cloud tools?