Important Interview Questions on Spark
=========================================
1. What is the difference between RDDs and DataFrames?
2. What are the challenges you face in Spark?
3. What is the difference between reduceByKey and groupByKey?
4. What is the difference between persist and cache?
5. What are the advantages of a Parquet file?
6. What is a broadcast join?
7. What is the difference between coalesce and repartition?
8. What are the roles and responsibilities of the driver in the Spark architecture?
9. What is meant by data skew? How do you deal with it?
10. What are the optimisation techniques used in Spark?
11. What is the difference between map and flatMap?
12. What are accumulators and broadcast variables?
13. What is an OOM (out-of-memory) issue, and how do you deal with it?
14. What are transformations in Spark? What types of transformations are there?
15. Name some actions in Spark that you have used.
16. What is the role of the Catalyst optimizer?
17. What is checkpointing?
18. Explain cache and persist.
19. What do you understand by lazy evaluation?
20. How do you convert an RDD to a DataFrame?
21. How do you convert a DataFrame to a Dataset?
22. What makes Spark better than Hadoop?
23. How can you read a CSV file without using an external schema?
24. What is the difference between a narrow transformation and a wide transformation?
25. What are the different parameters that can be passed to spark-submit?
26. What are global temp views and temp views?
27. How can you add two new columns to a DataFrame with calculated values?
28. Avro vs ORC: which one do you prefer?
29. What are the different types of joins in Spark?
30. Can you explain Anti join and Semi join?
31. What is the difference between Order By, Sort By, and Cluster By?
32. DataFrame vs Dataset in Spark?
33. What are the join strategies in Spark?
34. What happens in cluster deployment mode and in client deployment mode?
35. What parameters have you used in spark-submit?
36. How do you add a new column in Spark?
37. How do you drop a column in Spark?
38. What is the difference between map and flatMap?
39. What are skewed partitions?
40. What are the DAG and lineage in Spark?
41. What is the difference between an RDD and a DataFrame?
42. Where can we find the Spark application logs?
43. What is the difference between reduceByKey and groupByKey?
44. What is Spark optimization?
45. What are shared variables in Spark?
46. What is a broadcast variable?
47. Why use Spark instead of Hive?
48. What is cache?
49. What are the steps to read a file in Spark?
50. How do you handle a 10 GB file in Spark, and how do you optimize it?
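
Several of the questions above (3, 11, 38 and 43) turn on what map, flatMap, groupByKey and reduceByKey return. The sketch below is a plain-Python analogy only, not the Spark API: the helper names (map_like, flat_map_like, group_by_key_like, reduce_by_key_like) are hypothetical and run on ordinary lists, so they show the shape of the results but none of Spark's partitioning or shuffling.

```python
from collections import defaultdict
from itertools import chain


def map_like(func, items):
    # One output element per input element (the result may be nested).
    return [func(x) for x in items]


def flat_map_like(func, items):
    # Each input yields zero or more outputs, flattened into one list.
    return list(chain.from_iterable(func(x) for x in items))


def group_by_key_like(pairs):
    # Collect every value for a key into a list before doing anything else.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key_like(func, pairs):
    # Fold values into a running result per key; no per-key list is kept.
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


lines = ["a b", "c"]
words_nested = map_like(str.split, lines)
words_flat = flat_map_like(str.split, lines)

pairs = [("a", 1), ("b", 2), ("a", 3)]
grouped = group_by_key_like(pairs)
summed = reduce_by_key_like(lambda x, y: x + y, pairs)
```

In Spark itself, reduceByKey combines values within each partition before the shuffle, while groupByKey moves every value across the network first, which is why the former is usually preferred for aggregations.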
Data science interview questions: JPMorgan Chase
Tower Research Capital is hiring ML engineering interns.

Interested students can apply

https://www.tower-research.com/open-positions/?gh_jid=5798772
DataSpoof
Top 100 product-based companies in India .pdf
Top 100 product-based companies in India
Meritshot is hiring Data Science and Data Analysis Interns.

- For Both Students & Graduates
- No Experience Required
- Work From Home

To apply, send your CV to: [email protected]
DAX functions in Power BI
DAX.pdf
552.7 KB
2025/07/07 18:03:02