A new chapter of Spark in Action, 2e (formerly known as Spark with Java) is available. Chapter 11 is titled “Working with SQL”. In chapter 11, you will explore how…
Lazy is good: understand why it’s good for you that Spark is lazy
This new chapter, chapter 4, of Spark with Java is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data…
The majestic dataframe in Apache Spark
Chapter 3 of Spark with Java focuses on the dataframe. There is something majestic about Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of…
Advanced Spark Ingestion
Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we…
Ingestion of data from databases into Apache Spark
Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, whereas chapter 7 focused on ingestion from files, chapter 8 focuses on…
Apache Spark with Java
Apache Spark has been a game changer for distributed data processing, thanks to an easy-to-understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors…
