A new chapter of Spark in Action, 2e, (formerly known as Spark with Java) is available. Chapter 11 is titled “Working with SQL”. In chapter 11, you will explore how…

As you may know, I start writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpt of the book…

A couple of weeks ago, I chatted about Apache Spark with Tobias Macey on data engineering on more specifically Apache Spark. Tobias Macey runs the data engineering podcast, which you can directly…

The checklist is updated for ATO 2019! All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st 2018 in the Raleigh Convention Center,…

This new chapter, chapter 4, of Spark with Java ( is not only about celebrating laziness, it also teaches, through examples and experiments, the fundamental differences in building a data…

Chapter 3 of Spark with Java is focusing on the dataframe. There is something majestic with Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of…

Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about  “anything can become a Spark datasource.” When I was working for Zaloni, we…

Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, as chapter 7 was focusing on ingestion from files, chapter 8 focus on…

In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a…

Apache Spark has been a game changer for distributed data processing, thanks to an easy to understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors…