(Almost) All you need to know about file ingestion in Apache Spark

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book…

What is Apache Spark, the podcast

A couple of weeks ago, I chatted with Tobias Macey about data engineering and, more specifically, Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly…

Checklist for All Things Open (ATO)

The checklist is updated for ATO 2019! All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st 2018 in the Raleigh Convention Center,…

Lazy is good: understand why it’s good for you that Spark is lazy

This new chapter, chapter 4, of Spark with Java is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data…

Advanced Spark Ingestion

Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we…

File Ingestion in Apache Spark

In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a…

Spark v2.3

Let's be social

jgperrin.substack

/in/jgperrin

/jgperrin
