A stretch… Data is organized by schemas; data is stored on disk (or in memory), but nothing like a good old-school disk. In this fifth episode of…
(Almost) All you need to know about file ingestion in Apache Spark
As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book…
Lazy is good: understand why it’s good for you that Spark is lazy
This new chapter, chapter 4, of Spark with Java is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data…
The majestic dataframe in Apache Spark
Chapter 3 of Spark with Java focuses on the dataframe. There is something majestic about Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of…
Spark is Making Big Data Easy at NCDevCon
NCDevCon is a yearly event in the Triangle, aimed at developers of all breeds, from front-end to back-end. Its origins lie in the ol’ days of Adobe ColdFusion, and thus…
Spark Java Recipes
Here are a few quick recipes to solve some common issues with Apache Spark. All examples are based on Java 8 (although I do not consciously use any of the…
