DataFriday: manipulating the schemas of Spark dataframes

In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]

(Almost) All you need to know about file ingestion in Apache Spark

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as a book develops, authors share a few excerpts from it […]

Lazy is good: understand why it’s good for you that Spark is lazy

This new chapter, chapter 4, of Spark with Java (https://www.manning.com/books/spark-with-java) is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data […]

The majestic dataframe in Apache Spark

Chapter 3 of Spark with Java focuses on the dataframe. There is something majestic about Apache Spark's dataframe, like those mountains of Montana. Apache Spark revolves around the concept of […]

Spark is Making Big Data Easy at NCDevCon

NCDevCon is a yearly event in the Triangle, targeted at developers of all breeds, from front-end to back-end. Its origins go back to the ol' days of Adobe ColdFusion, and thus […]

Spark Java Recipes

Here are a few quick recipes to solve some common issues with Apache Spark. All examples are based on Java 8 (although I do not consciously use any of the […]


Let's be social

@jgperrin

/jgperrin

/jgperrin
