(Almost) All you need to know about file ingestion in Apache Spark

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book […]

What is Apache Spark, the podcast

A couple of weeks ago, I chatted with Tobias Macey about data engineering and, more specifically, Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly […]

The majestic dataframe in Apache Spark

Chapter 3 of Spark with Java focuses on the dataframe. There is something majestic about Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of […]

File Ingestion in Apache Spark

In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a […]

Apache Spark with Java

Apache Spark has been a game changer for distributed data processing, thanks to an easy-to-understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors […]

Spark is Making Big Data Easy at NCDevCon

NCDevCon is a yearly event in the Triangle, targeted at developers of all breeds, from front-end to back-end. Its origins go back to the ol’ days of Adobe ColdFusion, and thus […]

Loading CSV in Spark

Loading CSV in Apache Spark has been a standard feature since version 2.0; previously, you needed a free plugin (provided by Databricks). Although it starts with a basic value proposition: Comma […]

A New Dimension for Apache Spark Clusters

Summer has been busy, and it’s now behind us. I won’t annoy you with all the details of what happened, but I wanted to come back to a project I […]

A Deep-Dive Introduction to Spark for RDBMS Users

Earlier in the summer, I started a series of articles for IBM developerWorks. Those articles focus on Apache Spark from an RDBMS user’s perspective; of course, the database of choice […]

Apache Spark 2.2 is Out

The Best Spark in Town

Yesterday, Apache Spark v2.2.0 was released. Excitement started a few months ago, reaching a “summit” during Spark Summit, where a lot of the features […]

Let's be social

@jgperrin

/jgperrin

/jgperrin