As you may know, I start writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpt of the book […]
What is Apache Spark, the podcast
A couple of weeks ago, I chatted about Apache Spark with Tobias Macey on data engineering on more specifically Apache Spark. Tobias Macey runs the data engineering podcast, which you can directly […]
The majestic dataframe in Apache Spark
Chapter 3 of Spark with Java is focusing on the dataframe. There is something majestic with Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of […]
File Ingestion in Apache Spark
In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a […]
Apache Spark with Java
Apache Spark has been a game changer for distributed data processing, thanks to an easy to understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors […]
Spark is Making Big Data Easy at NCDevCon
NCDevCon is a yearly event in the Triangle, targeted for developers of all breeds, from front-end to back-end. Its origin starts in the ol’ days of Adobe ColdFusion, and thus […]
Loading CSV in Spark
Loading CSV in Apache Spark is a standard feature since version 2.0, previously you required a free plugin (provided by Databricks). Although it starts with a basic value proposition: Comma […]
A New Dimension for Apache Spark Clusters
Summer has been busy and it’s now behind us. I won’t annoy you with all the details of what happened but I wanted to come back on a project I […]
A Deep-Dive Introduction to Spark for RDBMS Users
Earlier in the summer, I start a series of articles for IBM developerWorks. Those articles focus on Apache Spark from a RDBMS user perspective, of course, the database of choice […]
Apache Spark 2.2 is Out
The Best Spark in Town Yesterday, Apache Spark v2.2.0 has been released. Excitement started a few months ago, reaching a “summit” during Spark Summit where a lot of the features […]