Chapter 3 of Spark with Java is focusing on the dataframe. There is something majestic with Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of…
Advanced Spark Ingestion
Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we…
Ingestion of data from databases into Apache Spark
Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, as chapter 7 was focusing on ingestion from files, chapter 8 focus on…
File Ingestion in Apache Spark
In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a…
Apache Spark with Java
Apache Spark has been a game changer for distributed data processing, thanks to an easy to understand API, a focus on simplicity, and an adoption of modern infrastructure. However, rumors…
Apache Spark Maturity on the Rise
Spark Summit Europe 2017 just concluded, here, in Dublin. More than 102 speakers, 1200 attendees, and an impressive Databricks team attended the 3-day long celebration. Spark is reaching a new…
Spark is Making Big Data Easy at NCDevCon
NCDevCon is a yearly event in the Triangle, targeted for developers of all breeds, from front-end to back-end. Its origin starts in the ol’ days of Adobe ColdFusion, and thus…
Loading CSV in Spark
Loading CSV in Apache Spark is a standard feature since version 2.0, previously you required a free plugin (provided by Databricks). Although it starts with a basic value proposition: Comma…
A New Dimension for Apache Spark Clusters
Summer has been busy and it’s now behind us. I won’t annoy you with all the details of what happened but I wanted to come back on a project I…
A Deep-Dive Introduction to Spark for RDBMS Users
Earlier in the summer, I start a series of articles for IBM developerWorks. Those articles focus on Apache Spark from a RDBMS user perspective, of course, the database of choice…
