Loading CSV in Apache Spark has been a standard feature since version 2.0; previously, you needed a free plugin (provided by Databricks). Although it starts with a basic value proposition: Comma […]
A New Dimension for Apache Spark Clusters
Summer has been busy and it’s now behind us. I won’t bore you with all the details of what happened, but I wanted to come back to a project I […]
Apache Spark 2.2 is Out
The Best Spark in Town: Yesterday, Apache Spark v2.2.0 was released. Excitement started a few months ago, reaching a “summit” during Spark Summit where a lot of the features […]
Meet Cactar, the Ancient Mongolian Warlord of Data Quality
A Little History: On August 18, 1227, the well-known Mongolian emperor Genghis Khan passed away. Despite numerous criticisms, based on rumors of genocide and brutality, he united Mongolia. One of his […]
Spark Boosts IBM Event Store
IBM just announced Event Store, a hybrid datastore for events. What makes it original? Events can be streamed in, and it is based on Apache Spark. IBM claims to be able […]
The Key to Machine Learning is Prepping the Right Data
Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase that comes before you can apply machine learning to data, using […]
Recent Publications
A quick look back at a few articles I published recently. You Are Not a Machine, So Learn Machine Learning, published by Database Trends and Applications on February 21st, 2017. What Are Spark […]
What are Spark Checkpoints on Dataframes?
Let’s understand what checkpoints can do for your Spark dataframes and go through a Java example of how we can use them. Checkpoint on Dataframe: In v2.1.0, Apache Spark introduced […]
Apache Spark Event in the Triangle
A quick post to share the next Spark event that we will run in the NC Triangle (RTP – Chapel Hill, Durham, Raleigh). This event will be held on December […]
The WoW I Went To
Right before Halloween, from October 24th to October 27th, I went to WoW. Of course, when I told my kids, they assumed I was going to play World […]