Loading CSV in Apache Spark is a standard feature since version 2.0, previously you required a free plugin (provided by Databricks). Although it starts with a basic value proposition: Comma […]
A New Dimension for Apache Spark Clusters
Summer has been busy and it’s now behind us. I won’t annoy you with all the details of what happened but I wanted to come back on a project I […]
A Deep-Dive Introduction to Spark for RDBMS Users
Earlier in the summer, I start a series of articles for IBM developerWorks. Those articles focus on Apache Spark from a RDBMS user perspective, of course, the database of choice […]
Getting Ready for This Pint of Guinness
Next month, I’ll be heading to Dublin, the capital of Ireland. I have been to Ireland quite a few times – I was 3 the first time. However this time, […]
Apache Spark 2.2 is Out
The Best Spark in Town Yesterday, Apache Spark v2.2.0 has been released. Excitement started a few months ago, reaching a “summit” during Spark Summit where a lot of the features […]
Meet Cactar, the Ancient Mongolian Warlord of Data Quality
A Little History On August 18, 1227, the well-known Mongolian emperor Genghis Khan passed. Despite numerous criticisms, based on rumors of genocide and brutality, he united Mongolia. One of his […]
Spark Boosts IBM Event Store
IBM just announced Event Store, a hybrid datastore to store events. The originality? Events can be streamed in and it is based on Apache Spark. IBM claims to be able […]
The Key to Machine Learning is Prepping the Right Data
Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine Learning on data, using […]
A new Informix Book is Out for MacOS and Java
Why a book on Informix? Informix has been a passion for almost 20 years. Very often, a younger version of myself would say: “I don’t like databases, that’s why I […]
HDP 2.6 is Out: Spark 2, Hive 2, and Zeppelin 0.7 are GA
Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New Versions of Key Components […]