Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine Learning to data, using […]

A quick look back at a few articles I published recently. “You Are Not a Machine, So Learn Machine Learning,” published by Database Trends and Applications on February 21st, 2017. What Are Spark […]

Let’s understand what checkpoints can do for your Spark dataframes and go through a Java example of how we can use them. Checkpoint on Dataframe: In v2.1.0, Apache Spark introduced […]

When you start an application, you need to think about where it’s going to run, and also how it’s going to run. Basically, the way I use Spark is in […]

UDF stands for User-Defined Function. With UDFs, you can easily extend Apache Spark with your own routines and business logic. Let’s see how we can build them and deploy […]

Here are a few quick recipes to solve some common issues with Apache Spark. All examples are based on Java 8 (although I do not consciously use any of the […]

Here is your very first Apache Spark program using Java: the equivalent of Kernighan and Ritchie’s “Hello, World”. You can download it from GitHub. Basically, the key is to […]

The Apache Software Foundation (ASF) offers a wide range of tools, libraries, frameworks, and data stores for building enterprise applications. The purpose of this list is to keep track of […]

My Raspberry Pi crashed during an update. As a result, the disk was corrupted and I needed to fix the trusted GPG keys as […]

Following the benchmarks I did on the Raspberry Pi, Programmez! Magazine has published more of my benchmarks, both on CPU and storage. I more extensively compare the CPU performance of the […]