Right before Halloween, from October 24th to October 27th, I went to WoW. Of course, when I told my kids, they assumed I was going to play World […]

This is really hot off the wire. Leading hosting company OVH is continuing its expansion in the United States and is coming to Virginia. OVH announced today its decision […]

Zaloni’s CEO Ben Sharma is speaking about managing data lakes. What has happened is that IT departments start by installing Hadoop and jump into Big Data. Not a lot of companies […]

Mica is a data preparation tool that can be used by anyone… That makes it self-service data preparation software. Data scientists and engineers can use it for discovery, curation, and […]

To help foster the Apache Spark community in the (Research) Triangle region (Raleigh, Durham, and Chapel Hill in North Carolina), some friends and I decided to create a Slack team […]

This week has seen the release of Apache Spark v2.0.0. As with every major release, you can expect some changes. My Java recipes for Apache Spark have been affected, but […]

Unlike the new iPhone, the release of Apache Spark v2.0.0 did not gather thousands of people in a room, but it is a very important event in the small world of […]

When you start an application, you need to think about where it’s going to run, and also how it’s going to run. Basically, the way I use Spark is in […]

UDF stands for User Defined Function. With UDFs, you can easily extend Apache Spark with your own routines and business logic. Let’s see how we can build them and deploy […]

Here are a few quick recipes to solve some common issues with Apache Spark. All examples are based on Java 8 (although I do not consciously use any of the […]