This is really hot off the teleprinter. Leading hosting company OVH is continuing its expansion in the United States and is coming to Virginia. Today, OVH announced its decision…

Zaloni’s CEO Ben Sharma is speaking about managing data lakes. What happens is that the IT department starts by installing Hadoop and jumps into Big Data. Not a lot of companies…

Mica is a data preparation tool that anyone can use… This makes it self-service data preparation software. Data scientists and engineers can use it for discovery, curation, and…

To help foster the Apache Spark community in the (Research) Triangle region (Raleigh, Durham, and Chapel Hill in North Carolina), some friends and I decided to create a Slack team…

This week has seen the release of Apache Spark v2.0.0. As with every major release, you can expect some changes. My Java recipes for Apache Spark have been affected, but…

Unlike the new iPhone, the release of Apache Spark v2.0.0 did not gather 1,000s of people in a room, but it is a very important event in the small world of…

When you start an application, you need to think about where it’s going to run, and also how it’s going to run. Basically, the way I use Spark is in…

UDF stands for user-defined function. With UDFs, you can easily extend Apache Spark with your own routines and business logic. Let’s see how we can build them and deploy…

Here are a few quick recipes to solve some common issues with Apache Spark. All examples are based on Java 8 (although I do not consciously use any of the…

Successful first deployment of Apache Spark on a production server. Yep… I can add that line to my resume. Right now, we have allocated 24 cores, 72 GB of memory…