Drsti (pronounced drishti) is an effortless data visualization that interfaces easily with Apache Spark

In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, as chapter 7 was focusing on ingestion from files, chapter 8 focus on […]

Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine Learning on data, using […]

Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New Versions of Key Components […]

Of course, nobody will tell you I am right. At least officially. But at was what was goal of Hadoop? Perform analytics over a wide range of servers. Of course, […]

Here is your very first Apache Spark program using Java: the equivalent of the Kernighan and Ritchie’s “Hello, World”. You can download it from GitHub: Basically, the key is to […]