Drsti (pronounced drishti) is an effortless data visualization that interfaces easily with Apache Spark
DataFriday: manipulating the schemas of Spark dataframes
In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week, I am talking about metadata as found in digital photography (and specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what is metadata, at least, some metadata, more specifically metadata on relational databases. It’s a quick introduction. I will also run a small Java […]
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]
Ingestion of data from databases into Apache Spark
Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, as chapter 7 was focusing on ingestion from files, chapter 8 focus on […]
The Key to Machine Learning is Prepping the Right Data
Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase before you can apply Machine Learning on data, using […]
HDP 2.6 is Out: Spark 2, Hive 2, and Zeppelin 0.7 are GA
Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New Versions of Key Components […]
Spark is a Hadoop Killer
Of course, nobody will tell you I am right. At least officially. But at was what was goal of Hadoop? Perform analytics over a wide range of servers. Of course, […]
Your Very First Apache Spark Application
Here is your very first Apache Spark program using Java: the equivalent of the Kernighan and Ritchie’s “Hello, World”. You can download it from GitHub: Basically, the key is to […]