DrstiĀ (pronounced drishti) is an effortless data visualization that interfaces easily with Apache Spark
DataFriday: manipulating the schemas of Spark dataframes
A stretch… Data is organized by schemas and stored on disk (or in memory), but there is nothing like a good old-school disk to illustrate data. In this fifth episode of…
DataFriday: extracting metadata from photos
This Rolleiflex requires a physical piece of paper and a pencil to store the photo’s metadata. Following episode 3, where I talked about metadata in relational databases, this week, I am…
DataFriday: what is Metadata?
Metadata is like the foundation of your data. In this episode, I will explain what metadata is, or at least some of it: more specifically, metadata in relational databases. It’s a quick…
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…
Ingestion of data from databases into Apache Spark
Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, while chapter 7 focused on ingestion from files, chapter 8 focuses on…
The Key to Machine Learning is Prepping the Right Data
Earlier this month, I was in San Francisco, CA, to attend Spark Summit 2017. I gave a talk on the phase that comes before you can apply machine learning to data, using…
HDP 2.6 is Out: Spark 2, Hive 2, and Zeppelin 0.7 are GA
Hortonworks Data Platform (HDP) v2.6 has been released and you can download the platform from their website. The sandbox is not yet available in v2.6. New Versions of Key Components…
Spark is a Hadoop Killer
Of course, nobody will tell you I am right. At least officially. But what was the goal of Hadoop? Perform analytics over a wide range of servers. Of course,…
Your Very First Apache Spark Application
Here is your very first Apache Spark program using Java: the equivalent of Kernighan and Ritchie’s “Hello, World”. You can download it from GitHub: Basically, the key is to…
