Standing in front of the convention center, next to the statue of Sir Walter Raleigh. On September 15th, 2021, after more than 18 months, I was finally able to give…
Bringing vision to Apache Spark
Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark.
Get your own copy of Spark in Action 2e
Spark in Action, second edition, is a favorite for the Big Bang Theory gang. Spark in Action, second edition, has been out for about a month and was running a…
DataFriday: manipulating the schemas of Spark dataframes
A stretch… Data is organized by schemas, data is stored on disk (or memory), but nothing like a good old-school disk to illustrate data. In this fifth episode of…
DataFriday: extracting metadata from photos
This Rolleiflex requires a physical piece of paper and pencil to store the photo’s metadata. Following episode 3, where I talked about metadata in relational databases, this week, I am…
DataFriday: what is Metadata?
Metadata is like the foundation of your data. In this episode, I will explain what metadata is, or at least some metadata: more specifically, metadata in relational databases. It’s a quick…
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…
DataFriday: load a CSV file with Apache Spark
Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic…
Spark in Action, Second Edition MEAP Update
I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K,…
How I built the perfect data science team
When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering…
