Standing in front of the convention center, next to the statue of Sir Walter Raleigh. On September 15th, 2021, after more than 18 months, I was finally able to give…

Drsti (pronounced drishti) is an effortless data visualization that interfaces easily with Apache Spark

Spark in Action, second edition is a favorite for the Big Bag Theory gang Spark in Action, second edition, has been out for about a month and was running a…

In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…

Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic…

I just wanted to share with you the latest update on Spark in Action, second edition What’s new? Chapter 12, “Transforming your data” Chapter 13, “Transforming entire documents” Appendix K,…

When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering…