On September 15th, 2021, after more than 18 months, I was finally able to give a talk in person. My conference schedule did not really go down during the pandemic, […]
Bringing vision to Apache Spark
Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark.
DataFriday: manipulating the schemas of Spark dataframes
In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week, I am talking about metadata as found in digital photography (and specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what metadata is, or at least some kinds of it: specifically, metadata in relational databases. It’s a quick introduction. I will also run a small Java […]
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]
DataFriday: load a CSV file with Apache Spark
Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]
Spark in Action, Second Edition MEAP Update
I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K, […]
How I built the perfect data science team
When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering […]
Spark in Action’s Chapter Eleven on Working with SQL is in MEAP
A new chapter of Spark in Action, 2e (formerly known as Spark with Java), is available. Chapter 11 is titled “Working with SQL”. In chapter 11, you will explore how […]