On September 15th, 2021, after more than 18 months, I was finally able to give a talk in person. My conference schedule did not really slow down during the pandemic, […]
Bringing vision to Apache Spark
Drsti (pronounced drishti) makes data visualization effortless and interfaces easily with Apache Spark.
Get your own copy of Spark in Action 2e
Spark in Action, second edition, has been out for about a month and was running a little low in some stores, so here is a convenient list of stores where […]
DataFriday: manipulating the schemas of Spark dataframes
In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week I am talking about metadata as found in digital photography (specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what metadata is, or at least some of it: specifically, metadata in relational databases. It’s a quick introduction. I will also run a small Java […]
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]
DataFriday: load a CSV file with Apache Spark
Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]
Spark in Action, Second Edition MEAP Update
I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K, […]
How I built the perfect data science team
When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering […]