In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week, I am talking about metadata as found in digital photography (and specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what metadata is, or at least some metadata: more specifically, metadata in relational databases. It’s a quick introduction. I will also run a small Java […]
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]
How I built the perfect data science team
When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering […]
Eleven key elements of data science outcome
Before thinking about what the outcome of data science is, maybe I should take the two seconds I think it takes to define it. As for how to define data science, […]
(Almost) All you need to know about file ingestion in Apache Spark
As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book […]
Eight very hot data trends for 2019
Read about eight very hot predictions for data management in 2019, in usages, shapes, governance, and people.
What is Apache Spark, the podcast
A couple of weeks ago, I chatted with Tobias Macey about data engineering and, more specifically, Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly […]
Checklist for All Things Open (ATO)
The checklist is updated for ATO 2019! All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st, 2018, at the Raleigh Convention Center, […]