I have been a mentor and judge for Call for Code. However, this year, I have other projects limiting my time to contribute to this world-changing initiative. That’s why I…
The long-awaited Apache Spark v3.0.0 is finally released
Apache Spark v3.0.0 hits the road, let’s celebrate! Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually…
DataFriday: manipulating the schemas of Spark dataframes
A stretch… Data is organized by schemas, data is stored on disk (or memory), but nothing like a good old-school disk to illustrate data. In this fifth episode of…
DataFriday: extracting metadata from photos
This Rolleiflex requires a physical piece of paper and a pencil to store the photo’s metadata. Following episode 3, where I talked about metadata in relational databases, this week, I am…
DataFriday: what is Metadata?
Metadata is like the foundation of your data. In this episode, I will explain what metadata is, or at least some of it: more specifically, metadata in relational databases. It’s a quick…
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…
Spark in Action, Second Edition MEAP Update
I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K,…
How I built the perfect data science team
When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering…
Advanced Spark Ingestion
Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we…
Ingestion of data from databases into Apache Spark
Chapter 8 of Spark with Java is out and it covers ingestion, as did chapter 7. However, while chapter 7 focused on ingestion from files, chapter 8 focuses on…
