I have been a mentor and judge for Call for Code. However, this year, other projects limit the time I can devote to this world-changing initiative. That’s why I […]

Apache Spark v3.0.0 was released on June 18, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 […]

In this episode, you will learn how to perform a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K, […]

When I assembled my first data science team, the term had barely started appearing in the Harvard Business Review. I had no clue that I was building a team pioneering […]

Chapter 9 still covers Spark ingestion (like chapters 7 and 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we […]

Chapter 8 of Spark with Java is out, and it covers ingestion, as chapter 7 did. However, whereas chapter 7 focused on ingestion from files, chapter 8 focuses on […]