I like to use π day to remember a few things about science and technology that influence who I am. This year, π day is perfect for that. Let me tell…

When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering…

Before thinking about what the outcome of data science is, maybe I should take the two seconds I think it takes to define it. As for how to define data science,…

A new chapter of Spark in Action, 2e (formerly known as Spark with Java), is available. Chapter 11 is titled “Working with SQL”. In chapter 11, you will explore how…

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book…

Read about eight very hot predictions for data management in 2019, in usages, shapes, governance, and people.

A couple of weeks ago, I chatted with Tobias Macey about data engineering and, more specifically, Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly…

The checklist is updated for ATO 2019! All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st, 2018, at the Raleigh Convention Center,…

Yesterday, during Ignite 2018, Microsoft announced that they will integrate Apache Spark more tightly with SQL Server 2019. If you missed previous announcements around SQL Server, it now runs on…

This new chapter, chapter 4, of Spark with Java is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data…