Before thinking about what the outcome of data science is, maybe I should take the two seconds I think it takes to define it. As for how to define data science, […]

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book […]

Read about eight very hot predictions for data management in 2019, in usages, shapes, governance, and people.

A couple of weeks ago, I chatted with Tobias Macey about data engineering and, more specifically, Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly […]

The checklist is updated for ATO 2019! All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st, 2018 at the Raleigh Convention Center, […]

Yesterday, during Ignite 2018, Microsoft announced that they will integrate Apache Spark more tightly with SQL Server 2019. If you missed previous announcements around SQL Server, it now runs on […]

This new chapter, chapter 4, of Spark with Java (https://www.manning.com/books/spark-with-java) is not only about celebrating laziness; it also teaches, through examples and experiments, the fundamental differences in building a data […]

Chapter 3 of Spark with Java focuses on the dataframe. There is something majestic about Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of […]

Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we […]