Second day of Q&A on Data Mesh with IBM’s Technical Group, covering “ten lessons learned from building a Data Mesh.”

Drsti (pronounced “drishti”) is an effortless data visualization tool that interfaces easily with Apache Spark.

Spark in Action, second edition, has been out for about a month and is running a little low in some stores, so here is a convenient list of stores where […]

Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 […]

In this episode, you will learn how to perform a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]

A new chapter of Spark in Action, 2e (formerly known as Spark with Java), is available. Chapter 11 is titled “Working with SQL.” In chapter 11, you will explore how […]

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as a book develops, authors share a few excerpts from it […]

Read about eight hot predictions for data management in 2019, covering usage, shape, governance, and people.

A couple of weeks ago, I chatted with Tobias Macey about data engineering, and more specifically about Apache Spark. Tobias Macey runs the Data Engineering Podcast, which you can directly […]