In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]

I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K, […]

This article follows my previous post on my trip to NASA in Langley, Va., in early March 2019. This is the second and last part of this article. Before going […]

I like to use π day to remember a few things about science and technology that influence who I am. This year, π day is perfect for that. Let me tell […]

When I assembled my first data science team, the term was barely getting printed in the Harvard Business Review. I had no clue that I was building a team pioneering […]

Before thinking about what the outcome of data science is, maybe I should take the two seconds I think it takes to define it. As for how to define data science, […]

A new chapter of Spark in Action, 2e (formerly known as Spark with Java), is available. Chapter 11 is titled “Working with SQL”. In chapter 11, you will explore how […]

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as the book develops, authors share a few excerpts of the book […]

Read about eight very hot predictions for data management in 2019, in usages, shapes, governance, and people.