Data Mesh raises more questions, here are the answers

Second day of Q&A around Data Mesh with IBM’s Technical Group about “ten lessons learned from building a Data Mesh.”

It’s painful how much data rules the world

On September 15th, 2021, after more than 18 months, I was finally able to give a talk in person. My conference schedule did not really go down during the pandemic, […]

Bringing vision to Apache Spark

Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark.

Get your own copy of Spark in Action 2e

Spark in Action, second edition, has been out for about a month and is running a little low in some stores, so here is a convenient list of stores where […]

Hot July planning

Despite 2020 being a mess so far, and after a very calm period in terms of events, it’s time to get back on stage. I will have two similar events, […]

Awaited Apache Spark v3.0.0 is finally released

Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 […]

DataFriday: manipulating the schemas of Spark dataframes

In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]

DataFriday: basic ETL ops with Apache Spark

In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

DataFriday: load a CSV file with Apache Spark

Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]

Spark in Action, Second Edition MEAP Update

I just wanted to share with you the latest update on Spark in Action, second edition. What’s new? Chapter 12, “Transforming your data”; Chapter 13, “Transforming entire documents”; Appendix K, […]

Apache Spark

Let's be social

@jgperrin

/jgperrin

/jgperrin
