DataFriday is back with a focus on architects

After a (too long) hiatus, DataFriday is back. The first episode of the new season was released last Friday, January 14, 2022. It focuses on defining enterprise architects: how they are perceived and what they really bring to the enterprise.

DataFriday: manipulating the schemas of Spark dataframes

In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]

DataFriday: extracting metadata from photos

Following episode 3, where I talked about metadata in relational databases, this week I am talking about metadata as found in digital photography (specifically EXIF). Wikipedia defines EXIF as […]

DataFriday: what is Metadata?

In this episode, I will explain what metadata is, or at least some kinds of it: specifically, metadata in relational databases. It’s a quick introduction. I will also run a small Java […]

DataFriday: basic ETL ops with Apache Spark

In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]

DataFriday: load a CSV file with Apache Spark

Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic […]

Let's be social

@jgperrin
