After a (too long) hiatus, DataFriday is back. The first episode of the new season was released last Friday, January 14, 2022. It focuses on defining Enterprise Architects: how they are perceived and what they really bring to the enterprise.
DataFriday: manipulating the schemas of Spark dataframes
A stretch… Data is organized by schemas, data is stored on disk (or in memory), but there is nothing like a good old-school disk to illustrate data. In this fifth episode of…
DataFriday: extracting metadata from photos
This Rolleiflex requires a physical piece of paper and a pencil to store the photo’s metadata. Following episode 3, where I talked about metadata in relational databases, this week, I am…
DataFriday: what is Metadata?
Metadata is like the foundation of your data. In this episode, I will explain what metadata is, or at least some metadata: more specifically, metadata in relational databases. It’s a quick…
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…
DataFriday: load a CSV file with Apache Spark
Starting today, I will host a weekly live show about data. You may join, attend “live,” and ask questions as I go through a data-oriented topic. For now, the topic…
