It’s painful how much data rules the world

Standing in front of the convention center, next to the statue of Sir Walter Raleigh. On September 15th, 2021, after more than 18 months, I was finally able to give…

Bringing vision to Apache Spark

Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark.

DataFriday: manipulating the schemas of Spark dataframes

A stretch… Data is organized by schemas and stored on disk (or in memory), but nothing beats a good old-school disk to illustrate data. In this fifth episode of…

DataFriday: extracting metadata from photos

This Rolleiflex requires a physical piece of paper and a pencil to store the photo’s metadata. Following episode 3, where I talked about metadata in relational databases, this week I am…

DataFriday: what is Metadata?

Metadata is like the foundation of your data. In this episode, I will explain what metadata is, or at least some metadata, more specifically metadata in relational databases. It’s a quick…

DataFriday: basic ETL ops with Apache Spark

In this episode, you will learn how to perform a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make…

How I built the perfect data science team

When I assembled my first data science team, the term had barely made it into print in the Harvard Business Review. I had no clue that I was building a team pioneering…

Eleven key elements of data science outcome

Before thinking about what the outcome of data science is, maybe I should take the two seconds I think it takes to define it. As for how to define data science,…

(Almost) All you need to know about file ingestion in Apache Spark

As you may know, I started writing Apache Spark with Java (now renamed Spark in Action, 2nd edition). Usually, as a book develops, authors share a few excerpts of the book…

Eight very hot data trends for 2019

Read about eight very hot predictions for data management in 2019, in usages, shapes, governance, and people.


Let's be social

jgperrin.substack

/in/jgperrin

/jgperrin
