Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark
Do your best work ever for Call for Code
I have been a mentor and judge for Call for Code. However, this year, I have other projects limiting my time to contribute to this world-changing initiative. That’s why I […]
Building Enterprise Software Today
I am an enterprise architect. Part of my work is bridging technology and business at the enterprise level. On the technology side, I talk to many engineers and architects at various levels. On the business side, I try to explain our technology constraints. That’s why I wanted to level-set the vocabulary and concepts that I consider critical. I have split the content into three twenty-minute videos available on YouTube.
Get your own copy of Spark in Action 2e
Spark in Action, second edition, has been out for about a month and was running a little low in some stores, so here is a convenient list of stores where […]
Hot July planning
Despite 2020 being a mess so far, and after a very calm period in terms of events, it’s time to get back on stage. I will have two similar events, […]
Awaited Apache Spark v3.0.0 is finally released
Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 […]
DataFriday: manipulating the schemas of Spark dataframes
In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week I am talking about metadata as found in digital photography (specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what metadata is, or at least some kinds of metadata, specifically metadata in relational databases. It’s a quick introduction. I will also run a small Java […]
DataFriday: basic ETL ops with Apache Spark
In this episode, you will learn about doing a basic ETL (extract, transform, and load) operation using Apache Spark. You will load a basic CSV file with Apache Spark, make […]