As AI ramifies in our society, we, as citizens, need to demand explainability to guarantee trust.
Data SLA
In this article, I am trying to join data quality and service-level indicators in the context of data management. Since I wrote my article about the CACTAR acronym for Data […]
Five essential & free resources for learning Data Science
Giving talks and writing about tech is something I am passionate about. Why learn something if it is to keep it for yourself? This article is meant for all the […]
It’s painful how much data rules the world
On September 15th, 2021, after more than 18 months, I was finally able to give a talk in person. My conference schedule did not really go down during the pandemic, […]
Bringing vision to Apache Spark
Drsti (pronounced drishti) is an effortless data visualization tool that interfaces easily with Apache Spark.
Do your best work ever for Call for Code
I have been a mentor and judge for Call for Code. However, this year, I have other projects limiting my time to contribute to this world-changing initiative. That’s why I […]
Awaited Apache Spark v3.0.0 is finally released
Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 […]
DataFriday: manipulating the schemas of Spark dataframes
In this fifth episode of DataFriday, I will dig into the schema linked to each dataframe in Apache Spark. I will rename columns, create columns, and analyze the result. I […]
DataFriday: extracting metadata from photos
Following episode 3, where I talked about metadata in relational databases, this week, I am talking about metadata as found in digital photography (and specifically EXIF). Wikipedia defines EXIF as […]
DataFriday: what is Metadata?
In this episode, I will explain what metadata is, or at least some of it: more specifically, metadata in relational databases. It’s a quick introduction. I will also run a small Java […]