Chapter 9 is now available on MEAP: let me show you the path to advanced ingestion in Spark

Chapter 9 still covers Spark ingestion (like chapters 7 and 8), but this time, it's about "anything can become a Spark data source."

When I was working for Zaloni, we needed to create a custom data source for Spark so that we could connect to a REST endpoint. The team worked hard to build it in Java. I presented this work at Spark Summit in Dublin last year, and it is now one of the topics of chapter 9 of Spark with Java, available through MEAP (the Manning Early Access Program) from Manning Publications.

In this chapter, you will see how to find third-party data sources for ingestion, understand the benefits of building your own data source, build your own data source, and, finally, build a JavaBean data source, which lets you turn anything into a Spark data source. Sweet, no? The practical example I chose is importing EXIF data (photo metadata) as a dataframe. There's a lot of code in this chapter, and it's all Java.

Appendix S is out too; the S stands for "enough of Scala." Nope, it is not a diatribe against Scala. I promise.

The chapter's code is on GitHub. The book is available on Manning's website.

This chapter is released just as the first set of reviews for the book is coming in (covering roughly the first third of the book). I must admit I am pretty pleased with the feedback, despite some technical issues that impaired the review. Here are a few quotes.

I would say that this is the best book on Spark I’ve read. -Kelvin Johnson


One of the most simple, but powerful introductions and dive-ins that you can ever have on an Apache library! -Igor Franca

Another one:

A great book for beginners and prospective experts. -Markus Breuer

Could be worse, right? I promise, I did not pay them. Overall, the group of twelve reviewers gave the book a 4.0 Amazon-like rating… Not so bad. What would you rate it?

On a side note, I will be speaking at NC Tech on August 22nd, 2018, and at All Things Open, the great open source event, on October 21st to 23rd, 2018. You probably guessed the topic…