Advanced Spark Ingestion

Chapter 9 is now available on MEAP: let me show you the path to advanced ingestion in Spark.

Like chapters 7 and 8, chapter 9 covers Spark ingestion, but this time the theme is "anything can become a Spark data source."

When I was working for Zaloni, we needed to create a custom data source for Spark so we could connect to a REST endpoint. The team put a lot of work into building it in Java. I presented this work at Spark Summit in Dublin last year, and it is now one of the topics of chapter 9 of Spark with Java, available through MEAP (the Manning Early Access Program) from Manning Publications.

In this chapter, you will see how to find third-party data sources for ingestion and why you might build your own. You will then build your own data source and, finally, a JavaBean data source, which lets you turn almost anything into a Spark data source. Sweet, no? The practical example I chose is importing EXIF data (photo metadata) as a dataframe. There is a lot of code in this chapter, and it is all Java.

Appendix S is out too. The S stands for "enough Scala." Nope: it is not a diatribe against Scala. I promise.

The chapter’s code is on GitHub. The book is available on Manning’s website.

This chapter comes out just as the first round of reviews is coming in. Those reviews cover roughly the first third of the book. I must admit I am pretty pleased with the feedback, despite some technical issues that hampered the review. Here are a few quotes.

I would say that this is the best book on Spark I’ve read. -Kelvin Johnson

And:

One of the most simple, but powerful introductions and dive-ins that you can ever have on an Apache library! -Igor Franca

Another one:

A great book for beginners and prospective experts. -Markus Breuer

Could be worse, right? And I promise I did not pay them. Overall, the group of twelve reviewers gave the book a 4.0 Amazon-style rating… Not so bad. What would you rate it?

On a side note, I will be speaking at NC Tech on August 22, 2018, and at All Things Open, the great open source event, on October 21 to 23, 2018. You probably guessed the topic…

Let's be social

jgperrin.substack

/in/jgperrin

/jgperrin