The checklist is updated for ATO 2019!
All Things Open 2018 (ATO 2018), a premier open source conference, will open its doors on October 21st 2018 in the Raleigh Convention Center, in the heart of North Carolina’s capital.
I will be giving a double session on October 22nd, titled Big Data Made Easy With a Spark. As you can easily imagine, I am going to talk about Big Data, Java, and Spark. If you are interested in those topics, come over, have fun, and get a chance to win a copy of Spark in Action, second edition.
The promise of this session is:
In this hands-on session, you will learn how to do a full Big Data scenario from ingestion to publication. You will see how we can use Java and Apache Spark to ingest data, perform some transformations, and save the data. You will then perform a second lab where you will run your very first Machine Learning algorithm!
As you can see in the title, it’s a hands-on tutorial. It means you will have to do things on your own! Don’t think it’s the kind of lesson where you just have a seat and listen…
However, seriously, to make things smoother, read this quick article and try to have the material downloaded and installed. It will simplify our work!
Prerequisites
Make sure:
- You have administrator access to your machine.
- You have the right to install stuff on your machine.
Material to download & install
- You will need a JDK (Java Development Kit) v8 on your machine. You can download it at http://bit.ly/javadk8 (or https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). Note that I will only use JDK 8 in this tutorial. Unfortunately, I will not be able to support any other version of Java. Yes, I still like Java 8.
- To ease our development experience, I will use Eclipse as my IDE. I am not opposed to other products; however, I will not be able to support any other IDE or any Eclipse release prior to Oxygen (Eclipse 4.7). You can download Eclipse at https://www.eclipse.org/downloads/packages/. On my side, I will use Eclipse 2019-09 on stage.
- Nice to have but not required: Maven, SourceTree, or git on the command line.
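If you want to double-check that the JDK you installed is really Java 8 before the session, a tiny program like the one below can help. It is only a sketch and not part of the lab material: the class name JavaVersionCheck and the helper isJava8 are made up for this illustration, and the check assumes that Java 8 reports its java.version system property in the usual "1.8.0_xxx" form.

```java
// Hypothetical helper, not part of the labs: checks that the running JVM is Java 8.
class JavaVersionCheck {

    // Assumption: Java 8 reports java.version as "1.8.0_xxx" (legacy "1.x" scheme),
    // while Java 9 and later report "9", "11.0.4", and so on.
    static boolean isJava8(String version) {
        return version != null
                && (version.equals("1.8") || version.startsWith("1.8."));
    }

    public static void main(String[] args) {
        // java.version is a standard system property available on every JVM.
        String version = System.getProperty("java.version");
        System.out.println("Java version: " + version);
        if (isJava8(version)) {
            System.out.println("OK: this looks like Java 8.");
        } else {
            System.out.println("Warning: this does not look like Java 8.");
        }
    }
}
```

Compile it with javac JavaVersionCheck.java and run it with java JavaVersionCheck. If you see the warning, check which JDK your command line or your IDE is actually using.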
Source code
Lab #1 – file ingestion
The code can be downloaded from the examples of my book Spark in Action, second edition. Go to: https://github.com/jgperrin/net.jgp.books.spark.ch01.
Lab #2 – a bit of analytics
This example is in chapter 15, where I speak about aggregations. Logically, you can access the code at: https://github.com/jgperrin/net.jgp.books.spark.ch15.
Lab #3 – an even smaller bit of AI (artificial intelligence)
This example is not in the book (and never will be). Chapter 99 is the secret bonus chapter with extra stuff. You can download it from GitHub at: https://github.com/jgperrin/net.jgp.books.spark.ch99.
Slides
The 2018 slides are available on SlideShare. The 2019 slides will be added after the 2019 conference.
https://www.slideshare.net/jgperrin/big-data-made-easy-with-a-spark
Please share your feedback in the comments below or via Twitter (my handle is @jgperrin).
Update
- 2018-10-24: slides and link to slides added.