Apache Spark v3.0.0 hits the road, let’s celebrate!

Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 comes only a few days after Spark v2.4.6, a maintenance release on the v2.x branch.

The community worked very hard on this amazing release. Congratulations and thanks to all the contributors!

More articles will come on the specifics of Spark v3. I wanted to share the release notes as soon as possible.

My two favorite features, for now, are the push toward Python v3 (Python 2 support is deprecated in this release), which helps Python move toward a consistent feature set, and the ability to leverage GPUs (graphics processing units) for ML (machine learning), by making Catalyst even more aware of the hardware made available to the optimizer. Share your favorite feature in the comments!

Here is the list of all the Spark repositories I maintain, indicating support for both Apache Spark v2.x and Spark v3.x. Most repositories are linked to Spark in Action, 2nd edition, which is up to date with Spark v3. Prior to the release of Spark v3, all the examples from the book were tested against Spark v3 preview 1 and preview 2.

Repository                          Spark v2.x   Spark v3.x   Master branch
Spark in Action, chapter 1          v2.4.6       v3.0.0       v3.0.0
Spark in Action, chapter 2          v2.4.6       v3.0.0       v3.0.0
Spark in Action, chapter 3          v2.4.6       v3.0.0       v3.0.0
Spark in Action, chapter 4          v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 5          v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 7                       v3.0.0       v3.0.0
Spark in Action, chapter 8          v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 9          v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 10         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 11         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 12         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 13                      v3.0.0       v3.0.0
Spark in Action, chapter 14         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 15         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 16         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 17         v2.4.4       v3.0.0       v3.0.0
Spark in Action, chapter 99         v2.4.4       v3.0.0       v3.0.0
Spark Labs in Java                  v2.4.6       v3.0.0       v3.0.0
Soccer analytics with Apache Spark               In progress  In progress
List of Java-oriented repositories containing examples using Apache Spark, with their supported versions

Other repositories are left as-is and may not be updated to future versions of Apache Spark.

The repositories linked to the book (chapter 1 to chapter 17) also contain Python (PySpark) and Scala code. In line with the Spark v3 requirements, the Python code uses Python v3 only.
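
If you want a script to fail early when it is started with the wrong interpreter, a small guard at the top of the file can help. This is only a minimal sketch using the Python standard library; the function name `require_python3` is hypothetical and is not part of PySpark or of the book's repositories.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Return True on Python 3 or later, raise RuntimeError otherwise.

    `require_python3` is a hypothetical helper for illustration only.
    """
    major = version_info[0]
    if major < 3:
        raise RuntimeError("Python 3 is required, found Python %d" % major)
    return True


# Check the running interpreter before any Spark-related import.
require_python3()
```

Calling the guard before importing `pyspark` keeps the error message short and explicit instead of surfacing later as a syntax or import error.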

Updates

  • 2020-06-20 Update list of repositories as testing continues.
  • 2020-06-21 Validation of all chapters.