Apache Spark v3.0.0 was released on June 18th, 2020, just before Spark + AI Summit 2020, which is being held virtually this year (like most big conferences). Spark v3 comes only a few days after Spark v2.4.6, a maintenance release on the v2.x branch.
The community worked very hard on this amazing release. Congratulations and thanks to all the contributors!
More articles will come on the specifics of Spark v3. I wanted to share the release notes as soon as possible.
My two favorite features, for now, are enforcing Python v3, which helps Python move toward a consistent future, and leveraging GPUs (graphics processing units) for ML (machine learning) by making Catalyst even more aware of the hardware made available to the optimizer. Share your favorite feature in the comments!
Here is the list of all the Spark repositories I maintain, indicating their support for both Apache Spark v2.x and Spark v3.x. Most repositories are linked to Spark in Action, 2nd edition, which is up to date with Spark v3. Prior to the release of Spark v3, all the examples from the book were tested against Spark v3 preview 1 and preview 2.
Repository | Spark v2.x | Spark v3.x | Master branch |
---|---|---|---|
Spark in Action, chapter 1 | v2.4.6 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 2 | v2.4.6 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 3 | v2.4.6 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 4 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 5 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 7 | v3.0.0 | v3.0.0 | |
Spark in Action, chapter 8 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 9 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 10 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 11 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 12 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 13 | v3.0.0 | v3.0.0 | |
Spark in Action, chapter 14 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 15 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 16 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 17 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark in Action, chapter 99 | v2.4.4 | v3.0.0 | v3.0.0 |
Spark Labs in Java | v2.4.6 | v3.0.0 | v3.0.0 |
Soccer analytics with Apache Spark | In progress | In progress |
Other repositories are left as-is and may not be updated to future versions of Apache Spark.
The repositories linked to the book (chapter 1 to chapter 17) also contain Python (PySpark) and Scala code. In line with the Spark v3 requirements, the Python code uses only Python v3.
Updates
- 2020-06-20 Update list of repositories as testing continues.
- 2020-06-21 Validation of all chapters.