Hortonworks Data Platform (HDP) v2.6 has been released, and you can download it from the Hortonworks website. The sandbox is not yet available for v2.6.
New Versions of Key Components
HDP v2.6 does not ship the latest Apache Hadoop release, v2.8.0, which probably came out too late to make the cut, but it does bring welcome upgrades to other key components. Apache Spark v2.1 is now GA (General Availability), whereas it was only a preview in the previous HDP version. The same goes for Hive, where v2.1 is now the standard (and a faster one).
Two technology previews, both on a good trajectory, are also included: Apache Zeppelin v0.7.0 and Apache Atlas v0.8.0. Zeppelin is a powerful notebook system for experimenting with data. Atlas is a solution for managing metadata (more on that later).
Focus on Data Science
If you are a data scientist, you will be excited by:
- Extensive support for machine learning algorithms through Spark v2.1 (Spark v1.6.3 is also included if your app is not compatible with v2.x), Zeppelin v0.7, and the Livy REST API. I will cover Livy soon.
- Hive LLAP for production: Hortonworks claims a 10x gain through faster join performance with dynamic runtime filtering.
- Package support for PySpark (the Spark Python API) is included. SparkR, the latest and greatest addition to Spark, is also there. With both features, data scientists using Spark with R or Python can now deploy their favorite R or Python packages with their Spark jobs.
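To give a first taste of the Livy REST API mentioned in the list above, here is a minimal sketch in Python of the two requests typically involved: creating an interactive PySpark session and submitting a statement to it. The host and port (localhost:8998) are assumptions and may well differ on an HDP cluster, and the helper names are hypothetical, made up for this illustration. The sketch only builds the requests; nothing is sent until you call send() against a running Livy server.

```python
import json
import urllib.request

# Assumption: Livy listens on localhost:8998; adjust for your cluster.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        LIVY_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def new_session_request(kind="pyspark"):
    """Request that asks Livy to start an interactive session (hypothetical helper)."""
    return build_request("/sessions", {"kind": kind})


def statement_request(session_id, code):
    """Request that submits a code snippet to an existing session (hypothetical helper)."""
    path = "/sessions/{}/statements".format(session_id)
    return build_request(path, {"code": code})


def send(request):
    """Send a prepared request and decode the JSON reply; needs a reachable Livy server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a live server, you would call send(new_session_request()), read the session id from the reply, wait for the session to become idle, and then send a statement_request with that id and a line of PySpark code.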
Another feature I like is support for Ubuntu v16.04 (Xenial). All my test systems run the latest version of Ubuntu, and the lack of support was a real problem for me, as I did not want to reinstall all the OSes. Along with Ubuntu v16 support comes support for MySQL v5.6 (see the system requirements).
If you are ready for it, simply jump to the HDP documentation and start by reading the Apache Ambari installation guide. I can't wait to do it!