Spark is everywhere, let's celebrate!

Yesterday, during Ignite 2018, Microsoft announced that they will integrate Apache Spark more tightly with SQL Server 2019.

If you missed previous announcements around SQL Server, it now runs on Linux and in containers. I was even able to run it on my Mac fairly easily (and I am less of a tinkerer now than I used to be). Clearly, SQL Server is no longer the RDBMS for Windows only; it now embraces a broader spectrum of features. I am not trying to start a database road war, but some of these features were available in other engines long before. Data virtualization through PolyBase external tables sounds very similar to virtual tables, which Informix introduced in 1996 with its DataBlades.

Microsoft’s goal is to build an integrated analytics platform that provides AI (artificial intelligence) services directly in SQL Server.

On the packaging side, SQL Server 2019 integrates HDFS (Hadoop Distributed File System), Spark, Knox, Ranger, and Livy. HDFS is a smart move as it allows better data virtualization. However, I am still undecided about Livy. I love the idea behind Livy as it provides a REST interface to Spark. Previously, my teams had to build two REST interfaces on top of Spark for two different companies that needed a lightweight interface. Lightweight is not how I would define Livy. The choice of Livy, Knox, and Ranger also means that Microsoft is moving more and more towards Hortonworks (like IBM). One can wonder how the game between the Big Data 1.0 players (Hortonworks, Cloudera, and MapR) will end.
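For readers who have never touched Livy, here is a minimal sketch of what "a REST interface to Spark" looks like from the client side. It only builds the two HTTP requests (start an interactive session, then submit a statement) and does not contact any server. The URL `http://localhost:8998` and the helper names are my own assumptions for illustration; this is not how SQL Server 2019 exposes Livy, which the announcement does not detail.

```python
import json
from urllib import request

# Assumption: a Livy server reachable locally on its usual default port.
LIVY_URL = "http://localhost:8998"


def build_session_request(base_url=LIVY_URL, kind="spark"):
    """Build (but do not send) the POST asking Livy to start an interactive session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return request.Request(
        f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) the POST submitting a code snippet to a session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    """Send a prepared request and decode the JSON reply (needs a running Livy server)."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a Livy server running, `send(build_session_request())` would return the session description, and its `id` could then be passed to `build_statement_request`. Even in this tiny form you can see the session and statement bookkeeping that a client has to carry, which is part of why I do not call it lightweight.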

Nevertheless, SQL Server is becoming an analytics “middleware.” It federates not only small data but also big data, with a preferred connection to Power BI Report Server.

SQL Server 2019 analytics platform to bring AI (artificial intelligence) directly into SQL Server with Apache Spark (Source: Microsoft). I love it when I see Java on a Microsoft diagram.

Microsoft continues its adoption of open source tools within its own platforms, a strategy very similar to IBM’s. Late last year, Microsoft announced a partnership with Databricks to bring their analytics platform to Azure. I wonder if there are plans to ease .NET integration with Spark or if Microsoft wants us to go through Livy.

The engineer in me likes this evolution of SQL Server. The business-minded person I sometimes am wonders whether this will appeal to anyone who is not already a SQL Server customer.



More on the topic: