Yesterday, during Ignite 2018, Microsoft announced that they will integrate Apache Spark more tightly with SQL Server 2019. If you missed previous announcements around SQL Server, it now runs on […]
Lazy is good: understand why it’s good for you that Spark is lazy
This new chapter, chapter 4, of Spark with Java (https://www.manning.com/books/spark-with-java) is not only about celebrating laziness, it also teaches, through examples and experiments, the fundamental differences in building a data […]
The majestic dataframe in Apache Spark
Chapter 3 of Spark with Java is focusing on the dataframe. There is something majestic with Apache Spark’s dataframe, like those mountains of Montana. Apache Spark revolves around the concept of […]
Advanced Spark Ingestion
Chapter 9 still covers Spark ingestion (like chapter 7 and chapter 8), but this time, it’s about “anything can become a Spark datasource.” When I was working for Zaloni, we […]
File Ingestion in Apache Spark
In a typical Big Data analytics scenario, you will probably be tempted to ingest files. You know, those pesky CSV files where the comma is sometimes a semicolon or a […]
Apache Spark Maturity on the Rise
Spark Summit Europe 2017 just concluded, here, in Dublin. More than 102 speakers, 1200 attendees, and an impressive Databricks team attended the 3-day long celebration. Spark is reaching a new […]
Spark is Making Big Data Easy at NCDevCon
NCDevCon is a yearly event in the Triangle, targeted for developers of all breeds, from front-end to back-end. Its origin starts in the ol’ days of Adobe ColdFusion, and thus […]
Getting Ready for This Pint of Guinness
Next month, I’ll be heading to Dublin, the capital of Ireland. I have been to Ireland quite a few times – I was 3 the first time. However this time, […]
Apache Spark 2.2 is Out
The Best Spark in Town Yesterday, Apache Spark v2.2.0 has been released. Excitement started a few months ago, reaching a “summit” during Spark Summit where a lot of the features […]
Spark Boosts IBM Event Store
IBM just announced Event Store, a hybrid datastore to store events. The originality? Events can be streamed in and it is based on Apache Spark. IBM claims to be able […]