JACEK LASKOWSKI

JACEK LASKOWSKI

JACEK LASKOWSKI

Independent Apache Spark Consultant

An independent consultant who offers development and training services for Apache Spark (and Scala, sbt with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). I lead Warsaw Scala Enthusiasts and Warsaw Spark meetups.

Speak Spark SQL for better performance (leveraging Catalyst optimizer)

Spark SQL is now the de-facto driving force behind Apache Spark 2.0’s success. It comes with enough cool features to keep you busy for few days and made Spark MLlib even more pleasant to use. In Spark 2.0, Spark SQL comes with Datasets, encoders, logical and physical plans. They are the frontends to the other low-level components called Catalyst optimizer and Tungsten that are supposed to make your queries be faster. During this presentation you will find out how your structured queries end up as Datasets, the difference between Datasets, DataFrames and RDDs, and finally how Spark SQL’s Catalyst optimizer could make your queries faster when properly structured.

Scroll Up