- About Us
Principal Software Engineer @ IBM Spark Technology Center
Holden Karau is transgender Canadian, and an active open source contributor. When not in San Francisco working as a software development engineer at IBM’s Spark Technology Center, Holden talks internationally on Spark and holds office hours at coffee shops at home and abroad. She makes frequent contributions to Spark, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.
This talk will start with a quick introduction to the two different building blocks of distributed computing in Apache Spark, as with the relative performance differences. This talk will cover on the performance impacts of Datasets, which are becoming the core building block of much Apache Spark starting with Spark 2.0, as well considerations the RDD API. This talk will finish up with exploring the new structured streaming API. Prior knowledge of Spark isn’t required, but a background with Spark will make it more exciting.