Assumptions:
Basically: you can handle small data already.
My goal for today: give you a basic idea of how to use Spark when the data gets bigger.
Summary: If you can use Pandas, then Spark isn't much harder. And it scales out if you need it to.
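To make the "not much harder" claim concrete, here is a minimal sketch of the same group-by computation in Pandas, with the near-identical PySpark call shown in comments. The column names and values are invented for illustration, and the PySpark part assumes a `SparkSession` named `spark` already exists:

```python
import pandas as pd

# A tiny Pandas computation: mean distance per city
# (data made up for illustration).
df = pd.DataFrame({
    'city': ['Vancouver', 'Burnaby', 'Vancouver'],
    'dist': [3.0, 5.0, 7.0],
})
means = df.groupby('city')['dist'].mean()
print(means['Vancouver'])  # prints 5.0

# The PySpark version has nearly the same shape (sketch, assuming
# a SparkSession named `spark`):
#   sdf = spark.createDataFrame(df)
#   sdf.groupBy('city').avg('dist').show()
```

The method names differ slightly (`groupBy` vs `groupby`), but the mental model carries over directly.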
What big data problems do you have?
First task: start downloading things from the instructions, http://bit.ly/bigdata-workshop-2017 .
Let's make sure you can get things running…
No? Log in as csguestd on the computers here.
If there's time, we can experiment on a cluster too: SSH with username csguestd to gateway.sfucloud.ca.
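For reference, the cluster login above would look like this from a terminal (a sketch; the username and host are taken from the instructions, and you will be prompted for the password):

```shell
# Connect to the workshop cluster gateway as csguestd.
ssh csguestd@gateway.sfucloud.ca
```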