Big Data Workshop

Monday, May 15, 2017

Experts from the San Diego Supercomputer Center (SDSC) will present a one-day tutorial on big data technologies and on the latest SDSC resources.

The workshop will be composed of lectures and a hands-on session (bring a laptop if you would like to participate in the hands-on session).

Coffee, refreshments and lunch will be provided.

Registration is now closed. If you would like to attend or have questions, please e-mail Burak.

Details and tentative schedule:

9:00 AM – 9:10 AM: Introduction & Welcome

9:10 AM – 10:00 AM: Comet – SDSC’s 2 PetaFLOPS HPC Resource

  • Architecture, queue/partition info, software stack
  • Examples for compute, shared, gpu, and gpu-shared partitions
  • Hands-on session on Comet to prepare for the later sessions, which will use Comet

10:00 AM – 10:30 AM: Science Gateways

10:30 AM – 10:40 AM: Short break

10:40 AM – 12:00 PM: Introduction to Hadoop on Comet

  • Overview of running Hadoop within scheduler frameworks (using myHadoop)
  • Demonstration and hands-on practice: Hadoop cluster spin-up and interactive usage
  • Newer technologies and approaches such as RDMA-Hadoop, with hands-on practice using RDMA-Hadoop

12:00 PM – 1:00 PM: Lunch

1:00 PM – 2:00 PM: Data Analytics and Data Mining 

  • R and parallel execution of R
  • Data mining/machine learning

2:00 PM – 3:00 PM: Python for Scientific Computing

  • How to run a Jupyter notebook on Comet
  • Using IPython Parallel for distributed computation
  • Easy multithreading and distributed computing with Dask

3:00 PM – 3:10 PM: Short break

3:10 PM – 4:30 PM: Spark for Scientific Computing

  • Overview of the capabilities of Spark and how they can be leveraged to solve problems in Scientific Computing
  • Hands-on introduction to Spark, from batch and interactive usage on Comet to running a sample map/reduce example in Python
  • Two key libraries in the Spark ecosystem: Spark SQL, a general-purpose query engine that can interface with SQL databases or JSON files, and Spark MLlib, a scalable machine learning library

Date: May 15, 2017

Time: From 8:30 a.m. to 4:30 p.m. 

Location: Elings Hall, Room 1605

Prerequisites: Some experience with Linux on a cluster