Experts from the San Diego Supercomputer Center (SDSC) will present a one-day tutorial on big data technologies and on the latest SDSC resources.
The workshop will be composed of lectures and a hands-on session (bring a laptop if you would like to participate in the hands-on session).
Coffee, refreshments and lunch will be provided.
Registration is closed. If you want to attend or have questions, please e-mail Burak.
Details and tentative schedule:
9:00 AM – 9:10 AM: Introduction & Welcome
9:10 AM – 10:00 AM: Comet – SDSC’s 2 PetaFLOPS HPC Resource
- Architecture, queue/partition info, software stack
- Examples for compute, shared, gpu, and gpu-shared partitions
- Hands-on session on Comet to prepare for the later sessions, which use Comet
10:00 AM – 10:30 AM: Science Gateways
10:30 AM – 10:40 AM: Short break
10:40 AM – 12:00 PM: Introduction to Hadoop on Comet
- Overview of running Hadoop within scheduler frameworks (using myHadoop)
- Demonstration and hands-on: Hadoop cluster spin-up and interactive usage
- New technologies and approaches such as RDMA-Hadoop, with a hands-on RDMA-Hadoop exercise
12:00 PM – 1:00 PM: Lunch
1:00 PM – 2:00 PM: Data Analytics and Data Mining
- R and parallel execution of R
- Data mining/machine learning
2:00 PM – 3:00 PM: Python for Scientific Computing
- How to run Jupyter notebook on Comet
- Use IPython Parallel for distributed computation
- Easy multithreading and distributed computing with dask
3:00 PM – 3:10 PM: Short break
3:10 PM – 4:30 PM: Spark for Scientific Computing
- Overview of Spark's capabilities and how they can be applied to problems in scientific computing
- Hands-on introduction to Spark, from batch and interactive usage on Comet to running a sample map/reduce example in Python
- Two key libraries in the Spark ecosystem: Spark SQL, a general-purpose query engine that can interface with SQL databases or JSON files, and Spark MLlib, a scalable machine learning library
Date: May 15, 2017
Time: From 8:30 a.m. to 4:30 p.m.
Location: Elings Hall, Room 1605
Prerequisites: Some experience with Linux on a cluster