Pursue Excellence, Achieve the Future

Glory Lasts Forever

Top-University Research: Computer Science & Artificial Intelligence (CS & AI)

UC Berkeley Electrical Engineering and Computer Sciences (EECS) Research Program

 

Research Directions

 

1. Understand how software performance optimization relates to, and is shaped by, hardware design

2. Make trade-offs among performance metrics such as speed, accuracy, power, and storage

3. Automate the performance-tuning process, starting with the computational kernels that determine the performance of scientific computing and information retrieval applications

4. Model and evaluate the performance of future computer architectures

 

Project Keywords

 

Deep Learning

Supercomputing

Distributed Systems

 

Reference Research Topics

 

I. [DL+System] Large-Scale Deep Neural Network Training on Supercomputers

Keywords: Deep Learning, Supercomputing, Distributed Systems

Candidates: Students must have a strong programming background (C/C++ and Python) and solid machine learning knowledge to begin with. Students need to be comfortable with the Linux command line. Students with good knowledge of TensorFlow programming, linear algebra, optimization, and parallel/distributed programming are preferred. After the research project, in addition to a technical report or a paper, students should have learned the following skills:

● The process of computer science research: analyzing the pros and cons of an algorithm, designing numerical experiments, and writing a good scientific paper;

● The application of distributed systems and supercomputers to emerging applications such as deep learning;

● The co-design between the system (supercomputer) and the algorithm (deep learning technique).

Introduction: Deep neural networks (i.e., deep learning) are the most successful artificial intelligence technique today. However, training deep neural networks is extremely slow. For example, finishing a 90-epoch ImageNet-1k training run with the ResNet-50 model on an NVIDIA M40 GPU takes 14 days, and it can take several months on a Mac laptop. This training requires about 10^18 single-precision operations in total. On the other hand, the world's current fastest supercomputer can perform about 2 × 10^17 single-precision operations per second. If we could make full use of such a supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in roughly five seconds.
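As a quick sanity check of the arithmetic above, the short Python snippet below reproduces the back-of-envelope estimate; the 10^18 operation count and the 2 × 10^17 operations-per-second peak rate are the figures quoted in this introduction, and full utilization is an idealized assumption.

    # Back-of-envelope estimate of the ideal training time quoted above.
    total_ops = 1e18    # single-precision ops for 90-epoch ResNet-50 on ImageNet-1k
    peak_rate = 2e17    # single-precision ops per second of the fastest supercomputer
    ideal_seconds = total_ops / peak_rate
    print(f"Ideal time at full utilization: {ideal_seconds:.0f} seconds")  # prints 5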

However, the current bottleneck for fast DNN training is at the algorithm level. Specifically, the commonly used batch size (e.g., 512) is too small to make efficient use of many processors. In this project, students will focus on designing a new optimization algorithm that can make full use of thousands of computing servers.
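As one concrete, published starting point, reference 1 below (the "ImageNet in 1 Hour" paper) scales the learning rate linearly with the global batch size and warms it up over the first few epochs. The sketch below only illustrates that rule; the base learning rate of 0.1, the reference batch size of 256, and the 5-epoch warmup are values taken from that paper and used here as illustrative assumptions, not requirements of this project.

    # Minimal sketch of the linear learning-rate scaling rule with gradual warmup
    # described in reference 1. All constants are illustrative assumptions.

    def scaled_learning_rate(base_lr, global_batch, reference_batch=256):
        """Linear scaling rule: the learning rate grows with the global batch size."""
        return base_lr * global_batch / reference_batch

    def warmup_learning_rate(target_lr, epoch, warmup_epochs=5):
        """Ramp the learning rate linearly toward target_lr during the first epochs."""
        if epoch < warmup_epochs:
            return target_lr * (epoch + 1) / warmup_epochs
        return target_lr

    # Example: a base rate of 0.1 tuned for batch 256, scaled to a global batch of 8192.
    target = scaled_learning_rate(0.1, 8192)                           # -> 3.2
    schedule = [round(warmup_learning_rate(target, e), 3) for e in range(8)]
    print(target, schedule)                                            # warmup, then constant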

Students are also welcome to propose their own projects in related areas.

Specific ideas cannot be disclosed in this introduction, but rough directions include:

● Explore and explain why extremely large batches often lose accuracy. It would be good if students can give either a mathematical or an empirical answer.

● Study advanced optimization methods and try to replace Momentum SGD or state-of-the-art adaptive optimization solvers. Ideally, the newly proposed solver should scale the batch size to at least 64K without losing accuracy on ImageNet training (a simplified sketch of one published layer-wise scaling approach follows this list).

● Students can also try designing new parallel machine learning algorithms, such as model-parallel or asynchronous approaches.
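As mentioned in the second bullet above, one published direction for scaling the batch size is the layer-wise adaptive rate scaling (LARS) idea from reference 3, which rescales each layer's step by the ratio of its weight norm to its gradient norm. The NumPy sketch below is a simplified, momentum-free illustration of that trust-ratio idea under assumed hyperparameters (trust coefficient, weight decay, toy layer shapes); it is not a faithful reimplementation of the paper.

    import numpy as np

    def lars_local_lr(weights, grad, trust_coef=1e-3, weight_decay=5e-4, eps=1e-9):
        """Simplified layer-wise trust ratio in the spirit of LARS (reference 3).

        Layers whose gradients are large relative to their weights take
        proportionally smaller steps, which helps with very large batches.
        """
        w_norm = np.linalg.norm(weights)
        g_norm = np.linalg.norm(grad)
        return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)

    def lars_step(weights, grad, global_lr, trust_coef=1e-3, weight_decay=5e-4):
        """One momentum-free LARS-style update for a single layer."""
        local_lr = lars_local_lr(weights, grad, trust_coef, weight_decay)
        return weights - global_lr * local_lr * (grad + weight_decay * weights)

    # Toy example: weights and gradient of a single dense layer.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((512, 256)).astype(np.float32)
    g = rng.standard_normal((512, 256)).astype(np.float32)
    w_next = lars_step(w, g, global_lr=3.2)
    print(w_next.shape, lars_local_lr(w, g))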

A tentative 4-week plan:

● Week 1: get familiar with programming on supercomputers and set up platforms such as distributed TensorFlow or Uber's Horovod (a minimal setup sketch follows this plan). Read 3-5 related papers.

● Week 2: reproduce the results of state-of-the-art approaches. Evaluate the pros and cons of existing approaches.

● Week 3: design our own algorithm and write the design documentation.

● Week 4: implement the proposed algorithm, conduct experiments, and write a technical report.
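For the Week 1 platform setup mentioned above, a minimal Horovod-with-Keras sketch might look like the following. The synthetic random dataset, the SGD hyperparameters, and the tiny epoch/step counts are placeholders chosen for illustration, not a prescribed configuration.

    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    # Initialize Horovod and pin each worker process to a single GPU.
    hvd.init()
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Placeholder data pipeline: a tiny synthetic stand-in for sharded ImageNet input.
    images = tf.random.uniform([64, 224, 224, 3])
    labels = tf.random.uniform([64], maxval=1000, dtype=tf.int64)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32).repeat()

    # ResNet-50 trained from scratch, matching the model discussed above.
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)

    # Linear learning-rate scaling by the number of workers, then wrap the optimizer
    # so gradients are averaged across workers with Horovod's allreduce.
    opt = tf.keras.optimizers.SGD(learning_rate=0.1 * hvd.size(), momentum=0.9)
    opt = hvd.DistributedOptimizer(opt)

    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
                  metrics=['accuracy'])

    callbacks = [
        # Broadcast initial weights from rank 0 so every worker starts identically.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]

    model.fit(dataset, steps_per_epoch=2, epochs=3,   # toy sizes; a real run uses 90 epochs
              callbacks=callbacks,
              verbose=1 if hvd.rank() == 0 else 0)

A typical launch on a small allocation would be something like "horovodrun -np 4 python train.py", though the exact launch mechanism (MPI, SLURM, etc.) depends on the cluster.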

References:

1. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

https://arxiv.org/pdf/1706.02677.pdf

2. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

https://arxiv.org/pdf/1609.04836.pdf

3. Large Batch Training of Convolutional Networks

https://arxiv.org/pdf/1708.03888.pdf

4. Large Scale GAN Training for High Fidelity Natural Image Synthesis

https://arxiv.org/pdf/1809.11096.pdf

5. Train longer, generalize better: closing the generalization gap in large batch training of neural networks

https://arxiv.org/pdf/1705.08741.pdf

6. Don't Decay the Learning Rate, Increase the Batch Size

https://arxiv.org/pdf/1711.00489.pdf

