
Distributed TensorFlow Guide

Project repository: https://github.com/tmulc18/Distributed-TensorFlow-Guide

This guide is a collection of distributed training examples (that can act as boilerplate code) and a tutorial on basic distributed TensorFlow. Many of the examples focus on implementing well-known distributed training schemes, such as those available in Distributed Keras, which were discussed in the author’s blog post.

Almost all the examples can be run on a single machine with a CPU, and all the examples only use data-parallelism (i.e. between-graph replication).
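
To make the idea of data parallelism concrete, here is a minimal, library-free sketch in plain Python. It is not TensorFlow code and is not taken from the guide's repository. The function names (`shard`, `local_gradient`, `train`), the toy dataset, and the hyperparameters are all assumptions chosen for illustration. Each simulated worker holds the same one-parameter model, computes a gradient on its own slice of the data, and the gradients are averaged before a single synchronous update.

```python
# Library-free sketch of synchronous data parallelism (not TensorFlow code).
# All names here (shard, local_gradient, train) are hypothetical.

def shard(data, num_workers):
    """Give each worker an interleaved slice of the dataset."""
    return [data[i::num_workers] for i in range(num_workers)]

def local_gradient(w, examples):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in examples) / len(examples)

def train(data, num_workers=2, learning_rate=0.01, steps=50):
    shards = shard(data, num_workers)
    w = 0.0  # single shared parameter, standing in for a parameter server
    for _ in range(steps):
        # Every worker evaluates the same model on its own shard.
        grads = [local_gradient(w, s) for s in shards]
        # Synchronous update: average the workers' gradients, then apply once.
        w -= learning_rate * sum(grads) / len(grads)
    return w

# Toy dataset drawn from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_final = train(data)
print(f"learned w = {w_final:.4f}")
```

Because the shards in this sketch are the same size, the averaged gradient equals the full-batch gradient, so the run behaves like single-machine training on one large minibatch. An asynchronous variant would instead apply each worker's gradient as soon as it arrives, without waiting for the others.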

The motivation for this guide stems from the current state of distributed deep learning. Deep learning papers typically demonstrate successful new architectures on some benchmark, but rarely show how these models can be trained with 1000x the data, which is usually the requirement in industry. Furthermore, most successful distributed cases use state-of-the-art hardware to brute-force massive effective minibatches in a synchronous fashion across high-bandwidth networks; there has been little research showing the potential of asynchronous training (which is why there are a lot of those examples in this guide). Finally, the lack of documentation for distributed TF was the real reason this project was started. TF is a great tool that prides itself on its scalability, but unfortunately there are few examples that show how to make your model scale with data size.

The aim of this guide is to aid all interested in distributed deep learning, from beginners to researchers.

Original article by fendouai. If you repost it, please credit the source: https://panchuang.net/2017/10/27/distributed-tensorflow-guide/
