DiDi Cloud A100 40G + TensorFlow 1.15.2 + Ubuntu 18.04 Performance Test

2023-10-25 14:50


Today I got access to the closed-beta A100 on DiDi Cloud and ran the TensorFlow benchmarks on it. The results are recorded below.

 

Runtime environment

Platform: DiDi Cloud

OS: Ubuntu 18.04

GPU: A100-SXM4-40GB

Python version: 3.6

TensorFlow version: 1.15.2 (NVIDIA build)

 


 

Test method

The tests use the TensorFlow benchmarks suite:

https://github.com/tensorflow/benchmarks
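Every run in this post follows the same pattern: one GPU, a per-model batch size, first with default settings and then with --use_fp16. The following is a minimal sketch of a driver for that sweep. It assumes it is launched from the directory that contains tf_cnn_benchmarks.py; the names CONFIGS, build_command and run_sweep are illustrative and are not part of the benchmark suite.

```python
import subprocess
import sys

# (model, batch size) pairs used in this post.
CONFIGS = [
    ("resnet50_v1.5", 64),
    ("resnet50", 64),
    ("alexnet", 512),
    ("inception3", 64),
    ("vgg16", 64),
    ("googlenet", 128),
    ("resnet152", 32),
]


def build_command(model, batch_size, use_fp16=False):
    """Return the argv list for one single-GPU benchmark run."""
    cmd = [
        sys.executable, "tf_cnn_benchmarks.py",
        "--num_gpus=1",
        "--batch_size={}".format(batch_size),
        "--model={}".format(model),
    ]
    if use_fp16:
        cmd.append("--use_fp16")
    return cmd


def run_sweep():
    """Run every configuration with default settings, then with --use_fp16."""
    for model, batch_size in CONFIGS:
        for use_fp16 in (False, True):
            cmd = build_command(model, batch_size, use_fp16)
            print("Running:", " ".join(cmd))
            # check=True stops the sweep if a benchmark run fails.
            subprocess.run(cmd, check=True)
```

Calling run_sweep() executes the fourteen runs one after another; build_command can also be used on its own to print the command lines without running them.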

 

ResNet50 v1.5 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5
Step    Img/sec total_loss
1       images/sec: 602.4 +/- 0.0 (jitter = 0.0)        7.847
10      images/sec: 606.8 +/- 1.2 (jitter = 5.4)        8.053
20      images/sec: 606.3 +/- 0.8 (jitter = 4.4)        8.102
30      images/sec: 605.8 +/- 0.8 (jitter = 3.8)        8.117
40      images/sec: 606.2 +/- 0.7 (jitter = 3.8)        7.893
50      images/sec: 606.1 +/- 0.5 (jitter = 3.0)        7.919
60      images/sec: 606.2 +/- 0.5 (jitter = 2.9)        8.104
70      images/sec: 606.6 +/- 0.5 (jitter = 2.9)        7.985
80      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.805
90      images/sec: 606.6 +/- 0.4 (jitter = 2.8)        7.973
100     images/sec: 606.7 +/- 0.4 (jitter = 2.8)        7.644
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50_v1.5 --use_fp16

 

Step    Img/sec total_loss
1       images/sec: 1327.1 +/- 0.0 (jitter = 0.0)       7.972
10      images/sec: 1321.2 +/- 5.7 (jitter = 27.6)      7.885
20      images/sec: 1323.5 +/- 4.4 (jitter = 25.9)      8.073
30      images/sec: 1323.6 +/- 3.7 (jitter = 27.3)      7.934
40      images/sec: 1322.1 +/- 3.3 (jitter = 32.9)      8.102
50      images/sec: 1321.4 +/- 3.0 (jitter = 27.7)      7.876
60      images/sec: 1322.2 +/- 2.8 (jitter = 32.3)      7.883
70      images/sec: 1322.3 +/- 2.5 (jitter = 32.6)      7.962
80      images/sec: 1324.0 +/- 2.4 (jitter = 32.2)      8.049
90      images/sec: 1324.2 +/- 2.2 (jitter = 31.2)      7.909
100     images/sec: 1325.1 +/- 2.1 (jitter = 29.6)      7.874
----------------------------------------------------------------
total images/sec: 1322.76
----------------------------------------------------------------
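The figure to compare between the two runs above is the final total images/sec line. The following small sketch pulls that value out of captured output and computes the ratio. The function names are illustrative, and the regular expression assumes the output layout shown in this post.

```python
import re

# Matches the summary line, e.g. "total images/sec: 606.23".
TOTAL_RE = re.compile(r"^total images/sec:\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_total(output):
    """Return the 'total images/sec' value from benchmark output as a float."""
    match = TOTAL_RE.search(output)
    if match is None:
        raise ValueError("no 'total images/sec' line found")
    return float(match.group(1))


def speedup(baseline_output, fp16_output):
    """Ratio of --use_fp16 throughput to default-settings throughput."""
    return parse_total(fp16_output) / parse_total(baseline_output)


# Tail of the two resnet50_v1.5 runs from this post.
baseline = """100     images/sec: 606.7 +/- 0.4 (jitter = 2.8)        7.644
----------------------------------------------------------------
total images/sec: 606.23
----------------------------------------------------------------
"""
fp16 = """100     images/sec: 1325.1 +/- 2.1 (jitter = 29.6)      7.874
----------------------------------------------------------------
total images/sec: 1322.76
----------------------------------------------------------------
"""

print("resnet50_v1.5 ratio with --use_fp16: {:.2f}x".format(speedup(baseline, fp16)))
```

For these two runs the ratio comes out at about 2.18x.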

 

 

 

ResNet50 BS64

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50
Step    Img/sec total_loss
1       images/sec: 653.5 +/- 0.0 (jitter = 0.0)        8.219
10      images/sec: 646.2 +/- 2.0 (jitter = 6.0)        7.879
20      images/sec: 646.1 +/- 1.4 (jitter = 7.2)        7.909
30      images/sec: 646.0 +/- 1.2 (jitter = 6.0)        7.820
40      images/sec: 646.2 +/- 1.0 (jitter = 6.3)        8.006
50      images/sec: 646.0 +/- 1.0 (jitter = 8.6)        7.769
60      images/sec: 646.0 +/- 0.9 (jitter = 8.6)        8.114
70      images/sec: 645.7 +/- 0.9 (jitter = 9.5)        7.811
80      images/sec: 645.8 +/- 0.8 (jitter = 9.5)        7.979
90      images/sec: 645.8 +/- 0.8 (jitter = 8.0)        8.095
100     images/sec: 645.8 +/- 0.7 (jitter = 6.4)        8.038
----------------------------------------------------------------
total images/sec: 645.26
----------------------------------------------------------------

 

--use_fp16

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --use_fp16
Step    Img/sec total_loss
1       images/sec: 1300.1 +/- 0.0 (jitter = 0.0)       8.101
10      images/sec: 1310.1 +/- 7.5 (jitter = 7.4)       7.758
20      images/sec: 1309.7 +/- 8.0 (jitter = 42.3)      7.912
30      images/sec: 1315.0 +/- 5.9 (jitter = 32.1)      7.776
40      images/sec: 1315.5 +/- 4.7 (jitter = 28.2)      7.918
50      images/sec: 1317.5 +/- 3.9 (jitter = 27.7)      7.895
60      images/sec: 1316.5 +/- 3.4 (jitter = 18.6)      7.711
70      images/sec: 1317.3 +/- 3.1 (jitter = 16.1)      8.008
80      images/sec: 1316.9 +/- 2.8 (jitter = 11.4)      7.777
90      images/sec: 1317.7 +/- 2.6 (jitter = 11.8)      7.808
100     images/sec: 1317.1 +/- 2.4 (jitter = 9.9)       8.036
----------------------------------------------------------------
total images/sec: 1315.11
----------------------------------------------------------------

 

 

AlexNet BS512

 

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet
Step    Img/sec total_loss
1       images/sec: 8294.2 +/- 0.0 (jitter = 0.0)       nan
10      images/sec: 8290.2 +/- 1.6 (jitter = 5.3)       nan
20      images/sec: 8290.6 +/- 1.0 (jitter = 3.7)       nan
30      images/sec: 8290.8 +/- 0.7 (jitter = 2.8)       nan
40      images/sec: 8291.3 +/- 0.6 (jitter = 2.7)       nan
50      images/sec: 8289.8 +/- 1.4 (jitter = 2.9)       nan
60      images/sec: 8290.2 +/- 1.2 (jitter = 2.9)       nan
70      images/sec: 8290.4 +/- 1.3 (jitter = 3.6)       nan
80      images/sec: 8291.1 +/- 1.1 (jitter = 3.5)       nan
90      images/sec: 8291.9 +/- 1.0 (jitter = 4.4)       nan
100     images/sec: 8291.9 +/- 1.1 (jitter = 5.2)       nan
----------------------------------------------------------------
total images/sec: 8282.46
----------------------------------------------------------------
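In this AlexNet run the total_loss column reads nan at every step, whereas the --use_fp16 run that follows reports finite values. The throughput figure is still printed, but runs like this are worth flagging when results are collected. The following sketch scans the per-step lines for non-finite losses. It assumes the step-line layout shown in this post, and the helper name is illustrative.

```python
import math
import re

# Captures: step, images/sec, uncertainty, jitter, total_loss.
STEP_RE = re.compile(
    r"^(\d+)\s+images/sec:\s*([0-9.]+)\s*\+/-\s*([0-9.]+)\s*"
    r"\(jitter = ([0-9.]+)\)\s+(\S+)\s*$",
    re.MULTILINE,
)


def steps_with_bad_loss(output):
    """Return the step numbers whose total_loss is nan or inf."""
    bad = []
    for match in STEP_RE.finditer(output):
        step = int(match.group(1))
        loss = float(match.group(5))  # float() accepts 'nan' and 'inf'
        if not math.isfinite(loss):
            bad.append(step)
    return bad


# First lines of the AlexNet run above.
sample = """Step    Img/sec total_loss
1       images/sec: 8294.2 +/- 0.0 (jitter = 0.0)       nan
10      images/sec: 8290.2 +/- 1.6 (jitter = 5.3)       nan
"""
print(steps_with_bad_loss(sample))
```

An empty list means every parsed step reported a finite loss.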

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=512 --model=alexnet --use_fp16
Step    Img/sec total_loss
1       images/sec: 10618.6 +/- 0.0 (jitter = 0.0)      7.250
10      images/sec: 10607.7 +/- 4.4 (jitter = 16.3)     7.251
20      images/sec: 10602.5 +/- 3.0 (jitter = 13.1)     7.251
30      images/sec: 10604.1 +/- 2.3 (jitter = 11.2)     7.251
40      images/sec: 10601.0 +/- 2.5 (jitter = 13.4)     7.251
50      images/sec: 10601.7 +/- 2.5 (jitter = 13.8)     7.251
60      images/sec: 10603.0 +/- 2.2 (jitter = 14.0)     7.250
70      images/sec: 10605.1 +/- 2.1 (jitter = 12.5)     7.251
80      images/sec: 10605.4 +/- 1.9 (jitter = 12.2)     7.251
90      images/sec: 10605.4 +/- 1.7 (jitter = 12.1)     7.251
100     images/sec: 10605.8 +/- 1.7 (jitter = 12.3)     7.251
----------------------------------------------------------------
total images/sec: 10587.67
----------------------------------------------------------------

 

Inception v3 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3
Step    Img/sec total_loss
1       images/sec: 436.8 +/- 0.0 (jitter = 0.0)        7.276
10      images/sec: 437.9 +/- 1.2 (jitter = 0.8)        7.337
20      images/sec: 437.8 +/- 1.0 (jitter = 2.2)        7.269
30      images/sec: 437.9 +/- 0.8 (jitter = 2.2)        7.422
40      images/sec: 437.9 +/- 0.6 (jitter = 3.5)        7.299
50      images/sec: 438.6 +/- 0.6 (jitter = 4.1)        7.277
60      images/sec: 439.2 +/- 0.5 (jitter = 3.7)        7.363
70      images/sec: 439.5 +/- 0.5 (jitter = 4.8)        7.347
80      images/sec: 440.3 +/- 0.5 (jitter = 5.3)        7.410
90      images/sec: 440.3 +/- 0.5 (jitter = 5.2)        7.325
100     images/sec: 440.3 +/- 0.4 (jitter = 5.0)        7.346
----------------------------------------------------------------
total images/sec: 440.01
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --use_fp16
Step    Img/sec total_loss
1       images/sec: 901.5 +/- 0.0 (jitter = 0.0)        7.305
10      images/sec: 945.5 +/- 7.0 (jitter = 5.0)        7.354
20      images/sec: 945.6 +/- 4.9 (jitter = 7.1)        7.330
30      images/sec: 945.3 +/- 3.9 (jitter = 6.9)        7.382
40      images/sec: 946.3 +/- 3.2 (jitter = 7.3)        7.278
50      images/sec: 946.6 +/- 2.8 (jitter = 7.5)        7.373
60      images/sec: 946.3 +/- 2.5 (jitter = 7.6)        7.299
70      images/sec: 946.8 +/- 2.3 (jitter = 7.5)        7.323
80      images/sec: 946.5 +/- 2.1 (jitter = 7.6)        7.317
90      images/sec: 946.6 +/- 2.0 (jitter = 7.6)        7.357
100     images/sec: 947.2 +/- 1.8 (jitter = 7.3)        7.327
----------------------------------------------------------------
total images/sec: 946.03
----------------------------------------------------------------

 

VGG16 BS64

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16
Step    Img/sec total_loss
1       images/sec: 442.1 +/- 0.0 (jitter = 0.0)        7.321
10      images/sec: 442.4 +/- 0.1 (jitter = 0.4)        7.315
20      images/sec: 442.4 +/- 0.1 (jitter = 0.3)        7.269
30      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.271
40      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.282
50      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.291
60      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.250
70      images/sec: 442.4 +/- 0.1 (jitter = 0.2)        7.278
80      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.274
90      images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.286
100     images/sec: 442.4 +/- 0.0 (jitter = 0.2)        7.283
----------------------------------------------------------------
total images/sec: 442.20
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --use_fp16
Step    Img/sec total_loss
1       images/sec: 687.4 +/- 0.0 (jitter = 0.0)        7.279
10      images/sec: 688.2 +/- 0.2 (jitter = 0.5)        7.255
20      images/sec: 688.0 +/- 0.1 (jitter = 0.5)        7.283
30      images/sec: 688.0 +/- 0.1 (jitter = 0.7)        7.254
40      images/sec: 687.9 +/- 0.1 (jitter = 0.7)        7.283
50      images/sec: 687.8 +/- 0.1 (jitter = 0.7)        7.249
60      images/sec: 687.7 +/- 0.1 (jitter = 0.8)        7.294
70      images/sec: 687.6 +/- 0.1 (jitter = 0.9)        7.278
80      images/sec: 687.6 +/- 0.1 (jitter = 0.9)        7.268
90      images/sec: 687.7 +/- 0.1 (jitter = 0.9)        7.264
100     images/sec: 687.6 +/- 0.1 (jitter = 0.9)        7.268
----------------------------------------------------------------
total images/sec: 687.07
----------------------------------------------------------------

 

GoogLeNet BS128

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet
Step    Img/sec total_loss
1       images/sec: 1577.4 +/- 0.0 (jitter = 0.0)       7.104
10      images/sec: 1565.9 +/- 4.1 (jitter = 12.5)      7.105
20      images/sec: 1561.7 +/- 3.1 (jitter = 20.4)      7.094
30      images/sec: 1562.3 +/- 2.5 (jitter = 15.1)      7.087
40      images/sec: 1561.5 +/- 2.2 (jitter = 16.1)      7.067
50      images/sec: 1561.6 +/- 2.0 (jitter = 15.6)      7.091
60      images/sec: 1561.5 +/- 1.8 (jitter = 15.7)      7.049
70      images/sec: 1560.3 +/- 1.9 (jitter = 15.3)      7.074
80      images/sec: 1558.8 +/- 1.9 (jitter = 17.2)      7.077
90      images/sec: 1558.2 +/- 1.8 (jitter = 17.2)      7.079
100     images/sec: 1557.5 +/- 1.8 (jitter = 17.6)      7.066
----------------------------------------------------------------
total images/sec: 1556.06
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=128 --model=googlenet --use_fp16
Step    Img/sec total_loss
1       images/sec: 2690.1 +/- 0.0 (jitter = 0.0)       7.173
10      images/sec: 2675.3 +/- 13.9 (jitter = 35.5)     7.068
20      images/sec: 2682.4 +/- 9.9 (jitter = 55.4)      7.086
30      images/sec: 2686.6 +/- 8.3 (jitter = 36.6)      7.075
40      images/sec: 2687.8 +/- 6.9 (jitter = 30.6)      7.084
50      images/sec: 2686.7 +/- 6.0 (jitter = 36.4)      7.076
60      images/sec: 2687.5 +/- 5.4 (jitter = 36.4)      7.075
70      images/sec: 2681.0 +/- 6.8 (jitter = 41.6)      7.075
80      images/sec: 2683.2 +/- 6.1 (jitter = 34.0)      7.065
90      images/sec: 2684.1 +/- 5.6 (jitter = 35.6)      7.092
100     images/sec: 2683.9 +/- 5.2 (jitter = 36.1)      7.052
----------------------------------------------------------------
total images/sec: 2680.27
----------------------------------------------------------------

 

ResNet152 BS32

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152
Step    Img/sec total_loss
1       images/sec: 225.6 +/- 0.0 (jitter = 0.0)        9.060
10      images/sec: 228.3 +/- 1.0 (jitter = 2.0)        8.594
20      images/sec: 228.3 +/- 0.6 (jitter = 2.0)        8.635
30      images/sec: 228.2 +/- 0.5 (jitter = 2.5)        8.719
40      images/sec: 227.9 +/- 0.5 (jitter = 2.8)        8.599
50      images/sec: 228.1 +/- 0.5 (jitter = 2.9)        8.791
60      images/sec: 228.3 +/- 0.4 (jitter = 3.6)        8.668
70      images/sec: 228.3 +/- 0.4 (jitter = 3.3)        9.072
80      images/sec: 228.3 +/- 0.4 (jitter = 3.5)        8.874
90      images/sec: 228.4 +/- 0.3 (jitter = 3.7)        9.030
100     images/sec: 228.4 +/- 0.3 (jitter = 3.7)        8.839
----------------------------------------------------------------
total images/sec: 228.29
----------------------------------------------------------------

 

--use_fp16

python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=32 --model=resnet152 --use_fp16
Step    Img/sec total_loss
1       images/sec: 392.9 +/- 0.0 (jitter = 0.0)        9.147
10      images/sec: 397.9 +/- 2.8 (jitter = 6.0)        9.000
20      images/sec: 399.0 +/- 2.1 (jitter = 8.6)        8.842
30      images/sec: 393.7 +/- 2.9 (jitter = 14.7)       8.813
40      images/sec: 394.4 +/- 2.3 (jitter = 15.2)       8.984
50      images/sec: 394.9 +/- 2.0 (jitter = 13.9)       8.647
60      images/sec: 395.7 +/- 1.8 (jitter = 13.9)       8.838
70      images/sec: 396.5 +/- 1.6 (jitter = 15.3)       8.941
80      images/sec: 395.9 +/- 1.4 (jitter = 13.4)       8.913
90      images/sec: 396.2 +/- 1.3 (jitter = 14.1)       8.807
100     images/sec: 395.7 +/- 1.3 (jitter = 14.5)       8.729
----------------------------------------------------------------
total images/sec: 395.34
----------------------------------------------------------------
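To make the runs easier to compare, the total images/sec figures reported above can be collected into one table together with the ratio between the --use_fp16 run and the default run. The following sketch uses only the totals from this post; the names RESULTS, speedups and format_table are illustrative.

```python
# (model, batch size, total images/sec default, total images/sec with --use_fp16)
RESULTS = [
    ("resnet50_v1.5", 64, 606.23, 1322.76),
    ("resnet50", 64, 645.26, 1315.11),
    ("alexnet", 512, 8282.46, 10587.67),
    ("inception3", 64, 440.01, 946.03),
    ("vgg16", 64, 442.20, 687.07),
    ("googlenet", 128, 1556.06, 2680.27),
    ("resnet152", 32, 228.29, 395.34),
]


def speedups(results):
    """Map each model name to its --use_fp16 / default throughput ratio."""
    return {model: fp16 / base for model, _, base, fp16 in results}


def format_table(results):
    """Render the results as a fixed-width plain-text table."""
    lines = ["{:<15}{:>6}{:>12}{:>12}{:>9}".format(
        "model", "batch", "default", "use_fp16", "ratio")]
    for model, batch, base, fp16 in results:
        lines.append("{:<15}{:>6}{:>12.2f}{:>12.2f}{:>8.2f}x".format(
            model, batch, base, fp16, fp16 / base))
    return "\n".join(lines)


print(format_table(RESULTS))
```

On these numbers the ratio ranges from roughly 1.28x (alexnet) to roughly 2.18x (resnet50_v1.5). Keep in mind that the default alexnet run reported nan for total_loss, so that pair should be read with some caution.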

 

Performance comparison

A100 vs. V100 vs. 2080 Ti performance comparison:

 

https://www.tonyisstark.com/383.html

 
