Rybka Chess Community Forum
Up Topic The Rybka Lounge / Computer Chess / Tesla v100 vs 2 gtx 1080?
- - By h1a8 (***) [us] Date 2018-09-07 15:01
For Leela zero (in terms of chess speed)

Approximately how much faster is one Tesla V100 than two 1080s in 2-way SLI?

And approximately how much faster are two Tesla V100s in a 2-way setup than a single Tesla V100?
Parent - By Labyrinth (*****) [us] Date 2018-09-07 23:44
It would be better to ask this on the Leela forum; I'm not sure what sort of metrics they use exactly. By one cryptomining metric the V100 was over twice as fast (0.76 kh/s vs. 2.02 kh/s), so a single card would be faster than two 1080 Ti cards in that case.
Parent - - By h1a8 (***) [us] Date 2018-09-08 15:03
Thanks

Why is there a big difference?
Titan V: 31k nps with the 20x256 net (ID 10048)
Titan V: 51k nps with the 15x192 net (ID 476)
Parent - By pawel.newyork (*) [us] Date 2018-09-09 20:36
I think this is about how large these networks are. 15x192 means 15 residual blocks with 192 filters each, vs. 20 blocks with 256 filters. Evaluating the smaller network takes less time than the larger one, so your nps is higher, but the output is less exact. This is just my understanding of how those nets work, from watching some YouTube videos! :) If I'm wrong please correct me..
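To make the size difference concrete, here is a rough sketch (my own illustration, not Lc0 code) under the assumption that a "BxF" net is B residual blocks of 3x3 convolutions with F filters, so per-evaluation compute scales roughly with B * F^2:

```python
# Rough relative-cost sketch for Leela-style "blocks x filters" networks.
# Assumption: compute per evaluation scales ~ blocks * filters^2.
def relative_cost(blocks, filters):
    return blocks * filters ** 2

big = relative_cost(20, 256)    # the 20x256 net
small = relative_cost(15, 192)  # the 15x192 net
print(big / small)  # ~2.37: the bigger net does roughly 2.4x the work per eval
```

The measured nps gap (31k vs. 51k) is smaller than this raw ratio, which is expected since GPU utilisation doesn't scale linearly with network size.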
Parent - By Carl Bicknell (*****) [gb] Date 2018-09-11 12:56

> For Leela zero (in terms of chess speed)
>
> Approximately how much faster is one Tesla v100 over two 1080s in a 2-way sli?
>
> And approximately how much faster are two Tesla v100 on a 2 way over a single Tesla v100?


It would help to understand:

a) The way to work this out: the measure of a card's speed you care about is its computational ability, measured in TFLOPS.

b) Lc0 scales to 2 cards but not more at the moment

c) Lc0 can use FP16 which is supported by some cards and not by others. The 1080 is a fast FP32 card (The 1080 Ti is much better by the way but still only FP32) but it doesn't do FP16 (half precision) very well. The V100 does.

The 1080 Ti is about 11 TFLOPS (FP32) and the V100 is about 30 TFLOPS (FP16)

I think the speedup from 1 to 2 cards is 1.8x, so:

1.8 × 11 = about 20 TFLOPS for the pair

So a V100 will be about 50% quicker. However, if Lc0 can use the Tensor cores on the V100 then the V100 would be about 7x quicker.
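The back-of-envelope estimate above can be written out as a tiny sketch (the TFLOPS figures and the 1.8x scaling factor are the poster's approximations, not measured Lc0 speeds):

```python
# Back-of-envelope comparison: two 1080 Ti (FP32) vs. one V100 (FP16).
fp32_1080ti = 11.0       # TFLOPS, FP32, per 1080 Ti (approximate)
fp16_v100 = 30.0         # TFLOPS, FP16, V100 without Tensor cores (approximate)
dual_card_scaling = 1.8  # claimed Lc0 speedup from 1 card to 2

two_1080ti = dual_card_scaling * fp32_1080ti  # ~19.8 "effective" TFLOPS
print(fp16_v100 / two_1080ti)  # ~1.5, i.e. the V100 is ~50% quicker
```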
Parent - - By Carl Bicknell (*****) [gb] Date 2018-09-11 13:01
But really the big question you should be asking is whether the new gaming cards due to be released very soon support fast FP16. I think they might, but we'll see at launch.
Parent - - By gsgs (***) [de] Date 2018-09-12 05:07 Edited 2018-09-14 00:07
I've read it goes further down to 8-bit integer multiplications.

TOPS (TeraOPS) , not FLOPS
int8 , not FP16

{card, price $, watts, 32-bit TFLOPS, 16-bit TFLOPS, 8-bit TOPS}

1080 Ti        600     250    11     22?    -
2080 Ti        1200    250    13     26?    78
V100           7000    300    15     32     100    (other source: 63 / 125 / 63)
Jetson Xavier  2500    30     13     26?    30
Tesla T4       3000?   75     8      16     130
google1        (no figures given)
google2        (no figures given)

TPU: 92 TOPS 8bit @ 75W
TESLA V100: 120 TFLOPS @ 300W
TESLA P100: 13 TFLOPS @ 300W

Leela is not yet capable of using 8-bit int, but they will probably add that soon.
Parent - - By gsgs (***) [de] Date 2018-09-14 06:32
I have to correct this.
> Posts may only be edited a limited time after their original submission. This time limit has expired.

{card, price $, watts, 32-bit TFLOPS, 16-bit TFLOPS, 8-bit TOPS}

1080 Ti        600     250    11     22?    -
2080 Ti        1200    250    13     26?    78
V100           7000    300    15     32     100    (other source: 63 / 125 / 63)
Jetson Xavier  2500    30     13     26?    30
Tesla T4       3000?   75     32?    65     130
google1        (no figures given)
google2        (no figures given)

https://www.tomshardware.com/news/nvidia-tesla-t4-turing-gpu,37788.html
Parent - - By Labyrinth (*****) [us] Date 2018-09-14 09:58

>v100,7000,300,15,32,100  other source : 7000,300,63,125,63


Per Nvidia's own datasheet the PCI-E version does 14 Teraflops 32-bit (single precision). The 'NVLink' version for custom mainboards can do 15.7, but that's basically just for the HPC (supercomputer) crowd.

The Tesla T4 does 8.1 Tflops single precision (32-bit). Keep in mind that it's only a 75 watt device, V100 is 250 W+.

The 65 Tflops value is for FP16, but Nvidia also lists it as "mixed precision" between 16 and 32-bit. So not sure what that means exactly. It also has a special 4-bit integer mode where it can pull 260 TOPS, and I heard they were experimenting with a 1-bit integer mode (weird).
Parent - By Carl Bicknell (*****) [gb] Date 2018-09-15 13:51

> The Tesla T4 does 8.1 Tflops single precision (32-bit). Keep in mind that it's only a 75 watt device, V100 is 250 W+.


Big step up in performance per watt.

> The 65 Tflops value is for FP16, but Nvidia also lists it as "mixed precision" between 16 and 32-bit. So not sure what that means exactly.


It means the Tensor cores multiply in FP16 but can accumulate in either FP16 or FP32.
Parent - - By Carl Bicknell (*****) [gb] Date 2018-09-15 13:38

> Leela is not yet capable to use 8bit-Int, but they will probably add that soon


Maybe. Hopefully. Presumably there is a (small) penalty for dropping precision, because the weight values cannot be represented as exactly, but the increase in nps should be more than worth it.

At the moment FP16 performance is the decisive factor in choosing a card. The latest cards can do FP16 two ways: on the CUDA cores or on the Tensor cores. The latter is faster, despite there being fewer of them.
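The precision penalty being discussed can be sketched numerically. This is a toy comparison of my own (not Lc0 code): round random weights to FP16, and separately quantize them with a common per-tensor-scaled int8 scheme, then compare the average error:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(-1.0, 1.0, size=100_000).astype(np.float32)

# FP16: round each weight to half precision and back.
err_fp16 = np.abs(w.astype(np.float16).astype(np.float32) - w).mean()

# int8: scale so the max magnitude maps to 127, round, then dequantize.
scale = np.abs(w).max() / 127.0
err_int8 = np.abs(np.round(w / scale) * scale - w).mean()

print(err_fp16, err_int8)  # int8 loses noticeably more precision here
```

This only illustrates representation error; the real accuracy cost in a net depends on how errors propagate through the layers.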
Parent - - By Carl Bicknell (*****) [gb] Date 2018-09-15 21:34
UGH.
Looks like the 2080 Ti has been deliberately crippled in FP16 performance. Very sad.
Parent - - By Carl Bicknell (*****) [gb] Date 2018-09-21 10:58
I now know much more about this than I did a week ago. In summary:

- fp16 is indeed the crucial factor for buying a card. Forget int8 or anything below; it loses too much accuracy and the gains in nps aren't worth it.

- fp16 on the new Nvidia cards is great, despite my comments above. If you really want to know why I can point you to the technical answer.
Parent - - By gsgs (***) [de] Date 2018-09-24 09:56
Rohan Ryan, 09:41 (2 hours ago):
This is a pinned post on Discord by Ankan

with cudnn 7.3 and 411.63 driver available at nvidia.com
minibatch-size=512, network id: 11250, go nodes 1000000

                 fp32    fp16
GTX 1080 Ti:     8996   -----
Titan V:        13295   29379
RTX 2080:        9708   26678
RTX 2080 Ti:    12208   32472

It appears that, as far as LcZero goes, the RTX 2080 is almost three times faster than the GTX 1080 Ti. Look forward to benchmarks for the RTX 2070; it should offer great value to LcZero fans looking to invest in a graphics card.

Note: LcZero currently uses fp16. And the GTX 1080Ti does not support fp16, only fp32. So, the performance of fp16 in Titan V, RTX 2080, RTX 2080 Ti needs to be compared to fp32 performance of GTX 1080Ti.
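The "almost three times faster" claim follows directly from the benchmark numbers quoted above (nps with network id 11250):

```python
# Ratio of RTX 2080 fp16 nps to GTX 1080 Ti fp32 nps, from Ankan's table.
nps_1080ti_fp32 = 8996
nps_2080_fp16 = 26678
print(nps_2080_fp16 / nps_1080ti_fp32)  # ~2.97
```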
Parent - By gsgs (***) [de] Date 2018-10-03 16:39
The first person with a 2080 Ti reports 30,000 games per day on the Lc0 forum.

Powered by mwForum 2.27.4 © 1999-2012 Markus Wichitill