The rCUDA Team is very happy to report that we continue to make progress with the new version of rCUDA. Our latest achievement is that TensorFlow is starting to work with rCUDA. We have successfully executed several samples, such as Inception and Cifar10, using one remote GPU, two remote GPUs located in the same server, and two remote GPUs located in different servers. More work and testing are still required to make TensorFlow work reliably with rCUDA. Note that we tested TensorFlow 1.12, which is the latest version of TensorFlow using CUDA 9.0 (we are focusing on completing the development of rCUDA for CUDA 9.0 before moving to newer CUDA versions). We are confident that newer versions of TensorFlow, such as 1.13 or 1.14, will work once we upgrade rCUDA to support CUDA 10.x. The final step will be making TensorFlow 2.x work with rCUDA; that step is still far away on the horizon.

The rCUDA Team is glad to announce that the new rCUDA version is now able to perform data copies between GPUs located in different remote servers. That is, if an application using rCUDA is assigned two GPUs, each located in a different server, it is now possible to copy data directly between them. This works for all the data copy functions included in CUDA (the 1D, 2D, 3D and Array versions of these functions), and it works regardless of the exact GPU models involved in the copy (the GPUs in the two servers can be different).
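From the application's point of view, the two remote GPUs behave like two local devices, so a direct copy between them is expressed with the standard CUDA runtime calls. The following minimal sketch (plain CUDA runtime API; it assumes the rCUDA client has been configured so that the application sees two devices, e.g. one in each server) copies a buffer from device 0 to device 1 with cudaMemcpyPeer:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do {                                                   \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
        fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_)); \
        exit(EXIT_FAILURE);                                                \
    }                                                                      \
} while (0)

int main(void) {
    const size_t bytes = 1 << 20;   /* 1 MiB test buffer */
    void *src = NULL, *dst = NULL;

    /* Allocate on device 0 (e.g. a GPU in the first server under rCUDA). */
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&src, bytes));

    /* Allocate on device 1 (e.g. a GPU in the second server). */
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&dst, bytes));

    /* Direct device-to-device copy between the two (possibly remote) GPUs. */
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```

The higher-dimensional variants (for example cudaMemcpy3DPeer) follow the same pattern; under rCUDA the device numbers simply map to GPUs that may live in different servers.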

The new version of rCUDA is also easier to use. The server side of rCUDA can now automatically adapt to the network fabric used by the client side. That is, once the rCUDA server is up and running, it can concurrently accept clients that use Ethernet and clients that use InfiniBand. In previous versions of rCUDA, the server could only accept either Ethernet clients or InfiniBand clients, depending on the configuration specified when it was launched. We will provide more details about the new rCUDA version in upcoming posts.
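Since the server no longer needs a fabric-specific option, configuration is concentrated on the client side. A minimal sketch of a client setup for the two-server scenario described above (the hostnames are placeholders, and the variable names follow the rCUDA user guide; treat this as illustrative rather than a complete recipe):

```shell
# Hypothetical client-side rCUDA configuration: expose two remote GPUs,
# one per server. The rCUDA server on each host adapts automatically to
# whichever fabric (Ethernet or InfiniBand) this client connects with.
export RCUDA_DEVICE_COUNT=2           # the application will see two GPUs
export RCUDA_DEVICE_0=gpuserver1:0    # device 0: GPU 0 on gpuserver1
export RCUDA_DEVICE_1=gpuserver2:0    # device 1: GPU 0 on gpuserver2
```

After setting these variables, the CUDA application is launched unmodified and addresses the remote GPUs as local devices 0 and 1.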

The rCUDA Team is happy to report that the development of the new rCUDA version is making good progress. Using the new version, we have been able to reliably execute our entire set of almost 300 synthetic samples designed to stress specific CUDA features under extreme conditions. We have also reliably executed 88 of the samples included in the CUDA package, such as BiCGStab, BlackScholes and MonteCarloMultiGPU. These tests were conducted using one remote GPU and, for those samples able to use more than one GPU, two remote GPUs located either in one server or in two different servers. Different GPU generations were used in the tests (K20, K40, K80, P100, V100). Moreover, some applications are starting to work with rCUDA. For instance, we have been able to execute the Gromacs application. We expect the list of working applications to grow in the coming weeks.

The rCUDA Team is glad to announce that the new version of rCUDA was born a few days ago. In addition to many bug fixes and minor improvements, the new version integrates three major developments: (1) a new internal architecture intended to provide better support for CUDA applications as well as close-to-native performance; (2) a new communications layer able to extract all the bandwidth from the underlying network fabric; and (3) support for Slurm. Like every baby, the new rCUDA version is small right now. However, it is growing quickly: we have already been able to execute a dozen NVIDIA samples, and this number grows every day.