After the summer break, the rCUDA Team is back to work. Just before the vacation period we released a new rCUDA version (v20.07), which has been very well received. Our immediate goal now is to improve the new version so that it provides support for CUDA 10.0. We are also working on providing support for multitenancy.

The rCUDA Team is happy to announce that a new version of the rCUDA middleware has been released. The new version, v20.07, is the result of our hard work over the last year and a half. It includes a completely new and disruptive internal architecture at both the client and server sides. This new architecture is intended to provide improved performance while supporting CUDA applications much better. Moreover, the new version also includes a completely new communications layer, designed to deliver much better performance than previous versions of rCUDA. This new layer allows the rCUDA server to provide service simultaneously over TCP and InfiniBand (RDMA) networks. That is, the rCUDA server can serve some applications over TCP/IP while serving others over the RDMA-based InfiniBand network, transparently to users. Additionally, support for functions in the CUDA Driver API has been noticeably improved. Also, the use of P2P data copies among GPUs located in different remote nodes has been noticeably simplified, making it fully transparent to users. We hope that rCUDA users enjoy this new version as much as we enjoyed creating it!
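The transparency of the simplified P2P copies can be illustrated with the standard CUDA runtime API. In the minimal sketch below, devices 0 and 1 may, under rCUDA, actually be GPUs located in two different remote nodes, yet the application contains only ordinary CUDA calls (which rCUDA server the devices map to is configured outside the program, so nothing in the code is rCUDA-specific):

```cuda
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 1 << 20;  // 1 MiB buffer
    void *src, *dst;

    // Allocate a buffer on each device. With rCUDA, devices 0 and 1
    // may be GPUs installed in two different remote nodes.
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Standard peer-to-peer copy: with rCUDA the inter-node transfer
    // is handled by the middleware; the application code is unchanged.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```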

The rCUDA Team is happy to announce that a new tool will be included in the next rCUDA release. The new tool, named 'rCUDA-smi', behaves similarly to the nvidia-smi tool: it provides information about the remote GPUs used with rCUDA. The picture shows an example of this new tool. It can be seen in the picture that 8 GPUs, located in 5 different nodes, are used with rCUDA. The first node (node1) provides a K40m GPU. The second and third nodes (node2 and node3) each provide two K80 GPUs. The fourth node provides two different GPUs: one K40m and one K20. Finally, the last node provides a K40m GPU. As the example shows, rCUDA-smi provides the same information as the nvidia-smi tool, except that it reports the remote GPUs used with rCUDA instead of the local GPUs reported by nvidia-smi.

The rCUDA Team is glad to report that more and more applications are being executed with the new version of rCUDA. In addition to TensorFlow, we have tried applications such as CUDA-MEME, GROMACS, BarraCUDA, CUDASW++, GPU-LIBSVM and HPL Linpack. We are currently working with NAMD and LAMMPS, and more applications will be tried in the future. Notice that with rCUDA it is possible to use remote GPUs located in different nodes. In this way, applications can be provided with the GPUs installed in all the nodes of the cluster. Additionally, those GPUs can be safely shared among several applications.

Due to COVID-19, Spanish universities had to switch from in-class to online teaching. In this context, lab work had to be organized in such a way that students could practice from home. In the subject "Signal and Multimedia Processors" of the fourth year of the Degree in Electronic Systems Engineering (taught by the Electronic Technology Department of the University of Malaga at the Higher Technical School of Telecommunications Engineering), students had to practice with CUDA. However, most students did not have the required GPU at home. To overcome this problem, the teacher, Francisco Javier González Cañete (fgc -(@)- uma -(.)- es), decided to install rCUDA on a server so that students could practice with CUDA without having a CUDA GPU at home. That is, students wrote their own CUDA programs on their home computers, compiled them there, and executed them on a remote GPU by using rCUDA. In this way, the client side of rCUDA was used on the students' computers, whereas the server side of rCUDA ran on a shared server providing service to all the students. Communication between the home computers and the shared server was carried out over the Internet. The experience has been a great success: students were able to practice with CUDA from home while rCUDA provided service without a single failure. Notice that each student had to devote at least 20 hours over several weeks to complete this lab work. The rCUDA Team is very happy about this successful experience. Additionally, given that the rCUDA Team is being much more demanding during the design and implementation of the new rCUDA version, we expect the new version of rCUDA to be very robust.
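A lab exercise of this kind needs no rCUDA-specific code: students write ordinary CUDA, compile it with nvcc on their own machine, and run it with the rCUDA client library in place of the local CUDA runtime (as described in the rCUDA user guide), so the kernel executes on the remote GPU. A minimal sketch of such a program:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Plain CUDA kernel: each thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float h_a[n], h_b[n], h_c[n];
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2 * i; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // When run through rCUDA, this launch is forwarded to a GPU in
    // the shared server; the source code is unmodified CUDA.
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[10] = %.1f\n", h_c[10]);  // 10 + 20 = 30.0

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```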