How RDMA Optimizes Oracle Cloud & Exadata Performance

Transaction speed matters to most organizations. For some, it is mission critical, particularly those in banking and finance, stock exchanges, and global investment. And with ecommerce and electronic payments disrupting every industry, the demand for fast transactions has yet to peak.

In technical terms, this speed is measured as latency: the delay between issuing a request and receiving the response. The lower the latency, the better.

Renowned for its performance in the computing marketplace, Oracle continually refines its technology to meet enterprise low-latency computing requirements. It is now using Remote Direct Memory Access (RDMA) to raise the performance of its products further.

In the following sections, we discuss the core concepts of RDMA and RDMA over Converged Ethernet (RoCE), and how Oracle uses these technologies to improve Exadata performance and the price/performance ratio of High Performance Computing (HPC) on Oracle Cloud Infrastructure (OCI).

What is RDMA?

RDMA is the ability of computers in a network to read and/or write information in a remote machine's memory without involving the processor, cache, or operating system of either computer. This results in high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

What is RoCE?

RoCE stands for RDMA over Converged Ethernet, a network protocol that allows remote direct memory access over an Ethernet network. It does this by encapsulating an InfiniBand transport packet in Ethernet.
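To make the encapsulation concrete, the sketch below lists the layers that wrap one packet under RoCE v2, the variant that carries the InfiniBand transport headers inside UDP/IP. The byte counts assume the simplest case (IPv4, no VLAN tag, no IP options, no extended transport headers), and the names `Layer`, `overhead_bytes`, and `wire_efficiency` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    size_bytes: int


# Simplest-case RoCE v2 framing, outermost layer first (assumed sizes:
# untagged Ethernet, IPv4 without options, no extended transport headers).
ROCE_V2_HEADERS = (
    Layer("Ethernet header", 14),
    Layer("IPv4 header", 20),
    Layer("UDP header (destination port 4791)", 8),
    Layer("InfiniBand Base Transport Header", 12),
)
ROCE_V2_TRAILERS = (
    Layer("InfiniBand invariant CRC", 4),
    Layer("Ethernet frame check sequence", 4),
)


def overhead_bytes() -> int:
    """Bytes of framing added around the payload of one packet."""
    return sum(layer.size_bytes for layer in ROCE_V2_HEADERS + ROCE_V2_TRAILERS)


def wire_efficiency(payload_bytes: int) -> float:
    """Fraction of the frame that is application payload."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return payload_bytes / (payload_bytes + overhead_bytes())


if __name__ == "__main__":
    for layer in ROCE_V2_HEADERS + ROCE_V2_TRAILERS:
        print(f"{layer.size_bytes:>3} B  {layer.name}")
    for payload in (256, 1024, 4096):
        print(f"payload {payload:>4} B -> efficiency {wire_efficiency(payload):.3f}")
```

The model ignores the Ethernet preamble and inter-frame gap, so it slightly overstates efficiency on a real link. It only shows that the framing cost is fixed per packet and shrinks in relative terms as payloads grow.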

RDMA, the Secret Sauce of Fast Networks

Once reserved for supercomputers and large data centers, RDMA technology has advanced by leaps and bounds since its conception in the 1990s. It is now the secret sauce of fast networks everywhere. Technology pioneers such as Oracle use RDMA to optimize enterprise computing performance and to support Artificial Intelligence (AI) workloads.

In 2010, Oracle purchased Sun Microsystems and, as part of that acquisition, gained access to some of the best hardware for high performance computing, including Mellanox cards and InfiniBand technology. In 2016, Oracle launched its first standard-shape Bare Metal offering on the cloud, then expanded the offering to include a specialty HPC shape with an Intel X7 processor and GPU shapes with Pascal and Volta Nvidia GPUs.

Oracle Cloud Infrastructure Leverages RDMA over RoCE Cluster Networking to Reduce Latency and Win Performance Competition in the Cloud Marketplace

Oracle’s history with Mellanox has produced deep InfiniBand expertise, and both companies drew on the Exadata experience to create RDMA over RoCE cluster networking, which has become the differentiator for Exadata as well as for HPC on OCI. RDMA supports zero-copy networking by enabling the network adapter to transfer data from the wire directly to application memory, or from application memory directly to the wire, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work from the CPUs, involve no caches or context switches, and continue in parallel with other system operations. This reduces latency in message transfer.
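The difference between the copied and the zero-copy path can be pictured with a small in-process analogy. This is not RDMA: Python's standard library has no RDMA interface, and here the CPU still moves the bytes. The sketch only contrasts staging data in an intermediate "operating system" buffer with placing it straight into a buffer the application already owns. The function names are hypothetical.

```python
def receive_with_os_copy(wire_data: bytes, app_buffer: bytearray) -> int:
    """Conventional path: stage the data in an OS-style buffer, then copy it."""
    if len(wire_data) > len(app_buffer):
        raise ValueError("application buffer too small")
    os_buffer = bytearray(wire_data)              # copy 1: wire -> OS buffer
    app_buffer[:len(os_buffer)] = os_buffer       # copy 2: OS buffer -> application memory
    return 2                                      # number of data placements


def receive_direct(wire_data: bytes, app_buffer: bytearray) -> int:
    """Zero-copy analogy: place the data straight into application memory."""
    if len(wire_data) > len(app_buffer):
        raise ValueError("application buffer too small")
    target = memoryview(app_buffer)               # a view of the buffer, not a copy
    target[:len(wire_data)] = wire_data           # single placement, no staging buffer
    return 1


if __name__ == "__main__":
    message = b"order #1 filled"
    staged, direct = bytearray(64), bytearray(64)
    print("placements with OS copy:", receive_with_os_copy(message, staged))
    print("placements when direct: ", receive_direct(message, direct))
    print("same result:", staged == direct)
```

In real RDMA the network adapter performs the single placement itself, which is why the host CPU and operating system stay out of the data path.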

Oracle Cloud Infrastructure has expanded cluster networking by enabling remote direct memory access (RDMA)-connected clusters of up to 20,000 cores on the BM.HPC2.36 instance type. Oracle’s groundbreaking backend network fabric lets you use Mellanox’s ConnectX-5 100-Gbps network interface cards with RDMA over Converged Ethernet (RoCE) v2 to create clusters with the same low-latency networking and application scalability that you expect on premises.[1]
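As a rough sense of scale, the figures above can be turned into a node count. The sketch assumes each BM.HPC2.36 node contributes 36 cores (as the shape name suggests) and one 100-Gbps RDMA interface. The resulting node count and aggregate bandwidth are back-of-the-envelope estimates, not Oracle-published figures.

```python
MAX_CLUSTER_CORES = 20_000   # from the text: clusters of up to 20,000 cores
CORES_PER_NODE = 36          # assumption: 36 cores per BM.HPC2.36 node
NIC_GBPS = 100               # from the text: 100-Gbps network interface cards


def cluster_size(max_cores: int, cores_per_node: int) -> tuple[int, int]:
    """Return (whole nodes that fit, cores actually used)."""
    nodes = max_cores // cores_per_node
    return nodes, nodes * cores_per_node


def aggregate_nic_gbps(nodes: int, nic_gbps: int) -> int:
    """Sum of per-node NIC line rates, assuming one RDMA NIC per node."""
    return nodes * nic_gbps


if __name__ == "__main__":
    nodes, cores = cluster_size(MAX_CLUSTER_CORES, CORES_PER_NODE)
    total = aggregate_nic_gbps(nodes, NIC_GBPS)
    print(f"{nodes} nodes, {cores} cores, {total / 1000:.1f} Tbps aggregate NIC bandwidth")
```

Under these assumptions the limit works out to 555 nodes. The aggregate figure is the sum of line rates only and says nothing about the bandwidth the fabric actually delivers between any two nodes.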

Oracle Cloud Infrastructure is leading the cloud high performance computing (HPC) battle in performance and price. Over the last few months, Oracle has set new cloud standards for internode latency, cloud HPC benchmarks, and application performance. Oracle Cloud Infrastructure’s bare metal infrastructure lets you achieve on-premises performance in the cloud. In addition to connecting bare metal nodes together through RDMA, cluster networking provides a fabric that will enable future instances and products to communicate at extremely low latencies.

[1] https://blogs.oracle.com/cloud-infrastructure/large-clusters,-lowest-latency:-cluster-networking-on-oracle-cloud-infrastructure