Abstract
- A chip that controls the flow of data between main memory and a device controller without constant CPU intervention, avoiding the performance hit of busy waiting (the CPU polling the device until the transfer completes)
RDMA
- An extension of Direct Memory Access (DMA) across a network: it allows one computer to read/write the memory of another computer directly, bypassing the CPU, kernel, and OS network stack on both sides
- Result: ultra-low latency and high throughput, which is why RDMA is widely used in distributed systems and HPC
How does it work?
- NICs (often InfiniBand or RoCE cards) are RDMA-capable.
- Applications register memory regions with the NIC; remote nodes can then read/write those regions directly (see the sketch after this list).
- No syscalls per message, no copies between kernel/user buffers → “zero-copy networking”.
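To make the "register, then read/write remotely" flow concrete, here is a minimal sketch using the libibverbs verbs API. It assumes a protection domain and a connected queue pair already exist, and that the remote side has shared its buffer address and rkey out of band; the function name rdma_read_remote is illustrative.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch: register a local buffer, then pull `len` bytes from the remote
 * host's memory with an RDMA READ. The remote CPU is never involved. */
int rdma_read_remote(struct ibv_pd *pd, struct ibv_qp *qp,
                     uint64_t remote_addr, uint32_t rkey, size_t len)
{
    /* 1. Register a local buffer so the NIC is allowed to DMA into it. */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return -1;

    /* 2. Describe the local destination as a scatter/gather entry. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* 3. Post an RDMA READ work request: the local NIC fetches the remote
     *    memory directly, with no syscall per message after setup. */
    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    wr.opcode              = IBV_WR_RDMA_READ;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    return ibv_post_send(qp, &wr, &bad_wr);
}
```

Completion would normally be detected by polling a completion queue (ibv_poll_cq); that part, along with connection setup, is omitted here.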
Zero Copy
- Zero copy means the CPU does not copy data from one memory area to another: the copy is offloaded to DMA hardware, or unnecessary copies are eliminated altogether
Benefits of zero copy
In the traditional read/write path, the CPU is constantly involved in copying data between the OS buffer in kernel space and the application buffer (e.g. Kafka's) in user space, and vice versa. Each round trip also costs expensive context switches, as in the sketch below.
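For contrast, the traditional file-to-socket loop looks roughly like this: every chunk makes two syscalls and passes through a user-space buffer. The function name and the open file/socket descriptors are assumed for illustration.

```c
#include <sys/types.h>
#include <unistd.h>

/* Traditional path: read() copies page cache -> user buffer, write() copies
 * user buffer -> kernel socket buffer. Two copies and two kernel/user
 * transitions per chunk, all driven by the CPU. */
ssize_t copy_via_user_space(int file_fd, int sock_fd)
{
    char buf[64 * 1024];
    ssize_t n, total = 0;

    while ((n = read(file_fd, buf, sizeof buf)) > 0) {
        if (write(sock_fd, buf, (size_t)n) != n)
            return -1;   /* treat a short write as an error for simplicity */
        total += n;
    }
    return n < 0 ? -1 : total;
}
```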
Using system calls like sendfile(2), data is copied directly from the OS buffer to the NIC buffer. Unlike read and write, which require transferring data to and from user space, copying with sendfile occurs entirely within kernel space. The actual data transfer is offloaded to the DMA engine, freeing the CPU for other computational tasks. sendfile is particularly useful when the application in user space does not need to process the data and the data is ready to be sent out via the NIC (Network Interface Card).
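A rough sketch of the zero-copy path on Linux, assuming file_fd refers to a regular file and sock_fd to a connected socket (the function name is illustrative):

```c
#include <sys/types.h>
#include <sys/sendfile.h>
#include <sys/stat.h>

/* Zero-copy path: sendfile(2) moves data from the page cache to the socket
 * entirely inside the kernel; the NIC's DMA engine then pulls it from kernel
 * memory, so no user-space copy is made and no user buffer is needed. */
ssize_t send_file_zero_copy(int sock_fd, int file_fd)
{
    struct stat st;
    if (fstat(file_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    ssize_t total = 0;
    while (offset < st.st_size) {
        /* sendfile advances `offset` by the number of bytes transferred. */
        ssize_t n = sendfile(sock_fd, file_fd, &offset, st.st_size - offset);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```

This is the same pattern Kafka relies on when it streams log segments to consumers: the broker never needs to inspect the bytes, so they never have to enter user space.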