The vast majority of the world’s data remains untapped, and enterprises are looking to generate value from this data by creating the next wave of generative AI applications. According to the NVIDIA Technical Blog, retrieval-augmented generation (RAG) pipelines are a key part of this, enabling users to interact with large corpuses of data, transforming documents into interactive AI applications.
Challenges in Implementing RAG Pipelines
Enterprises face several challenges when implementing RAG pipelines. Handling both structured and unstructured data is complex, and processing and retrieving data is computationally intensive. Additionally, privacy and security must be integrated into these pipelines.
To address these issues, NVIDIA and Oracle have collaborated to demonstrate how various segments of the RAG pipeline can utilize the NVIDIA accelerated computing platform on Oracle Cloud Infrastructure (OCI). This integration aims to help enterprises better leverage their data, enhancing the quality and reliability of generative AI outputs.
Embedding Generation with NVIDIA GPUs and Oracle Autonomous Database
In data-rich enterprise environments, harnessing large amounts of text data for generative AI is crucial for enhancing efficiency and productivity. NVIDIA and Oracle have demonstrated how customers can access NVIDIA GPUs through Oracle Machine Learning (OML) Notebooks in Autonomous Database. This allows users to load data directly from an Oracle Database table into an OCI NVIDIA GPU-accelerated virtual machine (VM) instance, generate vector embeddings using the GPU, and store those vectors in Oracle Database for efficient searching using AI Vector Search.
Accelerated Vector Search Indexes and Oracle Database 23ai
NVIDIA cuVS is an open-source library for GPU-accelerated vector search and clustering. A key capability of cuVS is its ability to significantly improve index build time, a crucial component of vector search. NVIDIA and Oracle have demonstrated a proof of concept that accelerates vector index builds for the Hierarchical Navigable Small World (HNSW) algorithm. This approach pairs GPUs with CPUs, resulting in faster index generation and improved performance for AI workloads.
Performant LLM Inference with NIM on OCI
NVIDIA NIM provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across various environments. NIM microservices are designed for NVIDIA-accelerated infrastructure, enabling smooth integration with existing tools and applications. Developers can quickly deploy LLMs with minimal code, whether on-premises or in Kubernetes-managed cloud environments.
Deploying NVIDIA NIM on OCI offers several benefits, including improved total cost of ownership (TCO) with low-latency, high-throughput inference, faster time to market with prebuilt microservices, and enhanced security and control of applications and data.
For the Oracle CloudWorld demonstration, NVIDIA and Oracle showcased how using NIM for LLMs can achieve higher throughput compared to off-the-shelf open-source alternatives, particularly in text generation and translation use cases.
Get Started
NVIDIA has partnered with the OCI and Oracle Database teams to demonstrate how bulk generation of vector embeddings, HNSW index creation, and inferencing elements can be accelerated using NVIDIA GPUs and software. This approach helps organizations leverage the performance gains available from the NVIDIA accelerated computing platform, enabling them to utilize AI to manage the vast amounts of data stored in Oracle databases.
Learn more about cuVS. To try NVIDIA NIM, visit ai.nvidia.com and sign up for the NVIDIA Developer Program to gain instant access to the microservices. You can also start using NVIDIA GPU-enabled notebooks on Autonomous Database and Oracle Database 23ai AI Vector Search with Oracle Database 23ai Free.
Image source: Shutterstock
Credit: Source link