Deep | A Discussion about RAG, Vector Databases, and Their Monetization Potential, as well as an Analysis of Elastic as a Company (Part 3)
Some Expert Views on RAG and VectorDB
As part of this research, we also interviewed several relevant experts, who provided additional insights into RAG and VectorDB.
1. Cost Differences Between CSPs and Independent Database Projects
Only heavy users pay close attention to cost differences. For instance, many large vendors build conversational capabilities that let clients interact with their systems; in these interactions, the system understands prior exchanges from context rather than starting fresh with each question. Notion AI is a typical example. Such models apply not only to Q&A but also to business contexts, especially CRM systems built for maintaining customer relationships. These systems need to retain contextual information, but the context for any single client is not necessarily large; the challenge comes from the sheer number of user tenants. OpenAI or Notion AI, for example, may have millions of users, each with low engagement. This contrasts sharply with startups applying RAG within a single enterprise.
In this setting, clients with millions of users and high overall activity care most about cost and service stability, since they are often high-value customers. The long-context LLMs, RAG, and subsequent fine-tuning mentioned earlier suit different application scenarios, and in practice many companies adopt hybrid strategies to control cost. In an early-stage conversational application with a large user base, for instance, the active-user ratio is often very low, perhaps only 2% to 5% of the total, and cost problems follow. Although vector databases are relatively cheap, most users are inactive, so maintaining hot service for such a vast user base is impractical. Hybrid solutions address this: inactive users' data can move to offline storage, or at least to storage that is not purely memory-based. An occasionally returning inactive user can tolerate an initial delay, while active users require consistent service quality. For users with extremely low frequency, the system can simply record their raw context and, when they return, hand that context to a large model; each such call is expensive, but low access frequency keeps the total manageable. Real large-scale deployments often adopt hybrid schemes of this kind, as sketched below.
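As a rough illustration, such a tiering policy might look like the following sketch. All names and thresholds here are hypothetical, not any particular vendor's implementation:

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; a real system would tune these against costs.
ACTIVE_WINDOW_S = 7 * 24 * 3600     # used within a week -> keep vectors hot
DORMANT_WINDOW_S = 90 * 24 * 3600   # idle 90+ days -> raw context only

@dataclass
class UserRecord:
    user_id: str
    last_access: float  # unix timestamp of the user's last interaction

def choose_tier(user: UserRecord, now: Optional[float] = None) -> str:
    """Route a user's conversational context to a storage tier by recency."""
    now = time.time() if now is None else now
    idle = now - user.last_access
    if idle < ACTIVE_WINDOW_S:
        # Active users: indexed vectors held in memory for low-latency recall.
        return "in_memory_vector_store"
    if idle < DORMANT_WINDOW_S:
        # Occasional users: vectors parked on disk or object storage; the
        # first query after a long gap pays a rehydration delay.
        return "offline_vector_store"
    # Rare users: store only raw context and feed it straight to the LLM on
    # return -- expensive per call, but cheap to keep and rarely invoked.
    return "raw_context_archive"
```

The design point is that storage cost then tracks the small active fraction of users rather than the full tenant count.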
2. Knowledge/Content Management Applications and Their Relation to RAG and Vector Databases
We see many applications of this type using RAG, but in both industry and academia many scenarios still rely on traditional keyword search. The deciding factor is ROI. In some large telcos or financial institutions, for instance, internal customer support is costly because escalating to specialists is expensive; issues usually start with cheaper frontline representatives and are escalated only when they cannot be resolved. New approaches can now improve efficiency in this process. For example, while a low-cost representative talks with a user, automated prompts can surface key information in real time, helping the representative answer questions better. If the system is highly confident in its answers, it can avoid escalating the user to a specialist, improving efficiency.
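A minimal sketch of such a confidence gate, with assumed interfaces and an arbitrary threshold, might look like this:

```python
# Hypothetical confidence gate for the agent-assist flow described above.
ESCALATION_THRESHOLD = 0.75  # illustrative; tuned against specialist costs

def handle_turn(question: str, retriever, specialist_queue) -> str:
    """Suggest an answer to the frontline rep; escalate only on low confidence."""
    # `answer_with_score` is an assumed interface returning (text, confidence).
    answer, confidence = retriever.answer_with_score(question)
    if confidence >= ESCALATION_THRESHOLD:
        return answer                    # rep delivers the suggested answer
    specialist_queue.append(question)    # low confidence: route to a specialist
    return "escalated to specialist"
```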
This approach has a challenge, though: a massive knowledge base must be managed at fine granularity according to core business needs. The business scope is broad and the underlying knowledge base is often large, but not all of its content is equally valuable. A hybrid strategy is therefore needed: high-value businesses get more resources, such as graph databases and multi-encoder, recall, and ranking technologies, while relatively less significant questions can be served by Elastic or another simple recall solution. This allocates resources effectively, ensuring that high-value businesses receive adequate support.
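One way to picture that allocation is a router that spends retrieval budget by business value. The component and tier names below are illustrative assumptions, not a specific product's API:

```python
HIGH_VALUE_LINES = {"core_banking", "enterprise_support"}  # illustrative names

def retrieve(query: str, business_line: str,
             keyword_search, dense_search, graph_db, reranker,
             k: int = 10) -> list:
    """Spend retrieval budget in proportion to business value."""
    if business_line in HIGH_VALUE_LINES:
        # Heavy pipeline: dense recall, graph expansion, then reranking.
        candidates = dense_search(query, top_k=5 * k)
        candidates += graph_db.expand(query)   # related entities and relations
        return reranker(query, candidates)[:k]
    # Long tail: a single cheap keyword recall (e.g. Elastic/BM25) suffices.
    return keyword_search(query, top_k=k)
```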
3. Data Inflation After Vectorization
How much does data typically inflate after vectorization, and how much more will storage cost for enterprises that put their knowledge bases into vector databases? Generally, the inflation rate is about threefold: one copy of the original data, another for the embeddings (although embedding is an encoding process, the embedding vectors end up comparable in size to the source material), and a further copy for the index over those embeddings. Overall inflation is therefore roughly 2.5 to 3 times. Many optimizations exist for both indexes and embeddings, but inflation will not reach an exaggerated tenfold.
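A back-of-the-envelope check of that threefold figure, under the stated assumptions (embeddings roughly the size of the originals, an index of about half to one times that again):

```python
original_gb = 100.0                  # raw knowledge-base documents
embedding_gb = 1.0 * original_gb     # embeddings ~comparable to the source
index_gb = 0.5 * original_gb         # ANN index over embeddings (~0.5x-1x)

total_gb = original_gb + embedding_gb + index_gb
print(total_gb / original_gb)        # 2.5 -- rises toward 3.0 with a 1x index
```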
4. RAG Clients' Spending Split across All Gen AI Software
The degree of data inflation after vectorization is closely related to business complexity. Heavily search-focused applications cost more; if self-hosting and model tuning are required, inference costs rise as well. It is therefore difficult to generalize about a specific spending split.
The future impact of agent businesses on RAG is expected to be substantial. The agents Salesforce is developing, for example, primarily target sales and customer service. Although the user base is large, applications here are still early, with relatively low data volumes; this stage mainly covers foundational sales tasks such as customer outreach and maintaining engagement. Once we enter the agent stage, greater demand may arise. In the North American market, we hope agents can help us analyze customer organizational structures: who the decision-makers are, how internal structures work, and who is actually using our products. Contacting those individuals would require analyzing publicly available information from platforms like LinkedIn.
Currently, RAG is at a very early stage, focused mainly on unidirectional knowledge retrieval: a user lacking background knowledge finds relevant information in an existing corpus. It resembles an open-book exam in which we hand users several books and help them locate relevant material. Agent work, by contrast, involves multi-turn, bidirectional interaction: conclusions are recorded while searching, followed by further research and further conclusions. That process resembles how humans actually solve problems and is deeply iterative. The tasks RAG could handle have not yet been fully realized, and if costs come down, the data volume from agent model calls and vector-database queries will certainly exceed that of today's RAG. Agent demand does look set to surge, and because it resembles repeated rounds of computation, it will also increase inference costs. The shift from open-book exams to open-book iterative search raises the complexity of the problems addressed, demanding significant changes in both search depth and breadth. We are excited about this development, but practical agent deployment still appears some distance away.
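The contrast can be made concrete with a sketch. The interfaces below (`retriever.search`, `llm.generate`, and the stopping and follow-up helpers) are assumptions for illustration; the point is that each agent turn repeats the retrieval and inference cost that single-shot RAG pays once:

```python
def single_shot_rag(question, retriever, llm):
    """Open-book exam: one retrieval pass, one answer."""
    docs = retriever.search(question)
    return llm.generate(question, context=docs)

def agent_loop(question, retriever, llm, max_turns=5):
    """Open-book iteration: search, record conclusions, decide, search again."""
    notes = []                                   # conclusions recorded so far
    query = question
    for _ in range(max_turns):
        docs = retriever.search(query)           # another vector-DB query...
        step = llm.generate(query, context=docs + notes)  # ...and model call
        notes.append(step)
        if llm.is_done(step):                    # assumed stopping check
            break
        query = llm.follow_up(question, notes)   # assumed next-query step
    return llm.generate(question, context=notes)
```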
5. Large Enterprises’ Attitude towards RAG
The key question is whether RAG creates value. Once value is realized, an explosion will follow naturally; this is not a technology-driven problem. It depends on how industries adopt RAG: in AI applications where the benefits can be clearly calculated, for instance, or in customer service automation, where saving manpower improves efficiency and cuts costs. Traditional enterprises can still operate successfully in these areas, and the value is widely recognized. There is also a category of AI-reliant enterprises, such as those behind meeting software or Notion AI, where early gains have genuinely created value. Out of roughly 100 projects, perhaps only a handful succeed, but given the earlier pace of growth, doubling every year at this stage counts as good. These enterprises gradually understand their core business and are even willing to invest manpower to drive progress. YouTube and Facebook, for example, saw clear revenue growth in the first half of last year from applying AI to recommendation systems and advertising; in their core business, even a gain of one part in a thousand is worthwhile. They never ask us how to do it; they focus on the strong correlation between effectiveness and value and constantly demand high-quality, large-scale service from us. Among industry clients, some leading customers have invested funds and achieved results, and we can now observe growth spreading from those leaders to the top ten and on to mid-tier clients.
The process starts out looking lively, with everyone either experimenting or committing fully out of a sense of panic. In this phase everyone is probing, until some areas demonstrate positive ROI, at which point the number of customers willing to invest gradually grows. From one or two initial customers, to the top ten, and then to mid-tier customers, participation through this process becomes substantial.
When budgeting for RAG and vector databases, customers tend to think in terms of traditional database budgets. I believe customers divide into two categories. The first includes many companies in traditional industries such as manufacturing, pharmaceuticals, and telcos, which are well funded and do have budgets. We have worked with such enterprises; some are not particularly concerned with results and insist on spending the budget fully without much focus on ROI. Such situations do exist.
The other category involves business teams focused on business value and ROI. They treat this as a cost item because they are results-oriented. Lacking the manpower to build an online database in-house, they prefer to buy ready-made services and go straight to the service provider when issues arise. They are therefore very cost-sensitive: the ultimate goal is to compete with industry rivals rather than to compare database quality, so they emphasize end-to-end business value.
6. Extension from Traditional Databases to Vector DB
This business model does exist, for example extending capabilities through plugin additions. In our interactions we find two categories. The first covers businesses that have long run effectively on MySQL, PostgreSQL, or Elastic. Once they understand vector search capabilities, validate them on the existing business infrastructure, and find them effective, investing an additional 5% to 10% on top of the original setup is feasible. These customers do not need to replace their infrastructure: adding vector capabilities within Elastic, or the pgvector extension within PostgreSQL, suffices, and many customers tell us they take this approach once they understand the scenarios. Migration does incur costs, since it touches the original Elastic deployment or large-scale business logic, potentially requiring some adaptation work before the workload can move over to Elastic. This situation is not the main direction of our overall business.
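For the PostgreSQL path, the incremental pattern looks roughly like the following, using the real pgvector extension; the database name, table, and 384-dimension choice are illustrative:

```python
import psycopg  # psycopg 3; pgvector must be installed on the server

conn = psycopg.connect("dbname=appdb")  # the existing application database
with conn.cursor() as cur:
    # One-time setup: enable pgvector next to the existing schema.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id bigserial PRIMARY KEY,
            body text,
            embedding vector(384)  -- dimension of whichever embedding model
        )
    """)
    conn.commit()

    # Query: nearest neighbours by L2 distance (pgvector's <-> operator).
    qvec = "[" + ",".join(["0.0"] * 384) + "]"  # placeholder query embedding
    cur.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
        (qvec,),
    )
    print(cur.fetchall())
```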
The second category appears when users discover, after adding vector capabilities, that a load which used to be mostly keyword search, say in Elastic, has shifted until 70%-80% of it is vector processing. At that point they may need to switch to a specialized vector database. The decision is therefore quite sensitive and hinges mainly on where the load comes from. As for future competition, AWS's and Azure's advantages lie not in their products but elsewhere: AWS's downstream integration, RAG involvement, and even some models, since it offers complete solutions. Many mid-range and lower-end customers only want a comprehensive solution, which lowers their overall cost and avoids piecing things together. Certain specialized users will be clear about whether Elastic or OpenSearch better suits their needs.
7. RAG and Vector Databases in Handling Traditional Structured Data
Vector databases primarily handle unstructured data; for structured data, RAG and vector databases hold no significant cost advantage over traditional databases. Their main applications, it is fair to say, are in unstructured data. Unless there are feature-specific needs, such as applications tied to large language models, traditional databases still excel at processing numerical data. In quantitative trading, for example, fundamental data such as market data and financial statements are numerical, and various feature signals are derived from them; however much the generation and extraction of those signals varies, the processing does not require vector-database functionality. Overall, the primary application of vector databases remains unstructured data.
The mainstream view is that interest in vector databases stems from their support for AI and RAG. In my view, however, their value lies in the fact that all previous databases were built on numerical analysis. With SQL, for instance, we mainly perform arithmetic operations, and queries are limited to numerical expressions. The biggest change AI brings is reliance on deep semantics rather than bare values. The same holds for keyword queries in traditional databases and Elastic: without a keyword match, no result can be obtained. Every analysis built on values is strongly coupled to those values and confined within numerical bounds; this applies to some semi-structured databases too, since a database must be designed for particular data types. AI's analysis, by contrast, shifts to the semantic level instead of staying at the numerical level, which is why vector databases suit unstructured data. We can now build a database that processes different data types uniformly on the basis of semantics. The core of vector databases, I believe, is their ability to handle many data forms: user preferences, text, images, video, and audio. All data carries information, and information encapsulates semantics; extracting and encoding those semantics through models into vectors is the foundational logic of vector databases. They can therefore support numerous applications, meet the need to process unstructured data uniformly, and enable semantic-level analysis. From this perspective, RAG is one part of extending textual semantic analysis toward large models.
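That "encode semantics, then compare vectors" logic fits in a few lines. The sketch below uses sentence-transformers with one common example model; the tiny corpus is invented for illustration:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example text encoder

corpus = [
    "quarterly revenue grew 12%",
    "the cat sat on the mat",
    "sales increased by double digits",
]
query = "how did the business perform?"

# Encode semantics into vectors: similar meaning lands nearby in vector
# space even with zero keyword overlap, unlike a keyword match.
emb = model.encode(corpus, normalize_embeddings=True)
q = model.encode([query], normalize_embeddings=True)[0]
scores = emb @ q                       # cosine similarity via dot product
print(corpus[int(np.argmax(scores))])  # -> the semantically closest sentence
```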
8. Prospects of RAG and Vector Databases
The concept of RAG will be necessary throughout the entire course of AI development, including at its culmination. Even the most brilliant expert, asked about the specifics of a task, might need to consult a manual. The implication is that large models may never get the chance to break through in real-time learning, so the concept of private knowledge will continue to be needed.
In a way, the RAG approach competes with public-domain large models. Private RAG schemes are essentially knowledge-based, while public schemes rely on public information together with capabilities in information retrieval and comprehension. These are entirely different competitive stances, which highlights the value of RAG in this context.
9. On the Competitive and Cooperative Relationship of RAG and Vector Databases