The recent launch of PageIndex is part of a broader shift in AI architecture toward "Vectorless RAG" or "Reasoning-Based Retrieval." Instead of the standard method of splitting documents into arbitrary chunks and searching by mathematical similarity, PageIndex builds a "Smart Table of Contents" (a hierarchical tree) that lets LLMs navigate documents the way a human expert would. Numerous blogs (including one from Microsoft) outline the working principles (no vector database, no chunking, enhanced explainability) along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is best suited to deep-dive queries on complex structured or semi-structured documents (such as financial statements), rather than to searching across many independent documents, such as customer-support knowledge bases, where we should continue to use vector RAG.
Why is that?
If Vectorless RAG using PageIndex provides better (or at least as good) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex's tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index that is a prerequisite ingestion step is slow and expensive to build with an LLM. Moreover, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response-synthesis step, again using the LLM.
In comparison, building a vector index is fast and cheap, and the retrieval step uses an LLM only once, during synthesis. Ingestion with an embedding model also costs far less than LLM summarization of the full document.
What if you could get the excellent structure-aware reasoning accuracy of Vectorless RAG, together with the low latency and cost of vector RAG, in a way that scales across an enterprise document base? In this article, I'll walk through a real use case on a large, complex document to build Proxy-Pointer RAG: an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we'll explore and demonstrate the following:
- Why exactly is PageIndex so accurate, and why is it difficult to scale the concept practically to multi-document knowledge bases?
- A quick comparison of Vectorless RAG using PageIndex vs. flat vector RAG, to establish a baseline.
- How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
- A comparison across all kinds of queries using PageIndex and Proxy-Pointer, to test the quality of retrievals.
Use Case Setup
We will use a World Bank report named South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). It is a 131-page report comprising several chapters, complex charts, tables, content in boxes, etc., and is a good candidate for PageIndex to demonstrate its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a Markdown file using the Adobe PDF Extract API, but any other method that preserves the integrity of tables, charts, etc. (such as using a VLM) would work just as well. FAISS is used as the vector database.
How does PageIndex work?
Instead of the "chunk your document, embed the chunks, retrieve the top-K, feed them to an LLM" pipeline of vector RAG, PageIndex takes a radically different approach to document retrieval. Rather than treating a document as a flat sequence of chunks, it builds a semantic skeleton tree (a hierarchical map of every section, sub-section, and content block in the document) and then uses an LLM to navigate that tree at query time.
Phase 1: Indexing (once per document)
PageIndex parses the document's heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:
- A title (extracted from the heading)
- A node ID (a unique identifier like 0012)
- Line boundaries (start and end line in the source document)
- A summary (generated by an LLM; this is the expensive and time-consuming part)
The result is a JSON that looks like this:
{
    "node_id": "0011",
    "title": "Chapter 1. Deceptive Strength",
    "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
    "line_num": 621,
    "nodes": [
        {
            "node_id": "0012",
            "title": "Introduction",
            "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
            "line_num": 625
        },
        ...
    ]
}
Phase 2: Retrieval (per query)
When a user asks a question, PageIndex hands the entire tree of summaries to an LLM and asks, "Which nodes contain the answer?" This is unlike vector RAG, which relies on mathematical similarity between query and chunk embeddings to build the relevant context.
The LLM reads the summaries, not the full text, and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the exact, contiguous, full section from the original Markdown file and passes it to the synthesis LLM.
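PageIndex's actual prompt and schema differ, but this two-step retrieval phase can be sketched in a few lines. Everything here is illustrative: `retrieve`, `call_llm`, and the explicit `start_line`/`end_line` fields are my own names under stated assumptions, not PageIndex's API.

```python
import json

def retrieve(tree: dict, question: str, doc_lines: list, call_llm) -> str:
    """Sketch of PageIndex-style retrieval: the LLM reads only the tree
    of summaries, picks node IDs, and we slice the full sections."""
    prompt = (
        "Here is a table-of-contents tree with per-node summaries:\n"
        f"{json.dumps(tree, indent=2)}\n\n"
        f"Question: {question}\n"
        "Return a JSON list of the node_ids whose sections answer it."
    )
    node_ids = json.loads(call_llm(prompt))  # e.g. ["0011", "0012"]

    # Flatten the tree so nodes can be looked up by ID.
    index, stack = {}, [tree]
    while stack:
        node = stack.pop()
        index[node["node_id"]] = node
        stack.extend(node.get("nodes", []))

    # Slice the full, contiguous sections out of the source document.
    sections = []
    for nid in node_ids:
        node = index[nid]
        sections.append("\n".join(doc_lines[node["start_line"]:node["end_line"]]))
    return "\n\n".join(sections)
```

The key point the sketch captures: the LLM never sees the full text during navigation, only the summaries; the line boundaries do the rest.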
Why does this work so well?
PageIndex excels because of three architectural advantages:
1. Structural Navigation, Not Pattern Matching
When we ask "What are the main messages of Chapter 1?", PageIndex doesn't search for chunks containing those words. It reads the summary of node 0011 ("Chapter 1. Deceptive Strength"), which says "Covers growth outlook, inflation, financial vulnerabilities, climate risks, and policy challenges," and immediately knows this is the right node. It reasons about relevance, not semantic and lexical similarity.
2. Contiguous Context Extraction
Once the right nodes are identified, PageIndex extracts the full, unbroken section that the node represents from the original Markdown: headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.
3. Zero Chunk Boundary Artifacts
There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Each piece of context has a natural beginning (the section header) and a natural end (the next section's start). This dramatically reduces hallucination from ambiguous context.
However, this strength comes at a cost, and this is where the approach starts to break down.
Why is this difficult to scale?
The reasons are cost and latency (at ingestion as well as retrieval time). For our 131-page report, we get 137 structural nodes in the tree, and therefore PageIndex needs 137 LLM calls just for the indexing phase: one summary per node. For a corpus of 50 documents, that's ~7,000 LLM calls before we have used it to answer a single question.
Here is the comparative cost of PageIndex vs. a vector index:
| Metric | PageIndex | Standard vector RAG |
|---|---|---|
| Indexing LLM calls | ~137 per document | 0 |
| Indexing time | 5-10 minutes per document | 30 seconds per document |
| Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
| Multi-document support | Requires per-document tree traversal during retrieval | Single unified index, fast and efficient |
| Retrieval quality | ★★★★★ | ★★★☆☆ |
For a single important document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base of customer-service SOPs that updates weekly? It's a non-starter.
This is where PageIndex starts to become impractical for real systems at scale.
Before proposing an alternative, let's establish a baseline: how much better is PageIndex compared to standard vector RAG?
Comparison of Vectorless vs Flat Vector RAG
To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:
Query about table content
Here is the response from Vectorless RAG:
=== VECTORLESS RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Nodes Retrieved: 0125, 0131
Response:
Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers used to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).
The table catalogs 25 specific studies, detailing the following information for each:
* **Paper:** The author(s) and publication source.
* **Period of main sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
* **Sample coverage:** The geographical or organizational scope (e.g., International, Europe, South Asia, Africa, Asia).
* **Region:** The specific region analyzed.
* **Economies:** The specific countries included in the study sample.
According to the provided text, the articles listed in this table were selected based on specific criteria: they address the growth elasticity of employment using country-level estimates. The selection excludes articles that present exclusively subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and the unemployment elasticity of growth (Annex 2.1).
And the same using vector RAG, which completely misses the content:
=== vector RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Retrieved 5 chunks.
Response:
Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."
The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:
* **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "a detailed literature search was conducted of studies on the elasticity of employment with respect to output growth."
* **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates of the growth elasticity of total employment."
* **Annex Table A2.3.3:** Provides specific regression data regarding the robustness tests for the female employment ratio.
There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.
Query requiring structural knowledge of the document
The query is: what are the questions answered by chapter 2? Here is the response from Vectorless RAG. It precisely located Chapter 2 and the "Questions" section within it, and printed the questions verbatim, along with grounding information for explainability.
=== VECTORLESS RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Nodes Retrieved: 0098
Response:
Based on the provided excerpts, Chapter 2 addresses the following questions:
* **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?**
* **What are the correlates of long-run employment ratios among EMDEs?**
* **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?**
These questions are explicitly listed in the section titled **"Questions."**
The chunks in vector RAG don't preserve the structural hierarchy and section integrity, so while it locates the "Questions" section, it is unable to find the actual questions in it.
=== vector RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Retrieved 5 chunks.
Response:
Based on the provided document, Chapter 2, titled "Jobless Development," is structured to address specific inquiries regarding labor markets and economic growth.
The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the exact text of these questions, it indicates that the chapter covers the following thematic areas and analytical elements:
* **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
* **Employment Ratios:** A significant portion of the chapter is devoted to "Long-run employment ratios" (page 75) and their various "Correlates," including:
* **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
* **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
* **Human Capital and Firm Characteristics:** Education (page 83) and establishment size (page 78).
* **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
* **Policy Implications:** The analysis concludes by addressing "Policy options to boost employment" (page 85).
So the question becomes: can we retain these advantages without paying the cost?
Engineering a Better Retriever: Proxy-Pointer RAG
Let's build Proxy-Pointer RAG to answer that question. The core insight is intuitive:
You don't need LLM summaries to give a vector database structural awareness. You just need to encode the structure into the embeddings themselves.
The system uses the same structural tree as PageIndex, but without the expensive summarization step. Building this skeletal tree requires no costly LLM calls during indexing; the tree is built purely from regex-based heading detection, which runs in milliseconds.
Then, instead of asking an LLM to navigate the tree, we let FAISS do the retrieval, but we engineer the chunks so that FAISS "understands" where each chunk lives in the document's hierarchy.
Here is a view of the ingestion pipeline:

Build a Skeleton Tree
PageIndex's tree parser doesn't actually need an LLM to build the structural hierarchy. The heading detection is regex-based: it finds Markdown headers (#, ##, ###) and builds the nesting from heading levels. The LLM is only used to summarize each node.
We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries, but no summaries.
# Build the skeleton tree: no LLM, runs in milliseconds
pageindex = PageIndex(doc_path, enable_ai=False)
tree = pageindex.build_structure()  # Pure regex parsing
The skeleton tree and the summarized tree produced for the earlier Vectorless RAG run have identical structures: the same 137 nodes, the same nesting depths, the same line numbers, the same titles. The only difference is the missing summary field.
Cost: $0. Time: < 1 second.
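The `PageIndex(doc_path, enable_ai=False)` call above is pseudocode; the core of a skeleton builder is little more than a heading regex. The following is a minimal sketch under stated assumptions: `build_skeleton` and its field names are my own, and it returns a flat node list with parent pointers instead of a nested tree, purely for brevity.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)")

def build_skeleton(md_lines: list) -> list:
    """Regex-only skeleton tree: every Markdown heading becomes a node
    with a title, an ID, and start/end line boundaries. No LLM calls."""
    nodes, stack = [], []  # stack holds (level, node) of still-open sections
    for i, line in enumerate(md_lines):
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2).strip()
        # A new heading closes every open section at the same or deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()[1]["end_line"] = i
        node = {
            "node_id": f"{len(nodes):04d}",
            "title": title,
            "start_line": i,
            "end_line": len(md_lines),  # provisional; fixed when section closes
            "parent": stack[-1][1]["node_id"] if stack else None,
        }
        nodes.append(node)
        stack.append((level, node))
    return nodes
```

This is why the skeleton costs nothing: one linear pass over the file, no API calls at all.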
Structural Metadata Pointers (The Core Differentiator)
This is the heart of why PageIndex works so well, and the trick we'll adopt.
In standard vector RAG, the retrieved chunk is the context. Whatever 500 words FAISS returns, that's what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response will miss the intent of the query entirely (as illustrated in the earlier section comparing Vectorless and vector RAG).
PageIndex does something fundamentally different: the chunk is never the context. Each node in the tree knows its exact position in the original document: its title, its node ID, and, crucially, the start and end line numbers of the entire section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.
We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from its tree node:
metadata = {
    "doc_id": "SADU",         # Which document
    "node_id": "0012",        # Which structural node
    "title": "Introduction",  # Section heading
    "start_line": 624,        # Where the section begins in the original file
    "end_line": 672           # Where the section ends
}
At retrieval time, we don't feed the matched chunks to the LLM. Instead, we:
- Use the chunks as proxies: they are only there to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get the unique top-k.
- Follow the metadata pointers: open the original Markdown and slice the node's lines, e.g. 624 to 672.
- Send the full sections: the LLM receives the complete, pristine, structurally intact text.
Here is a view of the retrieval pipeline:

This means that even if a chunk matched on only a single sentence deep inside a section, the synthesis LLM gets the entire section, with its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.
This is why I call it Proxy-Pointer RAG: the vectors are proxies for location, and the metadata are pointers to the real content.
Cost: $0. Impact: transforms context quality from fragmented chunks to complete document sections.
Breadcrumb Injection (Structural Context)
This is key to answering queries related to a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:
"While private investment growth has slowed in both South Asia and other EMDEs..."
FAISS has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks for the "main messages of Chapter 1," this chunk won't rank highly: it doesn't contain the words "Chapter 1" or "main messages."
Breadcrumb injection prepends the full ancestry path from the Skeleton Tree to every chunk before embedding:
"[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
While private investment growth has slowed in both South Asia and other EMDEs..."
Now the embedding vector encodes both the content AND its structural location. When someone asks about "Chapter 1," FAISS knows which chunks belong to Chapter 1, because the words "Chapter 1. Deceptive Strength" are present in the embedding.
# Build the breadcrumb from the node's ancestry
current_crumb = f"{parent_breadcrumb} > {node_title}"
# Prepend it to the chunk text before embedding
enriched_text = f"[{current_crumb}]\n{section_text}"
chunks = text_splitter.split_text(enriched_text)
This is a zero-cost encoding of the tree structure into the vector space. We're using the same embeddings API, the same FAISS index, the same retrieval code. The only difference is what we feed into the embedder.
Cost: $0 extra. Impact: transforms retrieval quality for structural queries.
Structure-Guided Chunking (No Blind Sliding Windows)
Standard vector RAG applies a sliding window across the entire document: a 2000-character window that moves forward with some overlap, completely oblivious to the document's structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a figure caption. The boundaries are arbitrary, and every chunk is an island, with no knowledge of its place in the overall document structure.
Proxy-Pointer does something fundamentally different: we walk the tree, not the text.
For each node in the skeleton tree, we extract only its own section text, from start_line to end_line, and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it's longer, the splitter divides it, but strictly within that section's boundaries.
Standard RAG: blind sliding window across the entire document
[====chunk1====][====chunk2====][====chunk3====]...
↑ might start in the Introduction, end in a figure caption
Proxy-Pointer: chunking within each node's boundaries
Introduction (lines 624-672) → [chunk A] [chunk B]
Economic Activity (lines 672-676) → [chunk C]
BOX 1.1 (lines 746-749) → skipped (< 100 chars)
Inflation (lines 938-941) → [chunk D]
This guarantees three things:
- Chunks never cross section boundaries: a chunk from the Introduction will never overlap with Economic Activity
- Each chunk belongs to exactly one node, so the node_id metadata is always precise
- Breadcrumbs are accurate per chunk: they reflect the exact structural container, not a guess
Importantly, when a node is skipped (because its text is too short, e.g. a "BOX 1.1" heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like "Introduction," "Features," and "Figures," all of which get embedded with the parent's title in their breadcrumb (e.g. BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 > Features of...). No content is ever lost; only empty structural headers are excluded.
Cost: $0. Impact: every chunk is structurally traceable, enabling precise metadata pointers.
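The tree walk above can be sketched as follows. This is a simplified, flat version under stated assumptions: `chunk_by_structure` and the `splitter` callable are my own illustrative names, and recursion into children is implicit because every node appears in the list.

```python
def chunk_by_structure(nodes: list, md_lines: list, splitter, min_chars: int = 100) -> list:
    """Walk skeleton-tree nodes and chunk each section in isolation,
    so no chunk ever crosses a section boundary."""
    chunks = []
    for node in nodes:
        # Own section text only: from this node's start to its end.
        text = "\n".join(md_lines[node["start_line"]:node["end_line"]])
        if len(text) < min_chars:
            continue  # empty structural header; its children are chunked separately
        for piece in splitter(text):  # split strictly within this section
            chunks.append({
                "text": piece,
                "node_id": node["node_id"],
                "start_line": node["start_line"],
                "end_line": node["end_line"],
            })
    return chunks
```

Every chunk produced this way carries exactly one node's pointer, which is what makes the proxy-pointer retrieval step possible.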
Noise Filtering (Removing Distractors)
Some sections in any document are irrelevant for retrieval and contaminate the context: they contain keywords that match almost every query but provide no useful content.
- Table of Contents: mentions every chapter title (matches "Chapter 1," "Chapter 2," everything)
- Executive Summary: paraphrases every key finding (matches every topic query)
- Abbreviations: lists every acronym used in the document
- Acknowledgments: mentions organizations, countries, and themes
These sections act as distractors in vector space. They sit near the centroid of the document's embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.
We apply a simple title-based filter during indexing:
NOISE_TITLES = {
    "contents", "summary of contents", "table of contents",
    "abbreviations", "acknowledgments", "foreword",
    "executive summary", "references"
}
if title.strip().lower() in NOISE_TITLES:
    continue  # Skip this node entirely
This filter removed 7 nodes from our test document. The impact was immediate: for the query "What are the main messages of Chapter 1?", the system stopped retrieving the Executive Summary and Table of Contents and started retrieving the actual Introduction section.
Cost: $0. Impact: eliminates the most common retrieval failure mode.
That's it. Proxy-Pointer RAG is now ready to test.
Testing the Proxy-Pointer Pipeline
While a detailed comparison follows in the next section, here are the results of the same two queries cited in the "Vectorless vs Flat Vector RAG" section. As you can see, the answers are as accurate and detailed as those of Vectorless RAG using PageIndex.
Query about table content
=== PROXY-POINTER RAG INTERACTION ===
Question: what does annexure table a2.1.1 say
Unique Context Nodes Extracted:
- [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
- [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal likelihood of study documenting above-average effect for poor-average households' | Lines: 2537-2745
- [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the likelihood of private investment accelerations starting' | Lines: 1663-1899
Response:
Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**
This table provides a summary of various research studies, detailing the following information for each:
* **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
* **Period of main sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
* **Sample coverage:** The size or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
* **Region:** The geographic scope of the research (e.g., International, South Asia, Africa, Europe, Middle East, Asia).
* **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...," "G20 countries").
The table documents a diverse body of research spanning from 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and particular regions such as South Asia and Africa.
Query requiring structural knowledge
=== PROXY-POINTER RAG INTERACTION ===
Question: what are the questions answered by chapter 2
Unique Context Nodes Extracted:
- [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
- [SADU] Node 0098: 'Questions' | Lines: 3115-3127
- [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812
Response:
Based on the provided text, Chapter 2 addresses the following questions regarding South Asia's economic growth and labor market:
* **How does South Asia's employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
* **What are the correlates of long-run employment ratios among EMDEs?** (Source: *Chapter 2, Questions section*)
* **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)
Additionally, the chapter explores these issues with the aim of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.
While we have demonstrated the cost advantage so far, does the architecture overcome the scalability challenge of PageIndex?
Is Proxy-Pointer Scalable?
Here is why the architecture is scalable across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, and an additional LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.
- No LLM at indexing. The skeleton tree is regex-built in milliseconds. The only API calls are to the embedding model, just as in standard vector RAG.
- No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.
Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers), not in an LLM reasoning loop. It inherits all of vector RAG's scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.
Fail-safe for unstructured documents: if a document has no headings, or the skeleton tree produces only a single root node, the system detects this during chunking and falls back to a standard sliding window. Such chunks are flagged with an empty node_id and no line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG: no errors, no special handling required.
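A minimal sketch of this fallback, under the same illustrative naming as the earlier snippets (`chunk_unstructured` and `context_for` are assumptions, not code from the article's pipeline):

```python
def chunk_unstructured(md_lines: list, splitter) -> list:
    """Fallback path for documents with no usable headings: plain
    sliding-window chunks, flagged with empty pointers."""
    text = "\n".join(md_lines)
    return [{"text": piece, "node_id": "", "start_line": None, "end_line": None}
            for piece in splitter(text)]

def context_for(chunk: dict, docs: dict) -> str:
    """Follow the metadata pointer when it exists; otherwise degrade
    gracefully to using the chunk text itself as context."""
    if not chunk["node_id"]:  # flagged fallback chunk: use it directly
        return chunk["text"]
    lines = docs[chunk["doc_id"]]
    return "\n".join(lines[chunk["start_line"]:chunk["end_line"]])
```

The empty node_id is the only signal needed; retrieval code stays a single path with one branch.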
Let's compare Vectorless RAG and Proxy-Pointer head-to-head.
Vectorless vs Proxy-Pointer RAG
I ran a variety of queries (broad structural, cross-reference, specific factual, figure-specific, etc.) and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer, along with the full Quality Comparison report, here.
The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries, all at the scalability and cost of flat vector RAG.
Here is the summary verdict:
| # | Query Type | Winner |
|---|---|---|
| 1 | Broad structural (Ch.1 messages) | 🔴 PageIndex |
| 2 | Broad structural (Ch.2 messages) | 🔴 PageIndex (narrow) |
| 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
| 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
| 5 | Comparative (India vs region) | 🟢 Proxy-Pointer |
| 6 | Figure-specific (B1.1.1 trends) | 🟢 Proxy-Pointer |
| 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
| 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
| 9 | Navigational (Ch.2 questions) | 🟡 Tie |
| 10 | Inferential/policy (government vs shocks) | 🟢 Proxy-Pointer |
And here is the cost comparison:
| Metric | PageIndex | Proxy-Pointer | Standard vector RAG |
|---|---|---|---|
| Indexing LLM calls | ~137 per doc | 0 | 0 |
| Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
| Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs PageIndex) | ★★★☆☆ |
| Multi-doc scalability | Poor (per-doc tree navigation) | Excellent (unified vector index) | Excellent |
| Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
| Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
| Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |
Key Takeaways
- Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn't about better embeddings; it's about preserving hierarchy.
- You don't need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no extra cost.
- Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
- Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesis LLM actually sees.
Conclusion
Proxy-Pointer RAG proves a simple thesis: you don't need an expensive LLM to make a retriever structurally aware; you just have to be clever about what you embed.
Five zero-cost engineering techniques (skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering) close the quality gap with a full LLM-navigated system while keeping the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of standard vector RAG.
The next time you're building RAG for your structured (or unstructured) document repository, don't reach for a bigger model. Reach for a Proxy-Pointer index.
Connect with me and share your comments at www.linkedin.com/in/partha-sarkar-lets-talk-AI
Reference
World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience. License: CC BY 3.0 IGO.
Images used in this article were generated with Google Gemini. Code created by me.

