Corporate RAG — a permission-scoped knowledge platform for ~500 internal users
A UK building repair, maintenance and refurbishment group (under NDA) ran their bid and commercial work on terabytes of past projects scattered across SharePoint. Estimators routinely spent ~15 minutes hunting for the one analogous job that would let them scope and price a new client request. We built a permission-scoped RAG platform on AWS Kendra, Bedrock and OpenFGA that turned that into a question and an answer. Then the rest of the business found it.
Client
A UK building repair, maintenance and refurbishment group (under NDA). Project delivered for the bid & commercial department, with role-based access for ~500 internal users across departments.
Engagement
Discovery → PoC → full internal portal → expansion into broader internal use cases. Delivered as a self-hosted platform on the client's AWS estate (eu-west-1 / eu-west-2).
Terabytes of project history, none of it findable when a bid was on the clock.
The bid team is responsible for scoping and pricing repair, maintenance and refurbishment work across the UK. Every new request from a client is, in practice, a question against the firm's own history: have we done this type of building before, in this region, at this scale — and what did it cost and how did it go? The answer was always somewhere in SharePoint: in tender packs, scopes of work, site visit notes, completion certificates, meeting summaries, letters, photo packs. The estimators just couldn't find it inside the time budget of a bid.
Two constraints turned this from "ship a RAG demo" into a real engineering problem. First: permissions. Not every estimator can see every project. SharePoint already encoded an ACL graph by department, role, and folder — and the platform had to respect it. A RAG system that returns content the user wasn't allowed to see is not a productivity tool; it's a data incident. Second: provenance. A pricing decision based on a hallucinated past project is worse than no answer at all. Every claim had to point back to a specific document at a specific page in SharePoint, behind the same login the user already uses.
Permissions before retrieval. ReAct agent with CRAG grading. Schema-typed answers with SharePoint citations.
Five layers, each tied to a specific failure mode of "RAG over SharePoint" in real enterprise life.
Identity and authorization, not just an API key. Every request enters through Keycloak SSO (the client's existing IdP). On top of Keycloak we layered OpenFGA for fine-grained authorization: users belong to departments, departments are granted access to applications and document scopes, and roles (admin, member, super_admin) gate write vs. read. Retrieval is filtered before the LLM sees anything — Kendra results are intersected with the FGA decision graph for the current user.
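To make the "filter before the LLM sees anything" step concrete, here is a minimal sketch under stated assumptions. The tuple shape loosely follows the (user, relation, object) form OpenFGA uses, but the in-memory list, the relation names (member, viewer), the scope prefix and every function name are hypothetical stand-ins for illustration. A real deployment would ask the authorization service rather than scan an array.

```typescript
// Hypothetical relationship tuple, loosely modelled on the (user, relation, object)
// shape used by OpenFGA. Names are illustrative, not the client's schema.
type RelationTuple = { user: string; relation: string; object: string };

// Hypothetical shape of one retrieval hit; `scope` is the document scope it belongs to.
type SearchHit = { documentId: string; scope: string; excerpt: string };

function hasTuple(tuples: RelationTuple[], user: string, relation: string, object: string): boolean {
  return tuples.some((t) => t.user === user && t.relation === relation && t.object === object);
}

// Assumed rule: a user may view a scope if any department they belong to is a viewer of it.
function canViewScope(tuples: RelationTuple[], userId: string, scope: string): boolean {
  const departments = tuples
    .filter((t) => t.user === `user:${userId}` && t.relation === "member")
    .map((t) => t.object);
  return departments.some((dept) => hasTuple(tuples, `${dept}#member`, "viewer", `scope:${scope}`));
}

// Drop every hit the user is not authorized for, before anything reaches the prompt.
function filterAuthorizedHits(tuples: RelationTuple[], userId: string, hits: SearchHit[]): SearchHit[] {
  return hits.filter((hit) => canViewScope(tuples, userId, hit.scope));
}

// Usage: alice is in the bid department, which can view the "tenders" scope only.
const demoTuples: RelationTuple[] = [
  { user: "user:alice", relation: "member", object: "department:bid" },
  { user: "department:bid#member", relation: "viewer", object: "scope:tenders" },
];
const demoHits: SearchHit[] = [
  { documentId: "doc-1", scope: "tenders", excerpt: "Roof refurbishment tender" },
  { documentId: "doc-2", scope: "hr", excerpt: "Salary review" },
];
const visibleHits = filterAuthorizedHits(demoTuples, "alice", demoHits); // keeps doc-1 only
```

The point of the design is that the deny decision happens on the hit list, so an unauthorized chunk never becomes part of the prompt and cannot leak through generation.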
Structured ingestion of real-world office documents. Past projects live as PDFs, PPTX slide decks, DOCX tender responses, XLSX cost models, and email archives. We built an AWS Lambda ingestion pipeline with dedicated layers per format (pdf-parse / pdf-lib for PDF, node-pptx-parser for slides, LibreOffice for legacy formats, an XLSX layer for cost models). Documents land in S3, are parsed and indexed into Kendra with metadata preserved (department, project ID, document type, SharePoint URL, owner, last-updated date), and stay re-syncable from SharePoint.
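As a rough illustration of the per-format routing and the metadata that travels with each document, the sketch below picks a parser layer from a file extension and builds an index record. The layer labels, the extension mapping, the property names and the function names are all assumptions. Formats whose layer is not named above (DOCX, email) are deliberately left unmapped rather than guessed.

```typescript
// Hypothetical labels for the per-format Lambda layers.
type ParserLayer = "pdf" | "pptx" | "xlsx" | "libreoffice" | "unsupported";

// Metadata fields taken from the list above; the property names are assumptions.
type IndexMetadata = {
  department: string;
  projectId: string;
  documentType: string;
  sharePointUrl: string;
  owner: string;
  lastUpdated: string; // ISO 8601 date
};

type IndexRecord = { s3Key: string; layer: ParserLayer; metadata: IndexMetadata };

// Assumed mapping: legacy Office formats go through LibreOffice conversion first.
function pickParserLayer(fileName: string): ParserLayer {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return "unsupported";
  switch (fileName.slice(dot + 1).toLowerCase()) {
    case "pdf":
      return "pdf";
    case "pptx":
      return "pptx";
    case "xlsx":
      return "xlsx";
    case "doc":
    case "ppt":
    case "xls":
      return "libreoffice";
    default:
      return "unsupported";
  }
}

// Returns null when no layer can handle the file, so the caller can log and skip it.
function toIndexRecord(s3Key: string, metadata: IndexMetadata): IndexRecord | null {
  const layer = pickParserLayer(s3Key);
  return layer === "unsupported" ? null : { s3Key, layer, metadata };
}
```

Keeping the SharePoint URL on the record is what later lets a citation link back to the source document.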
Retrieval as a ReAct agent, not a single embedding call. Every question flows through an intent classifier (knowledge query, small talk, clarification, off-topic, meta, ambiguous) and a query analyzer that scores complexity, extracts entities and identifies temporal references. Complex multi-part questions are routed through a query decomposer that emits ordered sub-queries with dependency types (sequential, conditional, contextual). The ReAct agent then runs thought–action–observation loops over Kendra, with a CRAG-style document grader that labels each candidate highly_relevant / partially_relevant / not_relevant and either proceeds, rewrites the query and retries, or returns nothing.
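The grade-then-decide step can be sketched as a small loop. This is a simplified illustration, not the production agent: the three grade labels come from the description above, while the decision policy (one highly relevant document is enough, otherwise rewrite until attempts run out), the attempt limit and the injected search, grade and rewrite functions are assumptions. They are synchronous here for brevity; real implementations would be async calls to Kendra and an LLM.

```typescript
type Grade = "highly_relevant" | "partially_relevant" | "not_relevant";
type GradedCandidate = { documentId: string; grade: Grade };
type NextStep = "proceed" | "rewrite_and_retry" | "return_nothing";

// Assumed policy: proceed on any highly relevant hit, otherwise retry while attempts remain.
function decideNextStep(candidates: GradedCandidate[], attempt: number, maxAttempts: number): NextStep {
  if (candidates.some((c) => c.grade === "highly_relevant")) return "proceed";
  return attempt < maxAttempts ? "rewrite_and_retry" : "return_nothing";
}

// Hypothetical dependencies, injected so the loop can be exercised without Kendra or an LLM.
type RetrievalDeps = {
  search: (query: string) => string[]; // returns candidate document ids
  grade: (query: string, documentId: string) => Grade;
  rewrite: (query: string) => string;
};

function retrieveWithGrading(
  query: string,
  deps: RetrievalDeps,
  maxAttempts: number = 3,
): { query: string; documentIds: string[] } {
  let current = query;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const graded: GradedCandidate[] = deps
      .search(current)
      .map((id) => ({ documentId: id, grade: deps.grade(current, id) }));
    const step = decideNextStep(graded, attempt, maxAttempts);
    if (step === "proceed") {
      // Keep everything except documents graded not_relevant.
      const kept = graded.filter((c) => c.grade !== "not_relevant").map((c) => c.documentId);
      return { query: current, documentIds: kept };
    }
    if (step === "return_nothing") break;
    current = deps.rewrite(current);
  }
  return { query: current, documentIds: [] };
}
```

Returning an empty list on exhaustion is the "returns nothing" branch: the caller can then tell the user no supporting document was found instead of generating from weak evidence.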
Schema-typed answers with inline citations. Generation runs on AWS Bedrock (Anthropic Claude Sonnet, eu-west-1) via the Vercel AI SDK. The response is a Zod-typed object with two fields. aiAnswer holds the answer text with inline citation markers [1] [2] plus a references array of (title, SharePoint URL, page number). documentResults lists the supporting documents with type (MSA / SOW / Contract / Proposal / Report), owner, last-updated date and relevance percentage. The UI renders citations as clickable SharePoint links that respect the same Keycloak session, so provenance is one click away, behind the same authorization the user already has.
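A schema guarantees the shape of the response, but it is also worth checking that every inline marker actually resolves to a reference. The sketch below uses plain TypeScript types in place of the Zod schema so it runs without dependencies. The property names (text, references, sharePointUrl, page) and the convention that [n] is a 1-based index into the references array are assumptions.

```typescript
// Plain-TypeScript stand-ins for the typed response; property names are assumptions.
type Reference = { title: string; sharePointUrl: string; page: number };
type AiAnswer = { text: string; references: Reference[] };

// Collect the distinct citation numbers used in the text, in ascending order.
function citedNumbers(text: string): number[] {
  const pattern = /\[(\d+)\]/g;
  const found: number[] = [];
  let match = pattern.exec(text);
  while (match !== null) {
    const n = Number(match[1]);
    if (found.indexOf(n) === -1) found.push(n);
    match = pattern.exec(text);
  }
  return found.sort((a, b) => a - b);
}

// Assumes [n] is a 1-based index into `references`; returns markers with no matching entry.
function danglingCitations(answer: AiAnswer): number[] {
  const count = answer.references.length;
  return citedNumbers(answer.text).filter((n) => n < 1 || n > count);
}

// Usage: two references are supplied, but the text also cites [4].
const sampleAnswer: AiAnswer = {
  text: "The roof works were priced per square metre [1] and overran by two weeks [2]. See also [4].",
  references: [
    { title: "Tender pack", sharePointUrl: "https://example.invalid/tender", page: 12 },
    { title: "Completion certificate", sharePointUrl: "https://example.invalid/cert", page: 1 },
  ],
};
const dangling = danglingCitations(sampleAnswer); // [4]
```

A non-empty result is a cheap signal that the answer should be rejected or regenerated before it reaches the UI.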
Quality guards on the way in and on the way out. A prompt-injection / input-sanitization layer guards the prompt path. After generation, a response validator and a hallucination checker cross-reference each claim against the retrieved evidence and grade grounding per claim. The whole pipeline is wired into a promptfoo eval harness — 30+ use cases (document discovery, email summarise / respond / draft, SOW creator, business case writer, social posts, press releases, etc.) with rubric-as-code LLM-judge assertions running on every change.
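To show the shape of per-claim grounding, here is a deliberately naive sketch. It swaps in simple word overlap for whatever the production hallucination checker does, so treat the scoring rule, the 0.6 threshold and all names as assumptions. Only the idea carries over: each claim gets its own support score against the retrieved evidence.

```typescript
type ClaimGrade = { claim: string; support: number; grounded: boolean };

// Lowercase alphanumeric tokens longer than three characters (a crude stop-word filter).
function contentWords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 3);
}

// Fraction of a claim's content words found in the best-matching evidence passage.
function supportScore(claim: string, evidence: string[]): number {
  const words = contentWords(claim);
  if (words.length === 0) return 0;
  let best = 0;
  for (const passage of evidence) {
    const passageWords = contentWords(passage);
    const hits = words.filter((w) => passageWords.indexOf(w) !== -1).length;
    best = Math.max(best, hits / words.length);
  }
  return best;
}

// Grade every claim independently; the threshold is an assumed cut-off, not a tuned value.
function gradeClaims(claims: string[], evidence: string[], threshold: number = 0.6): ClaimGrade[] {
  return claims.map((claim) => {
    const support = supportScore(claim, evidence);
    return { claim, support, grounded: support >= threshold };
  });
}
```

Word overlap misses paraphrase and negation, so it only suits a smoke test. Grading per claim rather than per answer is the part worth keeping, because it lets the validator strip or flag one unsupported sentence without discarding the rest.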
Architecture (data flow)
Document search collapsed from ~15 minutes to seconds, on a permissioned source of truth that the rest of the business asked for next.
Faster document search
From ~15 minutes of folder-spelunking to a sourced answer in seconds — measured against the estimator workflow the platform replaced.
~500 internal users
Role-based access across the bid & commercial department and adjacent functions. Every retrieval is scoped through OpenFGA against the user's department membership and document grants.
Indexed corpus
Tender packs, SOWs, contracts, proposals, reports, meeting summaries, letters and email archives — kept in sync with SharePoint, surfaced through a single search and chat surface.
Use cases shipped
Document discovery was the wedge. Email summarise / respond / draft, SOW creator, business case writer, task estimator, meeting-minute refiner, press releases, social posts, and Jira ticket creation followed.
Permission bypasses
Retrieval is intersected with the FGA decision graph before the LLM sees a chunk. Users see only what they were already allowed to see in SharePoint.
Eval gates on every change
promptfoo runs LLM-judge rubric assertions across the use-case library on every pipeline change. Quality regressions are caught before they ship.
Most "enterprise RAG" demos die the moment real permissions show up. Building this platform was the opposite engineering problem from the public RAG benchmarks: identity, authorization and provenance came first, and the retrieval architecture had to live inside that constraint. That's where the value lives.
Five decisions that made the platform survive contact with the actual business.
1. Authorize before you retrieve. OpenFGA isn't an afterthought layer over a search index — it's the gate that decides what Kendra is even allowed to return for this user. The decision graph mirrors SharePoint's department and folder ACLs, so the legal/HR question of "who can see what" never moves into the LLM's prompt.
2. Treat the document estate like a system, not a folder. Real enterprise corpora are PDFs, PPTX decks, DOCX tenders, XLSX cost models, .msg email archives and image-only scans. Dedicated Lambda layers per format (including LibreOffice for legacy conversions) gave us a unified pipeline into Kendra without losing tables, slide notes, or page hierarchy.
3. ReAct with CRAG, not one-shot retrieval. A document grader that can say "this is partially relevant, try a rewrite" beats a higher top-k and a longer context window. Most production failures we've seen are retrieval failures pretending to be generation failures.
4. Schema-typed responses with clickable provenance. A Zod schema that requires aiAnswer + references[] + documentResults[] with SharePoint URLs forces every claim to point somewhere a human can verify. That's also what made bid & commercial trust the system enough to actually use it.
5. Use cases as a portfolio with CI gates. Document discovery was the wedge but not the destination. promptfoo evals on every prompt/use case meant we could add SOW generation, email composition, Jira ticket creation, social posts and meeting-minute refining without quietly regressing search quality.
Discovery → PoC → platform → expand.
- Discovery
Sources, permissions and roles
Mapped SharePoint sources, ACL structure, department/role hierarchy, and the bid workflow the platform had to replace. Designed the OpenFGA schema and the evaluation set.
- PoC
Kendra + Bedrock baseline on a real slice
Stood up Kendra (eu-west-2) and Bedrock (eu-west-1), wired Keycloak + OpenFGA, indexed a representative slice of the corpus, and validated end-to-end retrieval and authorization before scaling.
- Platform
Internal portal, multi-use-case backend
NestJS 11 backend with intent classifier, query analyzer, decomposer, CRAG grader, ReAct agent and Zod-typed generation. React 19 + Vite frontend with a custom useRagChat hook for structured responses. Lambda ingestion pipeline. AWS SES for transactional email, Jira API for ticket creation, md-to-pdf / remark-docx / pdf-lib for document generation. promptfoo eval harness in CI.
- Expand
From bid support into broader internal use cases
Document discovery was the wedge. The platform then absorbed email summarise / respond / draft, SOW creation, business case writing, social posts, press releases, task estimation, meeting-minute refinement and Jira ticket creation — each shipped as an evaluated use case behind the same auth and provenance layer.
Same engineering DNA, different problems.
We've built this platform once. We can build it for you.
Bring us your document estate, your auth provider and the workflow you're trying to compress. We'll come back with the architecture, the eval set we'd run against it, and what it costs to ship and run.