Private AI and RAG Backend Architecture for WordPress

A deep dive into the AI‑Kit backend: local-first WordPress AI, backend fallback, Bedrock-powered generation, Knowledge Base retrieval, frontend/admin API separation and static-export-friendly protection.

Architecture thesis: AI for WordPress should not be a black-box SaaS proxy hidden behind a plugin button. The safer pattern is a local-first plugin experience with an optional customer-owned AWS backend for model calls, RAG, citations, guardrails, reCAPTCHA, WAF, logs and knowledge-base ingestion.

Why WordPress AI needs an architecture, not only a prompt box

Adding an AI button to WordPress is easy. Designing where content is processed, which endpoints are public, how grounding works, how model costs are controlled, and how a static frontend can call the backend safely is the hard part.

AI‑Kit starts with the least invasive path: local, on-device browser AI when available. That works well for editor-side rewriting, translation, metadata generation and lightweight tasks. But production websites often need more: fallback when local AI is unavailable, frontend chatbots, DocSearch, multimodal prompts, citations, knowledge-base grounding and protection against public endpoint abuse.

The AI‑Kit backend is the serverless side of that model. It moves backend AI execution into the customer’s AWS account instead of routing every request through the WordPress server or a shared plugin vendor runtime.

System boundary

WordPress editor / Media Library / frontend blocks
        │
        ├─ local mode when browser AI is available
        │
        ▼
AI-Kit JavaScript runtime
        │
        ├─ /admin/* routes for WP Admin and trusted operations
        └─ /frontend/* routes only when public features are enabled
                 │
                 ▼
          Amazon API Gateway
                 │
                 ▼
          AiHandlerFunction
  prompt, writer, rewriter, summarizer,
  translator, language detector, proofreader,
  upload URL helper and knowledge-base listing
                 │
        ┌────────┴────────┐
        ▼                 ▼
Amazon Bedrock        Knowledge Base pipeline
Nova models           S3 documents + S3 Vectors
Guardrails            DynamoDB debounce state
                      EventBridge scheduler
                      KnowledgeBaseSyncFunction

The frontend does not need model provider keys. WordPress does not need to proxy prompts through PHP. Static exports can still use frontend routes when those routes are explicitly enabled and protected with the selected public-access model.
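
The boundary in the diagram can be read as a small routing decision. The following is a minimal sketch of that decision, not AI-Kit's actual runtime code: the function name, the `Path` values and the `require_backend` flag are all hypothetical, and the real runtime lives in browser JavaScript rather than Python.

```python
from enum import Enum


class Path(Enum):
    LOCAL = "local"
    ADMIN_BACKEND = "admin-backend"
    FRONTEND_BACKEND = "frontend-backend"
    UNAVAILABLE = "unavailable"


def choose_path(
    local_ai_available: bool,
    backend_configured: bool,
    in_wp_admin: bool,
    frontend_route_enabled: bool,
    require_backend: bool = False,
) -> Path:
    """Pick where a request runs: on-device first, then the matching trust zone."""
    # Local-first: use on-device AI unless policy forces backend processing.
    if local_ai_available and not require_backend:
        return Path.LOCAL
    # Without a deployed backend there is nothing to fall back to.
    if not backend_configured:
        return Path.UNAVAILABLE
    # WP Admin traffic goes to the trusted /admin/* surface.
    if in_wp_admin:
        return Path.ADMIN_BACKEND
    # Public traffic may only use /frontend/* when that route was enabled.
    return Path.FRONTEND_BACKEND if frontend_route_enabled else Path.UNAVAILABLE
```

Note that a public visitor never falls through to the admin surface: when the frontend route is disabled, the feature is simply unavailable.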

What the AI‑Kit backend stack provisions

Building block	Purpose	Key design choice	Operational note
API Gateway REST API	Exposes admin and optional frontend AI routes	/admin/* is always the trusted surface; /frontend/* appears only when feature toggles enable it	Keep public and privileged routes separate even when they share handler code.
AiHandlerFunction	Unified Lambda handler for prompt and language capabilities	One warm execution path for prompt, write, rewrite, summarize, translate, proofread, detect-language and KB listing	Shared validation, metrics, guardrails and middleware reduce operational spread.
Amazon Bedrock models	Runs generation, translation-style tasks and RAG answer synthesis	Frontend model can be cheaper/lighter than admin model through parameterized model selection	Model IDs are architecture parameters, not hard-coded plugin assumptions.
Bedrock Knowledge Base + S3 Vectors	Provides managed retrieval over documentation or client content	Create a new KB or reuse an existing one via stack parameters	KnowledgeBaseId and DataSourceId outputs are part of the integration contract.
Docs and temp assets S3 bucket	Stores KB documents and temporary prompt image uploads	Separate document prefix, configuration prefix and temp-assets prefix	Temp image objects should expire quickly; KB documents should be versioned deliberately.
KnowledgeBaseSyncFunction	Runs document ingestion workflows	S3/EventBridge/DynamoDB debounce loop avoids starting ingestion for every small file event	Useful when WordPress regenerates multiple KB documents during publishing.
reCAPTCHA + SSM/KMS	Protects open frontend endpoints	Secret stored as encrypted SSM parameter and fetched by handlers	Important when FrontendApiAuthMode is NONE for static-site friendliness.
AWS WAF and throttling	Limits abuse on public/admin API paths	Separate allow/deny/rate rules for frontend and admin surfaces	AI endpoints are cost-bearing; public access needs protection beyond CORS.
CloudWatch, DLQ and alerts	Provides logs, metrics, failed invocation capture and optional notifications	Per-function log retention, custom metrics and shared SQS DLQ	AI features need observability because cost, latency and quality are all runtime concerns.

Endpoint surface: one backend, two trust zones

The stack’s most important product decision is the separation between admin and frontend routes. The same capability may exist in both zones, but the trust model is different.

Capability	Admin route	Frontend route	When frontend is created	Design note
Prompt / chatbot / DocSearch	/admin/prompt	/frontend/prompt	EnableChatbotBackend=true	Can use KB, citations, regeneration, feedback metadata and optional image inputs.
Generate upload URL	/admin/generate-upload-url	/frontend/generate-upload-url	EnableChatbotBackend=true	Uploads images to S3 via presigned PUT, then passes keys to prompt requests.
Summarize	/admin/summarize	/frontend/summarize	EnableSummarizerBackend=true	Summarization disables KB retrieval by default because the source text is already supplied.
Writer	/admin/write	/frontend/write	EnableLanguageAIBackend=true	Useful as backend fallback for editor or frontend generation features.
Rewriter	/admin/rewrite	/frontend/rewrite	EnableLanguageAIBackend=true	Honors tone, format and length controls while keeping secrets out of the browser.
Translator	/admin/translate	/frontend/translate	EnableLanguageAIBackend=true	Can be paired with automatic language detection flows.
Proofreader	/admin/proofread	/frontend/proofread	EnableLanguageAIBackend=true	Returns corrected text and structured correction metadata.
Language detector	/admin/detect-language	/frontend/detect-language	EnableLanguageAIBackend=true	Uses a backend language detection path instead of assuming the browser always provides it.
Knowledge bases	/admin/knowledge-bases	Not a public route	Admin only	Listing and selecting backend knowledge resources belongs to trusted configuration UX.

This is why “AI backend” is not one checkbox. Chat, summarization and language tools have different exposure, cost and abuse profiles. The stack lets them be enabled independently instead of publishing every route by default.
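
As a rough illustration of how the toggles in the table map to a published route set, here is a short sketch. The toggle names and route paths come from the table; the `build_routes` function and the dictionaries are hypothetical and only model the mapping, whereas the real stack expresses it in CloudFormation.

```python
# Toggle -> frontend capabilities it publishes (taken from the table above).
FRONTEND_TOGGLES = {
    "EnableChatbotBackend": ["prompt", "generate-upload-url"],
    "EnableSummarizerBackend": ["summarize"],
    "EnableLanguageAIBackend": [
        "write", "rewrite", "translate", "proofread", "detect-language",
    ],
}
# Capabilities that never get a /frontend/* route.
ADMIN_ONLY = ["knowledge-bases"]


def build_routes(params: dict[str, bool]) -> list[str]:
    """Return the route list for a given set of stack parameters."""
    capabilities = [c for caps in FRONTEND_TOGGLES.values() for c in caps]
    # The admin surface always exists for every capability.
    routes = [f"/admin/{c}" for c in capabilities + ADMIN_ONLY]
    # Frontend routes appear only for toggles that are explicitly true.
    for toggle, caps in FRONTEND_TOGGLES.items():
        if params.get(toggle, False):
            routes += [f"/frontend/{c}" for c in caps]
    return routes
```

With only `EnableSummarizerBackend` set, the public surface is a single route, `/frontend/summarize`, while the full admin surface stays available.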

The RAG pipeline

User question or DocSearch query
        │
        ▼
Prompt route receives request
        │
        ├─ validate auth / reCAPTCHA / WAF path expectations
        ├─ apply guardrail and input checks
        │
        ▼
Query builder predicts category, subcategory and tags
        │
        ▼
Bedrock Knowledge Base retrieval
        │
        ├─ optional strict metadata filtering
        ├─ optional rerank
        └─ retrieved snippets with citation spans
        │
        ▼
Answer template selection
        ├─ KB_ONLY
        ├─ ASK_WHEN_NO_KB
        └─ KB_PREFERRED
        │
        ▼
Bedrock generation with citations and grounding policy
        │
        ▼
AI-Kit renders answer, sources and highlights in WordPress UI

The key point is that RAG is not just “send documents to a model.” The backend has to decide what sources are allowed, how categories map to metadata, whether an answer may fall back to general knowledge, and how citation spans are returned to the WordPress interface.
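
One step of that pipeline, turning predicted categories into an optional strict metadata filter, can be sketched as follows. This is an assumption-laden example: the function name and the `category` / `subcategory` metadata keys are hypothetical, and the dictionary shape follows the Bedrock Knowledge Base retrieval configuration (`vectorSearchConfiguration` with `equals` and `andAll` filters) as I understand it, so it should be checked against the current API reference before use.

```python
from typing import Optional


def build_retrieval_config(
    category: Optional[str],
    subcategory: Optional[str],
    strict: bool,
    top_k: int = 5,
) -> dict:
    """Build a retrieval configuration dict; filter only in strict mode."""
    search: dict = {"numberOfResults": top_k}
    if strict:
        # One equality clause per predicted field that actually has a value.
        clauses = [
            {"equals": {"key": key, "value": value}}
            for key, value in (("category", category), ("subcategory", subcategory))
            if value
        ]
        if len(clauses) == 1:
            # A single clause is passed as-is; andAll expects at least two.
            search["filter"] = clauses[0]
        elif clauses:
            search["filter"] = {"andAll": clauses}
    return {"vectorSearchConfiguration": search}
```

In non-strict mode the predicted categories are not enforced, so retrieval can still surface documents whose metadata does not match the prediction.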

Grounding policy is a product decision

Different content categories should not all behave the same way. A product documentation chatbot can answer from general knowledge when asked about a generic concept. A medical, legal, financial or internal policy assistant may need to refuse when the knowledge base does not contain an answer.

Grounding mode	When to use	Behavior when KB has no relevant snippets	Editorial implication
KB_ONLY	Regulated, high-stakes or strict documentation answers	State that the documentation does not contain the requested information	Authors must keep the KB complete enough for expected questions.
ASK_WHEN_NO_KB	Ambiguous source sets or category-dependent answers	Ask one clarification question instead of guessing	Metadata taxonomy becomes part of UX design.
KB_PREFERRED	Marketing, product education and general support	Use KB when available; otherwise answer with clear separation from retrieved docs	Good balance for public websites that mix documentation and general explanation.

AI‑Kit exposes this through knowledge-base configuration rather than hard-coding one behavior for every site. Category policies, strict metadata filtering and prompt templates turn content governance into an operational layer that editors and developers can understand.
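
The table of grounding modes reduces to a small decision function. The sketch below is illustrative only: the three mode names come from the table, while the action labels, the per-category policy mapping and the `KB_PREFERRED` default are hypothetical choices made for the example.

```python
# What to do when retrieval returns no relevant snippets, per grounding mode.
NO_SNIPPET_ACTION = {
    "KB_ONLY": "state_not_in_docs",
    "ASK_WHEN_NO_KB": "ask_clarification",
    "KB_PREFERRED": "answer_general_labeled",
}


def resolve_mode(
    category: str,
    category_policies: dict[str, str],
    default_mode: str = "KB_PREFERRED",
) -> str:
    """Look up the grounding mode for a content category (assumed mapping)."""
    return category_policies.get(category, default_mode)


def grounding_action(mode: str, snippet_count: int) -> str:
    """Decide how the answer step behaves for a mode and retrieval result."""
    if mode not in NO_SNIPPET_ACTION:
        raise ValueError(f"unknown grounding mode: {mode}")
    # Any relevant snippet means the answer is grounded in the KB.
    if snippet_count > 0:
        return "answer_from_kb"
    return NO_SNIPPET_ACTION[mode]


# Example: a site where policy pages are strict and everything else is relaxed.
policies = {"internal-policy": "KB_ONLY", "pricing": "ASK_WHEN_NO_KB"}
action = grounding_action(resolve_mode("internal-policy", policies), snippet_count=0)
```

The point of keeping the mapping in data rather than in the prompt is that editors can change a category's policy without anyone touching the generation code.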

Knowledge Base ingestion lifecycle

A RAG system is only as useful as its ingestion lifecycle. For WordPress, that lifecycle starts in familiar places: posts, pages, custom post types, KB sections and editor-controlled source selection. The backend lifecycle starts when those documents are written to the S3 document bucket.

WordPress KB source selected or regenerated
        │
        ▼
Markdown document + optional metadata.json
        │
        ▼
S3 DocsBucket under documents/ prefix
        │
        ▼
S3 event or scheduled ingestion path
        │
        ▼
DynamoDB debounce lock/state
        │
        ▼
EventBridge Scheduler
        │
        ▼
KnowledgeBaseSyncFunction
        │
        ▼
Bedrock ingestion job updates Knowledge Base vectors

The debounce layer matters because publishing systems often update several files at once. Starting a full ingestion job for every small object event is noisy and expensive. A single sync worker with DynamoDB state and EventBridge scheduling gives the system room to batch changes into a more predictable ingestion rhythm.
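
To make the batching effect concrete, here is a small in-memory simulation of a trailing-edge debounce. It is a sketch under stated assumptions: the real stack keeps this state in DynamoDB and uses EventBridge Scheduler, the quiet-window length is a made-up value, and the actual debounce strategy may differ from the one modeled here.

```python
def plan_syncs(event_times: list[float], quiet_seconds: float) -> list[float]:
    """Return the times at which an ingestion job would start.

    Each S3 event pushes the scheduled start to `event time + quiet_seconds`;
    a job only fires once no newer event has arrived inside that window.
    """
    fire_times: list[float] = []
    pending_until = None  # scheduled start of the next job, if any
    for t in sorted(event_times):
        # The previously scheduled job ran before this event arrived.
        if pending_until is not None and t >= pending_until:
            fire_times.append(pending_until)
        # Schedule (or reschedule) the job after the quiet window.
        pending_until = t + quiet_seconds
    if pending_until is not None:
        fire_times.append(pending_until)
    return fire_times


# Three documents regenerated within two seconds, then one much later.
jobs = plan_syncs([0, 1, 2, 100], quiet_seconds=30)
```

In this example four object events produce two ingestion jobs instead of four: one at t=32 covering the burst and one at t=130 for the later change.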

Static-export-friendly AI

Static WordPress complicates many traditional plugin assumptions. The public visitor cannot call a WordPress AJAX endpoint if WordPress is not in the request path. AI‑Kit handles this by making frontend AI calls browser-to-backend calls, not browser-to-WordPress-to-model calls.

Static-safe

No PHP proxy required

The browser calls the configured backend API directly. WordPress can be offline or private after publishing, as long as the static site and backend API are reachable.

Public-safe

No model keys in the browser

The frontend receives an API URL and feature configuration. Bedrock access remains in Lambda IAM permissions, not JavaScript source code.

Abuse-aware

Open endpoints need controls

When frontend auth mode is NONE, the architecture expects controls such as reCAPTCHA, WAF, rate limits and route-level feature toggles.
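
A pre-deployment check for that expectation could look like the sketch below. The function name, the finding messages and the upper-case `COGNITO` spelling are assumptions for illustration; only the idea that `NONE` should be paired with reCAPTCHA and WAF comes from the text above.

```python
def frontend_exposure_findings(
    auth_mode: str,
    recaptcha_enabled: bool,
    waf_enabled: bool,
    enabled_toggles: list[str],
) -> list[str]:
    """List missing controls for the configured public frontend surface."""
    if auth_mode not in {"NONE", "IAM", "COGNITO"}:
        raise ValueError(f"unknown frontend auth mode: {auth_mode}")
    findings: list[str] = []
    # No frontend routes enabled means there is no public surface to protect.
    if not enabled_toggles:
        return findings
    # Anonymous access relies on the compensating controls being present.
    if auth_mode == "NONE":
        if not recaptcha_enabled:
            findings.append("anonymous frontend routes without reCAPTCHA")
        if not waf_enabled:
            findings.append("anonymous frontend routes without WAF rate rules")
    return findings
```

A check like this fits naturally before a deployment step, where an empty list means the selected combination matches the posture described above.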

Authentication and protection matrix

Surface	Default posture	Possible auth modes	Recommended protection	Why
Admin AI routes	Trusted/admin	IAM or Cognito	IAM by default, optional IP allow list, logs and alerts	These routes can expose broader capabilities and should not be public.
Frontend chatbot	Public or member-facing	NONE, IAM or Cognito	reCAPTCHA + WAF for public; Cognito scopes for member-only	Chat endpoints are cost-bearing and can receive arbitrary user input.
Frontend summarizer/language tools	Feature-gated public surface	NONE, IAM or Cognito	Only enable needed routes; throttle and validate payload size	Each route adds an abuse surface and a model-cost surface.
Image upload helper	Temporary asset ingress	Follows prompt surface	Strict content type, size, key prefix and lifecycle expiry	Presigned uploads are powerful and should be bounded tightly.
Knowledge Base management	Admin only	IAM or privileged Cognito	Never expose as anonymous frontend endpoint	KB selection and backend resources are configuration, not visitor UX.
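
The image upload row is the easiest one to turn into code. The following sketch validates an upload request before any presigned URL would be issued. The allowed types, the 5 MiB limit and the `temp-assets/` prefix are illustrative assumptions, not values confirmed by the stack.

```python
import posixpath

# Illustrative bounds; the real stack's limits are not specified here.
ALLOWED_CONTENT_TYPES = frozenset({"image/png", "image/jpeg", "image/webp"})
MAX_UPLOAD_BYTES = 5 * 1024 * 1024
TEMP_PREFIX = "temp-assets/"


def upload_request_errors(content_type: str, size_bytes: int, key: str) -> list[str]:
    """Return the reasons an upload request should be rejected (empty if OK)."""
    errors: list[str] = []
    if content_type not in ALLOWED_CONTENT_TYPES:
        errors.append("content type not allowed")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        errors.append("size out of bounds")
    # Reject keys that are not already normalized (e.g. contain "..")
    # or that would land outside the temporary-assets prefix.
    if posixpath.normpath(key) != key or not key.startswith(TEMP_PREFIX):
        errors.append("key outside temp prefix")
    return errors
```

Validation of this kind complements, rather than replaces, the S3 lifecycle rule that expires temporary objects.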

Model selection and cost posture

The backend deliberately separates model choice from plugin code. Admin routes can default to a stronger model, while frontend routes can use a lighter model first and fall back according to stack parameters. That matters for agencies because public chatbot usage, editor features and document search usually have different cost profiles.

Workload	Cost pressure	Quality requirement	Architecture choice
Editor rewrite / translate / proofread	Usually moderate and admin-controlled	Consistent output and low friction	Try local AI first, use backend fallback when unavailable or when policy requires backend processing.
Frontend chatbot	Potentially high because visitors can trigger usage	Grounded, safe, understandable answers	Enable only needed routes, use lighter frontend model, reCAPTCHA/WAF, and bounded response settings.
DocSearch	Depends on search volume and retrieved context	Good citations and accurate source selection	RAG-first approach; optional rerank only where relevance improvement justifies extra calls.
Multimodal prompt	Higher because images add storage and processing	Useful for selected support or content workflows	Use presigned S3 uploads, object size limits and lifecycle expiration.
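
Per-surface model selection can be pictured as an ordered candidate list built from stack parameters. In this sketch the parameter names `AdminModelId`, `FrontendModelId` and `FrontendFallbackModelId` are hypothetical, as are the placeholder model identifiers; the real parameter names may differ.

```python
# Hypothetical parameter names, in the order they should be tried.
SURFACE_PARAMS = {
    "admin": ("AdminModelId",),
    "frontend": ("FrontendModelId", "FrontendFallbackModelId"),
}


def model_candidates(surface: str, params: dict[str, str]) -> list[str]:
    """Return model IDs to try for a surface, lightest configured first."""
    if surface not in SURFACE_PARAMS:
        raise ValueError(f"unknown surface: {surface}")
    ordered: list[str] = []
    for name in SURFACE_PARAMS[surface]:
        model_id = params.get(name, "").strip()
        # Skip unset parameters and avoid trying the same model twice.
        if model_id and model_id not in ordered:
            ordered.append(model_id)
    return ordered


stack_params = {
    "AdminModelId": "example-strong-model",
    "FrontendModelId": "example-light-model",
    "FrontendFallbackModelId": "example-strong-model",
}
frontend_models = model_candidates("frontend", stack_params)
```

Because the order lives in parameters, an agency can change the cost posture of a client site by updating the stack rather than shipping a plugin change.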

How the Deployment Wizard changes the operating model

The AI‑Kit backend is easier to explain when the Deployment Wizard is treated as part of the architecture, not just a helper page. It collects the few decisions that matter to a WordPress administrator (frontend feature set, auth mode and optional protections) and leaves developer-controlled values such as the deployment version outside the normal UI.

Wizard decision	CloudFormation effect	WordPress effect
Frontend features	Enables only the backend routes required for chatbot, DocSearch, summarization or language tools.	The plugin can expose only the surfaces the site actually wants to support.
Admin auth mode	Defaults toward Cognito for admin operations and can enforce scopes.	Admin/backend actions are not accidentally treated as public AI calls.
Template source	Uses an S3 template URL that CloudFormation can read.	The user sees an AWS-native stack review flow instead of a hidden SaaS provisioning step.
Outputs	Produces `ApiBaseUrl` and other stack outputs.	WordPress stores the endpoint contract; it does not own the backend runtime.
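
For readers who have not seen a prefilled CloudFormation review link, the sketch below assembles one from wizard-style choices. It assumes the commonly documented quick-create link layout (`templateURL`, `stackName` and `param_`-prefixed parameters after `#/stacks/create/review`); the template URL, stack name and region are placeholders, and how the wizard actually builds its link is not confirmed here.

```python
from urllib.parse import urlencode


def quick_create_url(
    region: str,
    template_url: str,
    stack_name: str,
    parameters: dict[str, str],
) -> str:
    """Build a CloudFormation 'create stack review' link with prefilled values."""
    query = {"templateURL": template_url, "stackName": stack_name}
    # Stack parameters are passed with a "param_" prefix.
    for name, value in parameters.items():
        query[f"param_{name}"] = value
    return (
        f"https://{region}.console.aws.amazon.com/cloudformation/home"
        f"?region={region}#/stacks/create/review?{urlencode(query)}"
    )


url = quick_create_url(
    "eu-west-1",
    "https://example-bucket.s3.amazonaws.com/ai-kit.yaml",  # placeholder URL
    "ai-kit-backend",  # placeholder stack name
    {"EnableChatbotBackend": "true", "FrontendApiAuthMode": "NONE"},
)
```

The administrator still reviews and confirms the stack in the AWS console, which is what keeps provisioning visible rather than hidden behind the plugin.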

Operational runbook

Use the AI‑Kit Deployment Wizard to select only the frontend features, auth modes and optional protections the site actually needs, then open the prefilled CloudFormation Create stack review URL.
Choose auth mode per surface: IAM/Cognito for admin, NONE + reCAPTCHA/WAF or Cognito for frontend.
Copy the ApiBaseUrl stack output into WordPress → AI‑Kit Settings → API Settings, then connect any additional feature-specific outputs or identifiers required by the chosen backend mode.
Define KB metadata categories, subcategories and tags before publishing many documents.
Configure grounding policy for categories where hallucination risk matters.
Publish KB documents from WordPress and run or schedule ingestion.
Test editor, static frontend and authenticated/member flows separately.
Watch CloudWatch logs, metrics, DLQ and model-call latency before increasing public exposure.

What makes this different from a typical AI plugin

Typical plugin

Vendor-owned runtime

Requests often leave WordPress for a shared SaaS endpoint, with limited visibility into model choice, logs, guardrails or retrieval architecture.

Local-only plugin

Great when available

Browser AI can be privacy-friendly and cheap, but availability, capability and frontend production needs vary across devices and browsers.

WP Suite pattern

Local-first + owned backend

Use on-device AI where it works, then fall back to a customer-owned AWS backend for RAG, public widgets, admin tooling and governed model access.

When this architecture is a good fit

WordPress sites that need AI features without sending all content through a shared plugin SaaS backend.
Static WordPress frontends that still need chatbot, DocSearch or frontend AI capabilities.
Agencies building repeatable private AI deployments for clients with different content governance requirements.
Documentation, support, healthcare-adjacent, legal-adjacent or enterprise sites where grounding policy matters.
Teams that want AWS logs, IAM, WAF, guardrails, S3 document storage and model access in the customer account.

When not to use it

The site only needs occasional editor-side rewriting and local browser AI already covers the workflow.
The team does not want to own AWS deployment, monitoring, WAF rules, ingestion and model-cost governance.
The chatbot can safely be a simple static FAQ with no retrieval, no personalization and no public AI endpoint.
The content taxonomy is too messy to define useful KB categories, metadata or grounding behavior.

Pillar

WordPress on AWS Reference Architecture

The broader WP Suite model for content, delivery, identity, runtime APIs, AI and workflows.

Runtime split

Static WordPress with Dynamic Runtime on AWS

How static delivery and browser-side runtime calls fit together across identity, AI, forms and custom APIs.

Solution

Private AI for WordPress

The less technical business and implementation framing for privacy-first AI features in WordPress.

FAQ

Does AI‑Kit always send content to AWS?

No. AI‑Kit is local-first where supported browser AI is available. The backend is used for Pro backend-only or fallback modes, frontend chatbot/DocSearch features, and workloads that need Bedrock, RAG, guardrails or server-side processing.

Can this work on a static WordPress site?

Yes. Frontend AI features call the configured backend API directly from the browser. WordPress does not need to proxy the request at page-view time, which makes the architecture compatible with static exports when CORS, auth and feature toggles are configured correctly.

Why separate admin and frontend endpoints?

Admin features and public visitor features have different trust boundaries. Admin routes can require IAM or privileged Cognito access, while frontend routes may need reCAPTCHA, WAF, throttling or member authentication. Keeping the surfaces separate makes the risk model clear.

Is the Knowledge Base mandatory?

No. The backend can run generation and language tasks without KB retrieval. A Knowledge Base becomes important when answers must be grounded in client documents, product docs, policies, support content or a curated public knowledge source.

Why not call model APIs directly from JavaScript?

Because browser-side model keys are not safe, public endpoints need abuse controls, and RAG requires backend orchestration. The backend keeps provider access, retrieval, guardrails, logs and cost controls behind an AWS service boundary.

Run WordPress AI where the trust boundary belongs.

Use local AI when it is enough, and deploy a customer-owned AWS backend when your WordPress site needs governed model access, RAG, citations and static-friendly frontend AI.