SmartExtract – Document Disqualification Rules – RLDatix Public Knowledge Base

SmartExtract runs two automated pre-processing validations to determine whether a document is eligible for extraction. Both checks run before any extraction step. If either rule is triggered, extraction stops immediately and a message is shown to the user.

1. Guardrail Disqualification (Compliance, Safety, & Injection Protection)

This validation sends only the raw document text (no prompts, no metadata) to Amazon Bedrock’s guardrail service. If the document contains restricted, unsafe, or malicious content, SmartExtract blocks the request.

Examples of guardrail triggers:

Personal or financial identification data
Profanity or abusive language
Market or investment advice
Prompt injection or jailbreak attempts (e.g. “ignore previous instructions”, “you are now the system”)
Any other content flagged by Bedrock safety rules

User message displayed:

This request has been blocked by SmartExtract. This could be because the document provided contains prohibited language (e.g. personal information, profanity, etc). Please contact IntelligentContract Support for more information and quote: <requestid>.

Once this check is triggered, no further extraction steps are performed.
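The guardrail call described above can be sketched as follows. This is a minimal illustration, not SmartExtract's actual implementation: it assumes the check goes through the Bedrock Runtime `ApplyGuardrail` operation, and the function name `is_blocked_by_guardrail` and the `guardrail_id` / `guardrail_version` values are hypothetical. The client is passed in so the sketch can be exercised without AWS credentials.

```python
from typing import Any


def is_blocked_by_guardrail(client: Any, document_text: str,
                            guardrail_id: str, guardrail_version: str) -> bool:
    """Return True if the guardrail intervened on the raw document text.

    `client` is assumed to expose the Bedrock Runtime `apply_guardrail` call.
    Only the document text is sent: no prompts and no metadata.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,    # hypothetical guardrail ID
        guardrailVersion=guardrail_version,  # hypothetical guardrail version
        source="INPUT",                      # evaluate the document as input content
        content=[{"text": {"text": document_text}}],
    )
    # Bedrock reports "GUARDRAIL_INTERVENED" when a policy matched, "NONE" otherwise.
    return response.get("action") == "GUARDRAIL_INTERVENED"
```

With boto3, the client would typically be created with `boto3.client("bedrock-runtime")`. Which guardrail policies are configured (PII, profanity, prompt attack detection, and so on) is decided in Bedrock, not in this code.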

2. Sense-Based Disqualification (Content Quality & Document Integrity Check)

This validation ensures that the document is suitable for meaningful LLM extraction and is not corrupted or unreadable.

A document will be disqualified if it:

Contains 5 words or fewer
Is nonsensical, repetitive, or placeholder text (e.g. “lorem lorem lorem lorem lorem”)
Is corrupt or unreadable (e.g. garbled characters, failed OCR, broken PDF stream)
Contains only symbols, formatting artefacts, or blank sections
Extracts as empty or nearly empty text

User message displayed:

This document cannot be processed by SmartExtract. Reason: The document does not contain enough meaningful content for extraction. Please contact IntelligentContract Support for more information and quote: <requestid>.
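Some of the sense rules above are simple enough to express as heuristics. The sketch below is illustrative only and is not SmartExtract's implementation: the 5-word limit comes from the rule above, while the function name `sense_precheck`, the reason labels, and the 80% repetition threshold are assumptions. Judging whether text is nonsensical or garbled would still need the LLM-based part of the check, which is not shown.

```python
import re
from collections import Counter
from typing import Optional

MIN_WORDS = 6             # documents with 5 words or fewer are disqualified
MAX_TOP_WORD_SHARE = 0.8  # assumed threshold for "repetitive" text


def sense_precheck(text: str) -> Optional[str]:
    """Return a disqualification reason, or None if these heuristics pass."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    # A word is a run of letters or digits; symbols and formatting debris are ignored.
    words = re.findall(r"[^\W_]+", stripped.lower())
    if not words:
        return "symbols_only"
    if len(words) < MIN_WORDS:
        return "too_short"
    # Flag text dominated by a single repeated word (e.g. placeholder filler).
    top_count = Counter(words).most_common(1)[0][1]
    if top_count / len(words) >= MAX_TOP_WORD_SHARE:
        return "repetitive"
    return None
```

Note that the article's own example, “lorem lorem lorem lorem lorem”, is caught here by the word-count rule before the repetition rule is reached, because it has only five words.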

🔁 Validation Order in the SmartExtract Workflow

Order	Check Type	Purpose	Trigger Result
1	Guardrail Validation (Bedrock)	Block unsafe or injected content	Extraction cancelled, guardrail message shown
2	Sense Validation (LLM + extraction sanity checks)	Block unreadable, corrupt, or too-short documents	Extraction cancelled, sense message shown
3	✅ Extraction Begins	Runs only if both checks pass	Normal SmartExtract flow starts
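The ordering in the table above amounts to a short-circuit pipeline: the guardrail check runs first, the sense check runs only if the guardrail passes, and extraction starts only if both pass. The following is a sketch under stated assumptions: `run_prechecks`, `PrecheckResult`, and the two callables are hypothetical names, and the message texts are copied from this article with `<requestid>` replaced by the actual request ID.

```python
from dataclasses import dataclass
from typing import Callable, Optional

GUARDRAIL_MESSAGE = (
    "This request has been blocked by SmartExtract. This could be because the "
    "document provided contains prohibited language (e.g. personal information, "
    "profanity, etc). Please contact IntelligentContract Support for more "
    "information and quote: {request_id}."
)
SENSE_MESSAGE = (
    "This document cannot be processed by SmartExtract. Reason: The document "
    "does not contain enough meaningful content for extraction. Please contact "
    "IntelligentContract Support for more information and quote: {request_id}."
)


@dataclass
class PrecheckResult:
    proceed: bool                       # True only if both checks passed
    failed_check: Optional[str] = None  # "guardrail", "sense", or None
    user_message: Optional[str] = None  # message shown to the user on failure


def run_prechecks(text: str, request_id: str,
                  guardrail_blocks: Callable[[str], bool],
                  sense_fails: Callable[[str], bool]) -> PrecheckResult:
    """Run the two validations in order and stop at the first failure."""
    if guardrail_blocks(text):  # step 1: guardrail validation
        return PrecheckResult(False, "guardrail",
                              GUARDRAIL_MESSAGE.format(request_id=request_id))
    if sense_fails(text):       # step 2: sense validation
        return PrecheckResult(False, "sense",
                              SENSE_MESSAGE.format(request_id=request_id))
    return PrecheckResult(True)  # step 3: extraction may begin
```

Passing the two checks in as callables keeps the ordering logic separate from how each check is implemented, and it means the sense check is never invoked for a document the guardrail has already blocked.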

📌 Summary of Rules

Rule	Type	Common Trigger	Outcome
Guardrail Check	Safety / Compliance	Personal or financial data, profanity, prompt injection	❌ Blocked
Sense Check	Quality / Integrity	5 words or fewer, corrupt/unreadable content	❌ Blocked
Both Passed	Extractable content	Valid agreements, policies, contracts	✅ Extracted

❓ Unexpected Block?

Users should contact IntelligentContract Support and include the Request ID shown in the message so the team can investigate.
