SmartExtract – Document Disqualification Rules
SmartExtract runs two automated pre-processing validations that determine whether a document is eligible for extraction. Both checks run before extraction begins. If either one is triggered, extraction stops immediately and a message is shown to the user.
1. Guardrail Disqualification (Compliance, Safety, & Injection Protection)
This validation sends only the raw document text (no prompts, no metadata) to Amazon Bedrock’s guardrail service. If the document contains restricted, unsafe, or malicious content, SmartExtract blocks the request.
Examples of guardrail triggers:
- Personal or financial identification data
- Profanity or abusive language
- Market or investment advice
- Prompt injection or jailbreak attempts (e.g. “ignore previous instructions”, “you are now the system”)
- Any other content flagged by Bedrock safety rules
User message displayed:
This request has been blocked by SmartExtract. This could be because the document provided contains prohibited language (e.g. personal information, profanity, etc). Please contact IntelligentContract Support for more information and quote: <requestid>.
Once this check is triggered, no further extraction steps are performed.
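As a rough illustration of the guardrail step, the sketch below sends only the document text to Bedrock's ApplyGuardrail API (the `apply_guardrail` method of a boto3 `bedrock-runtime` client) and treats a `GUARDRAIL_INTERVENED` action as a block. The function name `passes_guardrail`, the guardrail ID and version, and the `StubGuardrailClient` test double are hypothetical. This article does not describe SmartExtract's actual code or guardrail configuration.

```python
from typing import Any


def passes_guardrail(client: Any, document_text: str,
                     guardrail_id: str, guardrail_version: str) -> bool:
    """Hypothetical helper: send only the raw document text to a Bedrock guardrail.

    Returns True if the guardrail did not intervene.
    """
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source="INPUT",  # evaluate the document as model input
        content=[{"text": {"text": document_text}}],  # text only: no prompt, no metadata
    )
    # Bedrock reports "GUARDRAIL_INTERVENED" when a configured policy matched
    # and "NONE" when the text passed.
    return response["action"] != "GUARDRAIL_INTERVENED"


class StubGuardrailClient:
    """Hypothetical offline stand-in for boto3.client("bedrock-runtime")."""

    def __init__(self, blocked_phrases: list) -> None:
        self.blocked_phrases = [p.lower() for p in blocked_phrases]
        self.last_request: dict = {}

    def apply_guardrail(self, **request: Any) -> dict:
        # Record the request so callers can inspect what was sent.
        self.last_request = request
        text = request["content"][0]["text"]["text"].lower()
        hit = any(p in text for p in self.blocked_phrases)
        return {"action": "GUARDRAIL_INTERVENED" if hit else "NONE"}
```

In a real deployment you would pass `boto3.client("bedrock-runtime")` instead of the stub. Which content is blocked then depends on the policies configured on the guardrail itself, not on this function.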
2. Sense-Based Disqualification (Content Quality & Document Integrity Check)
This validation checks that the document contains enough readable, meaningful text for LLM extraction and is not corrupted or unreadable.
A document will be disqualified if it:
- Contains 5 words or fewer
- Is nonsensical, repetitive, or placeholder text (e.g. “lorem lorem lorem lorem lorem”)
- Is corrupt or unreadable (e.g. garbled characters, failed OCR, broken PDF stream)
- Contains only symbols, formatting artefacts, or blank sections
- Extracts as empty or nearly empty text
User message displayed:
This document cannot be processed by SmartExtract. Reason: The document does not contain enough meaningful content for extraction. Please contact IntelligentContract Support for more information and quote: <requestid>.
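The deterministic part of the sense check can be sketched as a few text heuristics. Only the 5-word limit comes from the rules above. The repetition and garbled-character thresholds, the `sense_precheck` name, and its reason codes are assumptions. The LLM-based judgement of whether text is nonsensical is not modelled here.

```python
import re
from typing import Optional

MIN_WORDS = 6              # from the rule: 5 words or fewer is disqualified
MIN_UNIQUE_RATIO = 0.2     # assumed threshold for "repetitive" text
MAX_GARBLED_RATIO = 0.1    # assumed threshold for garbled characters

_WORD = re.compile(r"[^\W\d_]+")  # runs of letters only (Unicode-aware)


def sense_precheck(text: str) -> Optional[str]:
    """Hypothetical pre-check: return a disqualification reason, or None if the text passes."""
    stripped = text.strip()
    if not stripped:
        return "empty"  # empty or whitespace-only extraction
    # U+FFFD replacement characters and control characters are typical of
    # failed OCR or broken PDF streams.
    garbled = sum(
        1 for ch in stripped
        if ch == "\ufffd" or (not ch.isprintable() and not ch.isspace())
    )
    if garbled / len(stripped) > MAX_GARBLED_RATIO:
        return "garbled"
    words = [w.lower() for w in _WORD.findall(stripped)]
    if len(words) < MIN_WORDS:
        return "too_short"  # also catches symbol-only or formatting-only text
    if len(set(words)) / len(words) < MIN_UNIQUE_RATIO:
        return "repetitive"
    return None
```

Cheap checks like these can run before any LLM call, so empty or trivially short documents are rejected without spending model tokens.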
🔁 Validation Order in the SmartExtract Workflow
| Order | Check Type | Purpose | Trigger Result |
|---|---|---|---|
| 1 | Guardrail Validation (Bedrock) | Block unsafe or injected content | Extraction cancelled, guardrail message shown |
| 2 | Sense Validation (LLM + extraction sanity checks) | Block unreadable, corrupt, or too-short documents | Extraction cancelled, sense message shown |
| 3 | ✅ Extraction Begins | Runs only if both checks pass | Normal SmartExtract flow starts |
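The validation order in the table can be expressed as a short pipeline that stops at the first failed check. The check and extraction functions are passed in as callables and are hypothetical, as is the `run_smartextract` name and the substitution of the request ID into the `<requestid>` placeholder. The message text is copied from this article.

```python
from dataclasses import dataclass
from typing import Callable, Optional

GUARDRAIL_MESSAGE = (
    "This request has been blocked by SmartExtract. This could be because the "
    "document provided contains prohibited language (e.g. personal information, "
    "profanity, etc). Please contact IntelligentContract Support for more "
    "information and quote: {request_id}."
)
SENSE_MESSAGE = (
    "This document cannot be processed by SmartExtract. Reason: The document does "
    "not contain enough meaningful content for extraction. Please contact "
    "IntelligentContract Support for more information and quote: {request_id}."
)


@dataclass
class Outcome:
    extracted: bool
    message: Optional[str] = None
    result: Optional[dict] = None


def run_smartextract(text: str, request_id: str,
                     guardrail_check: Callable[[str], bool],
                     sense_check: Callable[[str], bool],
                     extract: Callable[[str], dict]) -> Outcome:
    # 1. Guardrail validation runs first; a block stops everything.
    if not guardrail_check(text):
        return Outcome(False, GUARDRAIL_MESSAGE.format(request_id=request_id))
    # 2. Sense validation runs only on text that cleared the guardrail.
    if not sense_check(text):
        return Outcome(False, SENSE_MESSAGE.format(request_id=request_id))
    # 3. Extraction begins only when both checks pass.
    return Outcome(True, result=extract(text))
```

Because each step returns early, a document blocked by the guardrail is never passed to the sense check or to extraction.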
📌 Summary of Rules
| Rule | Type | Common Trigger | Outcome |
|---|---|---|---|
| Guardrail Check | Safety / Compliance | Personal or financial data, profanity, prompt injection | ❌ Blocked |
| Sense Check | Quality / Integrity | 5 words or fewer, corrupt/unreadable content | ❌ Blocked |
| Both Passed | Extractable content | Valid agreements, policies, contracts | ✅ Extracted |
❓ Unexpected Block?
If a document is blocked unexpectedly, contact IntelligentContract Support and quote the Request ID shown in the message so the team can investigate.