OCR Is Not Enough: Why I Combine AWS Textract with Bedrock for Identity Document Extraction

Processing identity documents seems simple at first glance:
“Just scan the ID and get the fields out.”
But the real world isn’t that clean.
Different states, countries, printers, camera angles, lighting conditions, fonts, abbreviations, and layouts make identity extraction deeply inconsistent. Even if the text is readable, the meaning isn’t always clear.
And this is where many teams trip up:
They use OCR, thinking they'll get structured data.
But OCR only provides text, not understanding.
This is the moment I learned:
OCR and LLMs solve two very different problems.
And together, they solve the whole problem.
The Core Difference: Textract vs Bedrock
Let’s define them properly:
| Feature | AWS Textract | AWS Bedrock LLMs (Claude, Mistral, etc.) |
| --- | --- | --- |
| What it does | OCR + form extraction | Language understanding + reasoning |
| Input | Images / PDFs | Text (your extracted fields) |
| Output | Text + key/value pairs | Clean structured JSON / normalized data |
| Strength | Detecting text spatially | Interpreting and standardizing meaning |
| Underlying tech | Vision ML models + document layout analysis | Large language models (transformers) |
How Textract Works Internally
Textract uses:
Computer vision models
Character segmentation + recognition
Document layout graph analysis
It literally looks at pixels.
It can confidently tell:
This piece of text is located here → likely a label
This piece of text is aligned next to it → likely the value
But Textract does not interpret what the text means.
For example, Textract does not know that:
DOB, Birthdate, D.O.B., Date of Birth → all mean the same thing
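To see why this matters, consider the deterministic alternative: a hand-maintained alias map. The sketch below is hypothetical (the labels are illustrative), and it only covers one field in one language — every new state, country, or card template adds variants the map doesn't know about.

```python
# A naive, hand-maintained alias map for just one field.
DOB_ALIASES = {"DOB", "D.O.B.", "BIRTHDATE", "DATE OF BIRTH", "BIRTH DATE"}

# Normalize punctuation once so "D.O.B." and "D.O.B" compare equal.
_NORMALIZED = {a.rstrip(":.") for a in DOB_ALIASES}

def is_dob_label(label: str) -> bool:
    """Return True if a raw Textract label looks like a date-of-birth key."""
    return label.strip().upper().rstrip(":.") in _NORMALIZED

print(is_dob_label("Date of Birth:"))      # True — covered by the map
print(is_dob_label("Fecha de Nacimiento")) # False — the map has never seen it
```

Every miss is a silent data loss, which is exactly the gap a language model closes.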
How Bedrock Models Work Differently
Models like Claude or Mistral inside Bedrock are:
Trained on massive text corpora
Context-aware
Pattern-recognizing
Language-understanding models
They can answer:
"12-01-95" → Is this DD-MM-YY or MM-DD-YY?
"DOE JOHN" → Likely means "John Doe"
"LIC NO", "DL NO", "ID#": → All refer to driver's license numbers
They infer meaning, not just characters.
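The date example is easy to demonstrate: both readings of "12-01-95" parse as perfectly valid dates, so nothing in the characters themselves resolves the ambiguity — only context can (a minimal sketch):

```python
from datetime import datetime

raw = "12-01-95"

# Both interpretations are syntactically valid — OCR cannot pick one.
as_dmy = datetime.strptime(raw, "%d-%m-%y").date()  # day-month-year reading
as_mdy = datetime.strptime(raw, "%m-%d-%y").date()  # month-day-year reading

print(as_dmy)  # 1995-01-12
print(as_mdy)  # 1995-12-01
```

An LLM can use surrounding context (country of issue, other dates on the card) to choose; a regex cannot.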
Why You Need Both
Textract = Reads the document
Bedrock = Understands the document
If Textract is the eyes,
then Bedrock is the brain.
One without the other leads to either:
Beautifully extracted but unusable fields
Or intelligent reasoning but no text to reason over
Architecture Overview
When building intelligent document processing workflows on AWS, Textract and Bedrock play two different but complementary roles.
| Component | Responsibility | Example Outcome |
| --- | --- | --- |
| Amazon Textract | Extracts text exactly as it appears in the document (OCR). | Detects fields like Name, Address, ID Number. |
| Amazon Bedrock (LLMs) | Understands meaning, cleans data, infers missing values, converts to structured formats. | Normalizes the data into JSON, corrects spelling, interprets ambiguous fields. |

Full Working Code (Textract + Bedrock)
```python
import boto3, json

AWS_REGION = "us-east-1"
MODEL_ID = "<MY MODEL ID>"

textract = boto3.client("textract", region_name=AWS_REGION)
bedrock = boto3.client("bedrock-runtime", region_name=AWS_REGION)


def extract_text(block, blocks):
    """Join the WORD children of a KEY or VALUE block into a string."""
    text = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for cid in rel["Ids"]:
                w = blocks[cid]
                if w["BlockType"] == "WORD":
                    text.append(w["Text"])
    return " ".join(text)


def extract_text_fields(image_path):
    """Run Textract FORMS analysis and return the raw key/value pairs."""
    with open(image_path, "rb") as f:
        doc_bytes = f.read()

    result = textract.analyze_document(
        Document={"Bytes": doc_bytes},
        FeatureTypes=["FORMS"],
    )

    kv_pairs = {}
    blocks = {b["Id"]: b for b in result["Blocks"]}
    for block in result["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            key = extract_text(block, blocks)
            # Follow the VALUE relationship to the paired value block.
            val_id = next(
                (rel["Ids"][0] for rel in block.get("Relationships", []) if rel["Type"] == "VALUE"),
                None,
            )
            val = extract_text(blocks[val_id], blocks) if val_id else ""
            kv_pairs[key] = val
    return kv_pairs


def normalize_with_bedrock(kv_pairs):
    """Ask a Bedrock model to map the raw fields onto a canonical schema."""
    prompt = f"""
Convert the following key-value pairs from a driver's license into structured JSON:
{json.dumps(kv_pairs, indent=2)}
Return JSON with keys:
name, DL_number, date_of_birth, issue_date, expiration_date, sex, height, weight, eye_color, donor_status, address.
Return only the JSON.
"""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 500, "temperature": 0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])


if __name__ == "__main__":
    kv = extract_text_fields("driver_license.jpg")
    structured = normalize_with_bedrock(kv)
    print(json.dumps(structured, indent=2))
```
Here:
- Textract extracts raw key/value fields (Name → JOHN DOE, etc.)
- The Bedrock LLM converts them into clean, consistent JSON
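One practical caveat: calling json.loads directly on the model text is brittle, because models sometimes wrap their answer in markdown fences or add a stray sentence despite the "Return only the JSON" instruction. A defensive parse (a hypothetical helper, not part of the Bedrock API) is cheap insurance:

```python
import json, re

def parse_model_json(text: str) -> dict:
    """Extract the first JSON object from model output, tolerating code fences."""
    # Strip ```json ... ``` fences if the model added them.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Fall back to the first {...} span if extra prose surrounds the object.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))

print(parse_model_json('```json\n{"name": "John Doe"}\n```'))  # {'name': 'John Doe'}
```

Swapping this in for the bare json.loads in normalize_with_bedrock avoids intermittent parse failures in production.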
Before & After Example
Raw Textract Output
```json
{
  "Name": "DOE JOHN",
  "DOB": "12.01.95",
  "DL No.": "D1234567",
  "Eyes": "BRN"
}
```
Bedrock-Normalized Output
```json
{
  "name": "John Doe",
  "DL_number": "D1234567",
  "date_of_birth": "1995-01-12",
  "eye_color": "Brown"
}
```
Same data. Completely different usability.
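Because the normalized record comes from a model rather than deterministic code, it's worth validating before it enters downstream systems. A minimal check (the key names follow the prompt above; which keys count as required is an illustrative choice):

```python
from datetime import date

REQUIRED_KEYS = ("name", "DL_number", "date_of_birth")

def validate_license(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not record.get(k)]
    dob = record.get("date_of_birth", "")
    try:
        date.fromisoformat(dob)  # enforce the ISO 8601 format we asked for
    except ValueError:
        problems.append(f"date_of_birth not ISO 8601: {dob!r}")
    return problems

record = {"name": "John Doe", "DL_number": "D1234567", "date_of_birth": "1995-01-12"}
print(validate_license(record))  # []
```

Failing records can be routed to a retry with a stricter prompt, or to human review.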
Takeaway
| Task | Best Tool | Why |
| --- | --- | --- |
| Read text from image | Textract | Computer vision + layout analysis |
| Turn text into structured meaning | Bedrock LLM | Language reasoning + normalization |
You don’t replace Textract with Bedrock.
You pair them.
Conclusion
So yes, OCR works.
But OCR alone doesn’t understand.
And in identity workflows:
Understanding is everything.
