OCR Is Not Enough: Why I Combine AWS Textract with Bedrock for Identity Document Extraction

Processing identity documents seems simple at first glance:
“Just scan the ID and get the fields out.”
But the real world isn’t that clean.
Different states, countries, printers, camera angles, lighting conditions, fonts, abbreviations, and layouts make identity extraction deeply inconsistent. Even if the text is readable, the meaning isn’t always clear.
And this is where many teams trip up:
They use OCR, thinking they'll get structured data.
But OCR only provides text, not understanding.
This is the moment I learned:
OCR and LLMs solve two very different problems.
And together, they solve the whole problem.
The Core Difference: Textract vs Bedrock
Let’s define them properly:
| Feature | AWS Textract | AWS Bedrock LLMs (Claude, Mistral, etc.) |
| --- | --- | --- |
| What it does | OCR + form extraction | Language understanding + reasoning |
| Input | Images / PDFs | Text (your extracted fields) |
| Output | Text + key/value pairs | Clean structured JSON / normalized data |
| Strength | Detecting text spatially | Interpreting and standardizing meaning |
| Underlying tech | Vision ML models + document layout analysis | Large language models (transformers) |
How Textract Works Internally
Textract uses:
Computer vision models
Character segmentation + recognition
Document layout graph analysis
It literally looks at pixels.
It can confidently tell:
This piece of text is located here → likely a label
This piece of text is aligned next to it → likely the value
But Textract does not interpret what the text means.
For example, Textract does not know that:
DOB, Birthdate, D.O.B., Date of Birth → all mean the same thing
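To see why this matters, consider the deterministic alternative: a hand-maintained alias map. The sketch below is hypothetical (the labels are illustrative), and it only covers one field in one language — every new state, country, or card template adds variants the map doesn't know about.

```python
# A naive, hand-maintained alias map for just one field.
DOB_ALIASES = {"DOB", "D.O.B.", "BIRTHDATE", "DATE OF BIRTH", "BIRTH DATE"}

# Normalize punctuation once so "D.O.B." and "D.O.B" compare equal.
_NORMALIZED = {a.rstrip(":.") for a in DOB_ALIASES}

def is_dob_label(label: str) -> bool:
    """Return True if a raw Textract label looks like a date-of-birth key."""
    return label.strip().upper().rstrip(":.") in _NORMALIZED

print(is_dob_label("Date of Birth:"))      # True — covered by the map
print(is_dob_label("Fecha de Nacimiento")) # False — the map has never seen it
```

Every miss is a silent data loss, which is exactly the gap a language model closes.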
How Bedrock Models Work Differently
Models like Claude or Mistral inside Bedrock are:
Trained on massive text corpora
Context-aware
Pattern-recognizing
Language-understanding models
They can answer:
"12-01-95" → Is this DD-MM-YY or MM-DD-YY?
"DOE JOHN" → Likely means "John Doe"
"LIC NO", "DL NO", "ID#": → All refer to driver's license numbers
They infer meaning, not just characters.
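The date example is easy to demonstrate: both readings of "12-01-95" parse as perfectly valid dates, so nothing in the characters themselves resolves the ambiguity — only context can (a minimal sketch):

```python
from datetime import datetime

raw = "12-01-95"

# Both interpretations are syntactically valid — OCR cannot pick one.
as_dmy = datetime.strptime(raw, "%d-%m-%y").date()  # day-month-year reading
as_mdy = datetime.strptime(raw, "%m-%d-%y").date()  # month-day-year reading

print(as_dmy)  # 1995-01-12
print(as_mdy)  # 1995-12-01
```

An LLM can use surrounding context (country of issue, other dates on the card) to choose; a regex cannot.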
Why You Need Both
Textract = Reads the document
Bedrock = Understands the document
If Textract is the eyes,
then Bedrock is the brain.
One without the other leads to either:
Beautifully extracted but unusable fields
Or intelligent reasoning but no text to reason over
Architecture Overview
When building intelligent document processing workflows on AWS, Textract and Bedrock play two different but complementary roles.
| Component | Responsibility | Example Outcome |
| --- | --- | --- |
| Amazon Textract | Extracts text exactly as it appears in the document (OCR). | Detects fields like Name, Address, ID Number. |
| Amazon Bedrock (LLMs) | Understands meaning, cleans data, infers missing values, converts to structured formats. | Normalizes the data into JSON, corrects spelling, interprets ambiguous fields. |

Full Working Code (Textract + Bedrock)
```python
import boto3, json

AWS_REGION = "us-east-1"
MODEL_ID = "<MY MODEL ID>"

textract = boto3.client("textract", region_name=AWS_REGION)
bedrock = boto3.client("bedrock-runtime", region_name=AWS_REGION)


def extract_text(block, blocks):
    """Join the WORD children of a KEY or VALUE block into a string."""
    text = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for cid in rel["Ids"]:
                w = blocks[cid]
                if w["BlockType"] == "WORD":
                    text.append(w["Text"])
    return " ".join(text)


def extract_text_fields(image_path):
    """Run Textract FORMS analysis and return the raw key/value pairs."""
    with open(image_path, "rb") as f:
        doc_bytes = f.read()

    result = textract.analyze_document(
        Document={"Bytes": doc_bytes},
        FeatureTypes=["FORMS"],
    )

    kv_pairs = {}
    blocks = {b["Id"]: b for b in result["Blocks"]}
    for block in result["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            key = extract_text(block, blocks)
            # Follow the VALUE relationship to the paired value block.
            val_id = next(
                (rel["Ids"][0] for rel in block.get("Relationships", []) if rel["Type"] == "VALUE"),
                None,
            )
            val = extract_text(blocks[val_id], blocks) if val_id else ""
            kv_pairs[key] = val
    return kv_pairs


def normalize_with_bedrock(kv_pairs):
    """Ask a Bedrock model to map the raw fields onto a canonical schema."""
    prompt = f"""
Convert the following key-value pairs from a driver's license into structured JSON:
{json.dumps(kv_pairs, indent=2)}
Return JSON with keys:
name, DL_number, date_of_birth, issue_date, expiration_date, sex, height, weight, eye_color, donor_status, address.
Return only the JSON.
"""
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 500, "temperature": 0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])


if __name__ == "__main__":
    kv = extract_text_fields("driver_license.jpg")
    structured = normalize_with_bedrock(kv)
    print(json.dumps(structured, indent=2))
```
Here:
- Textract extracts raw key/value fields (Name → JOHN DOE, etc.)
- The Bedrock LLM converts them into clean, consistent JSON
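One practical caveat: calling json.loads directly on the model text is brittle, because models sometimes wrap their answer in markdown fences or add a stray sentence despite the "Return only the JSON" instruction. A defensive parse (a hypothetical helper, not part of the Bedrock API) is cheap insurance:

```python
import json, re

def parse_model_json(text: str) -> dict:
    """Extract the first JSON object from model output, tolerating code fences."""
    # Strip ```json ... ``` fences if the model added them.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Fall back to the first {...} span if extra prose surrounds the object.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))

print(parse_model_json('```json\n{"name": "John Doe"}\n```'))  # {'name': 'John Doe'}
```

Swapping this in for the bare json.loads in normalize_with_bedrock avoids intermittent parse failures in production.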
Before & After Example
Raw Textract Output
```json
{
  "Name": "DOE JOHN",
  "DOB": "12.01.95",
  "DL No.": "D1234567",
  "Eyes": "BRN"
}
```
Bedrock-Normalized Output
```json
{
  "name": "John Doe",
  "DL_number": "D1234567",
  "date_of_birth": "1995-01-12",
  "eye_color": "Brown"
}
```
Same data. Completely different usability.
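Because the normalized record comes from a model rather than deterministic code, it's worth validating before it enters downstream systems. A minimal check (the key names follow the prompt above; which keys count as required is an illustrative choice):

```python
from datetime import date

REQUIRED_KEYS = ("name", "DL_number", "date_of_birth")

def validate_license(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not record.get(k)]
    dob = record.get("date_of_birth", "")
    try:
        date.fromisoformat(dob)  # enforce the ISO 8601 format we asked for
    except ValueError:
        problems.append(f"date_of_birth not ISO 8601: {dob!r}")
    return problems

record = {"name": "John Doe", "DL_number": "D1234567", "date_of_birth": "1995-01-12"}
print(validate_license(record))  # []
```

Failing records can be routed to a retry with a stricter prompt, or to human review.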
Takeaway
| Task | Best Tool | Why |
| --- | --- | --- |
| Read text from image | Textract | Computer vision + layout analysis |
| Turn text into structured meaning | Bedrock LLM | Language reasoning + normalization |
You don’t replace Textract with Bedrock.
You pair them.
Conclusion
So yes, OCR works.
But OCR alone doesn’t understand.
And in identity workflows:
Understanding is everything.
