
OCR Is Not Enough: Why I Combine AWS Textract with Bedrock for Identity Document Extraction


DevOps & Cloud Engineer — building scalable, automated, and intelligent systems. Developer of sorts | Automator | Innovator

Processing identity documents seems simple at first glance:

“Just scan the ID and get the fields out.”

But the real world isn’t that clean.

Different states, countries, printers, camera angles, lighting conditions, fonts, abbreviations, and layouts make identity extraction deeply inconsistent. Even if the text is readable, the meaning isn’t always clear.

And this is where many teams trip up:

  • They use OCR thinking they'll get structured data

  • But OCR only provides text, not understanding

This is the moment I learned:

OCR and LLMs solve two very different problems.
And together, they solve the whole problem.


The Core Difference: Textract vs Bedrock

Let’s define them properly:

| Feature | AWS Textract | AWS Bedrock LLMs (Claude, Mistral, etc.) |
| --- | --- | --- |
| What it does | OCR + form extraction | Language understanding + reasoning |
| Input | Images / PDFs | Text (your extracted fields) |
| Output | Text + key/value pairs | Clean, structured JSON / normalized data |
| Strength | Detecting text spatially | Interpreting and standardizing meaning |
| Underlying tech | Vision ML models + document layout analysis | Large language models (transformers) |

How Textract Works Internally

Textract uses:

  • Computer vision models

  • Character segmentation + recognition

  • Document layout graph analysis

It literally looks at pixels.

It can confidently tell:

This piece of text is located here → likely a label
This piece of text is aligned next to it → likely the value

But Textract does not interpret what the text means.

For example, Textract does not know that:

DOB, Birthdate, D.O.B., Date of Birth → all mean the same thing
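A trimmed sketch of a Textract `AnalyzeDocument` response makes this concrete. The block types and relationship names below are Textract's real ones; the values are illustrative, and real responses also carry geometry, confidence scores, and page metadata:

```python
# Illustrative, trimmed shape of a Textract AnalyzeDocument "Blocks" list.
blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "DOB"},
    {"Id": "w2", "BlockType": "WORD", "Text": "12-01-95"},
]
by_id = {b["Id"]: b for b in blocks}

def words(block):
    """Join a block's WORD children into one string."""
    out = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            out += [by_id[i]["Text"] for i in rel["Ids"]
                    if by_id[i]["BlockType"] == "WORD"]
    return " ".join(out)

pairs = {}
for b in blocks:
    if b["BlockType"] == "KEY_VALUE_SET" and "KEY" in b.get("EntityTypes", []):
        value_ids = next(r["Ids"] for r in b["Relationships"] if r["Type"] == "VALUE")
        pairs[words(b)] = words(by_id[value_ids[0]])

print(pairs)  # {'DOB': '12-01-95'}
```

The key is paired with its value purely by layout relationships; nothing in the response says what "DOB" means or which date format "12-01-95" uses.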

How Bedrock Models Work Differently

Models like Claude or Mistral inside Bedrock are:

  • Trained on massive text corpora

  • Context-aware

  • Pattern-recognizing

  • Language-understanding models

They can answer:

"12-01-95" → Is this DD-MM-YY or MM-DD-YY?
"DOE JOHN" → Likely means "John Doe"
"LIC NO", "DL NO", "ID#": → All refer to driver's license numbers

They infer meaning, not just characters.
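For contrast, the rules-only alternative is a hand-maintained alias table. This is a hypothetical sketch of what teams often try first (the alias list and `normalize_key` helper are mine, not part of any AWS service):

```python
# Hand-maintained alias table: the rules-only alternative to an LLM.
FIELD_ALIASES = {
    "DOB": "date_of_birth",
    "D.O.B.": "date_of_birth",
    "Birthdate": "date_of_birth",
    "Date of Birth": "date_of_birth",
    "LIC NO": "dl_number",
    "DL NO": "dl_number",
    "ID#": "dl_number",
}

def normalize_key(raw_key):
    # Any spelling not in the table falls through untouched, so every
    # new issuer layout means another manual entry.
    return FIELD_ALIASES.get(raw_key.strip(), raw_key)

print(normalize_key("D.O.B."))      # date_of_birth
print(normalize_key("Birth Date"))  # Birth Date  (unmapped, falls through)
```

Every new state, country, or printer adds spellings the table has never seen, which is exactly the gap a language model closes.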


Why You Need Both

Textract = Reads the document
Bedrock = Understands the document

If Textract is the eyes,
then Bedrock is the brain.

One without the other leads to either:

  • Beautifully extracted but unusable fields

  • Or intelligent reasoning but no text to reason over


Architecture Overview

When building intelligent document processing workflows on AWS, Textract and Bedrock play two different but complementary roles.

| Component | Responsibility | Example Outcome |
| --- | --- | --- |
| Amazon Textract | Extracts text exactly as it appears in the document (OCR). | Detects fields like Name, Address, ID Number. |
| Amazon Bedrock (LLMs) | Understands meaning, cleans data, infers missing values, converts to structured formats. | Normalizes the data into JSON, corrects spelling, interprets ambiguous fields. |


Full Working Code (Textract + Bedrock)

```python
import json

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "<MY MODEL ID>"

textract = boto3.client("textract", region_name=AWS_REGION)
bedrock = boto3.client("bedrock-runtime", region_name=AWS_REGION)


def extract_text(block, blocks):
    """Join the WORD children of a Textract block into a single string."""
    if block is None:
        return ""
    text = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for cid in rel["Ids"]:
                w = blocks[cid]
                if w["BlockType"] == "WORD":
                    text.append(w["Text"])
    return " ".join(text)


def extract_text_fields(image_path):
    """Run Textract FORMS analysis and return the raw key/value pairs."""
    with open(image_path, "rb") as f:
        doc_bytes = f.read()

    result = textract.analyze_document(
        Document={"Bytes": doc_bytes},
        FeatureTypes=["FORMS"],
    )

    blocks = {b["Id"]: b for b in result["Blocks"]}
    kv_pairs = {}

    for block in result["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            key = extract_text(block, blocks)
            # A KEY block points at its VALUE block via a VALUE relationship.
            val_id = next(
                (rel["Ids"][0] for rel in block.get("Relationships", []) if rel["Type"] == "VALUE"),
                None,
            )
            kv_pairs[key] = extract_text(blocks.get(val_id), blocks) if val_id else ""

    return kv_pairs


def normalize_with_bedrock(kv_pairs):
    """Ask a Bedrock model to map the raw OCR fields onto a clean schema."""
    prompt = f"""
Convert the following key-value pairs from a driver's license into structured JSON:

{json.dumps(kv_pairs, indent=2)}

Return JSON with keys:
name, DL_number, date_of_birth, issue_date, expiration_date, sex, height, weight, eye_color, donor_status, address.
Return only the JSON.
"""

    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 500, "temperature": 0},
    )

    return json.loads(response["output"]["message"]["content"][0]["text"])


if __name__ == "__main__":
    kv = extract_text_fields("driver_license.jpg")
    structured = normalize_with_bedrock(kv)
    print(json.dumps(structured, indent=2))
```

Here:

  • Textract extracts raw key/value fields (Name → JOHN DOE, etc.)
  • Bedrock LLM converts them into clean, consistent JSON
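One practical caveat: despite the "Return only the JSON" instruction, models sometimes wrap the reply in markdown fences or add a leading sentence, and a plain `json.loads` on the raw reply then fails. A defensive parser is worth having; `parse_model_json` below is a sketch of mine, not part of the pipeline above:

```python
import json
import re

def parse_model_json(text):
    """Best-effort parse of an LLM reply that is supposed to be JSON."""
    # Drop any ``` or ```json fences the model may have added.
    text = re.sub(r"```(?:json)?", "", text).strip()
    # Keep only the outermost {...} object, ignoring any chatty preamble.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])

print(parse_model_json('```json\n{"name": "John Doe"}\n```'))
```

Swapping this in for the bare `json.loads` in `normalize_with_bedrock` makes the pipeline tolerant of minor formatting drift between model versions.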

Before & After Example

Raw Textract Output

```json
{
  "Name": "DOE JOHN",
  "DOB": "12.01.95",
  "DL No.": "D1234567",
  "Eyes": "BRN"
}
```

Bedrock-Normalized Output

```json
{
  "name": "John Doe",
  "DL_number": "D1234567",
  "date_of_birth": "1995-01-12",
  "eye_color": "Brown"
}
```

Same data. Completely different usability.
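Normalization also unlocks cheap, deterministic validation downstream. A sketch assuming the JSON shape above; the DL-number pattern is illustrative, since real formats vary by state:

```python
import re

def validate_license(record):
    """Deterministic checks that only make sense after normalization."""
    errors = []
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", record.get("date_of_birth", "")):
        errors.append("date_of_birth is not ISO 8601 (YYYY-MM-DD)")
    if not re.fullmatch(r"[A-Z]\d{7}", record.get("DL_number", "")):
        errors.append("DL_number does not match the expected pattern")
    return errors

normalized = {"date_of_birth": "1995-01-12", "DL_number": "D1234567"}
raw = {"date_of_birth": "12.01.95", "DL_number": "D1234567"}

print(validate_license(normalized))  # []
print(validate_license(raw))         # the raw date fails the check
```

Regex checks like these are hopeless against raw OCR output, but trivial once the LLM has standardized the fields.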


Takeaway

| Task | Best Tool | Why |
| --- | --- | --- |
| Read text from image | Textract | Computer vision + layout analysis |
| Turn text into structured meaning | Bedrock LLM | Language reasoning + normalization |

You don’t replace Textract with Bedrock.
You pair them.


Conclusion

So yes, OCR works.
But OCR alone doesn’t understand.

And in identity workflows:

Understanding is everything.
