Visualizing Latency Comparisons Between LLM APIs: OpenRouter vs Bedrock

Large Language Models (LLMs) are now integral to modern software applications, powering tasks such as summarization, code generation, and technical explanations. When evaluating multiple LLM APIs, latency, response quality, and consistency are critical factors. In this post, I share a detailed latency comparison between OpenRouter and AWS Bedrock, along with the methodology, visualizations, and insights.
Experiment Overview
The primary objective of this experiment was to measure and compare response latency for OpenRouter and Bedrock across multiple prompts. The experiment was designed to capture not only the speed of each API but also its consistency across repeated queries.
Prompts Used
Three representative prompts were chosen for the comparison:
Explain Kubernetes in simple terms for a beginner.
Write a Python function to reverse a linked list.
Summarize the book Atomic Habits in three sentences.
Each prompt was sent five times to each API to generate multiple latency measurements for statistical analysis.
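In code, that setup reduces to a list of prompts and a repeat count; the variable names below are chosen to match the comparison snippet shown later in the post:
# The three test prompts, each sent five times to each API.
prompts = [
    "Explain Kubernetes in simple terms for a beginner.",
    "Write a Python function to reverse a linked list.",
    "Summarize the book Atomic Habits in three sentences.",
]
REPEATS = 5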
Data Collection Methodology
The latency comparison was conducted using Python, with the following approach:
OpenRouter API Calls:
Sent HTTP POST requests to the OpenRouter API with the prompt, specifying the gpt-oss-20b model.
Measured start and end timestamps to calculate latency.
Extracted the text response from the API JSON payload.
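For reference, here is a minimal sketch of a call_openrouter helper along these lines. It assumes OpenRouter's OpenAI-compatible chat completions endpoint and an OPENROUTER_API_KEY environment variable; the exact model slug is an assumption and may differ from the one used in the experiment:
import os
import time
import requests

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def call_openrouter(prompt):
    """Send one prompt to OpenRouter; return (response_text, latency_seconds)."""
    headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
    payload = {
        "model": "openai/gpt-oss-20b",  # assumed slug for the gpt-oss-20b model
        "messages": [{"role": "user", "content": prompt}],
    }
    start = time.time()
    resp = requests.post(OPENROUTER_URL, headers=headers, json=payload, timeout=120)
    latency = time.time() - start
    text = resp.json()["choices"][0]["message"]["content"]
    return text, latency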
AWS Bedrock API Calls:
Used the boto3 client to invoke the Bedrock model openai.gpt-oss-20b-1.
Sent the prompt in the OpenAI-style chat format.
Measured latency from request initiation to response.
Extracted the returned text from the API payload.
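A matching sketch of call_bedrock. The invoke_model call and the OpenAI-style chat body follow the description above; the response JSON shape and the AWS region are assumptions, so adjust them for your account and model:
import json
import time
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")  # region is a placeholder

def call_bedrock(prompt):
    """Send one prompt to Bedrock; return (response_text, latency_seconds)."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    start = time.time()
    resp = bedrock.invoke_model(
        modelId="openai.gpt-oss-20b-1",
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    latency = time.time() - start
    payload = json.loads(resp["body"].read())
    # Assumed to mirror the OpenAI chat response format; verify for your model.
    text = payload["choices"][0]["message"]["content"]
    return text, latency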
Data Storage:
Each query stored the following fields: prompt, repeat number, OpenRouter response, OpenRouter latency, Bedrock response, and Bedrock latency.
All results were saved into a CSV file (llm_comparison.csv) for analysis and visualization.
This setup ensured a repeatable and reliable dataset for performance analysis and comparison.
Here is a condensed snippet showing the main idea of the comparison script:
data_rows = []
for prompt in prompts:
    for i in range(REPEATS):
        # Query both APIs with the same prompt and time each call.
        or_text, or_time = call_openrouter(prompt)
        print(f"OpenRouter [{i+1}/{REPEATS}] Latency: {or_time:.2f}s")

        br_text, br_time = call_bedrock(prompt)
        print(f"Bedrock [{i+1}/{REPEATS}] Latency: {br_time:.2f}s")

        # Record one row per prompt/repeat with both responses and latencies.
        data_rows.append({
            "prompt": prompt,
            "repeat": i + 1,
            "openrouter_response": or_text,
            "openrouter_latency": or_time,
            "bedrock_response": br_text,
            "bedrock_latency": br_time,
        })
This allowed me to build a structured dataset with both responses and latencies for each prompt and repeat.
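Persisting those rows is then a one-liner with pandas, writing the CSV file referenced above:
import pandas as pd

# Write one row per prompt/repeat, with both responses and latencies.
pd.DataFrame(data_rows).to_csv("llm_comparison.csv", index=False)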
Latency Analysis
Using the CSV data, I ran both statistical and visual analyses to compare the two APIs.
OpenRouter Latency
Minimum Latency: 2.32 seconds
Maximum Latency: 7.28 seconds
Average Latency: Approximately 4.60 seconds
Observation: OpenRouter exhibited higher variability, particularly for repeated technical explanation prompts.
Bedrock Latency
Minimum Latency: 2.00 seconds
Maximum Latency: 3.24 seconds
Average Latency: Approximately 3.05 seconds
Observation: Bedrock was consistently faster and more stable across repeats and prompt types.
Prompt-Specific Patterns
Kubernetes Explanation: Bedrock consistently responded under 3 seconds, while OpenRouter spiked to over 7 seconds in one repeat.
Python Code Reversal: Both APIs performed similarly in early repeats, but Bedrock remained slightly faster.
Book Summarization: Bedrock maintained both speed and stability, whereas OpenRouter showed variability in later repeats.
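These per-prompt figures can be reproduced from the CSV with a simple groupby over the columns stored earlier:
import pandas as pd

df = pd.read_csv("llm_comparison.csv")

# Min, max, mean, and spread of latency for each API, broken down by prompt.
stats = df.groupby("prompt")[["openrouter_latency", "bedrock_latency"]].agg(
    ["min", "max", "mean", "std"]
)
print(stats.round(2))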
Visualization Approach
To better understand latency differences, the following visualizations were created:
Boxplot: Shows overall latency distribution for each API, highlighting median, quartiles, and outliers.
Lineplot Per Prompt: Displays latency across repeats for each prompt, revealing consistency and spikes.
These visualizations make trends immediately clear, allowing developers to make informed choices between APIs.
Python Script for Plotting
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the measurements and print summary statistics for both APIs.
df = pd.read_csv("llm_comparison.csv")
print(df[['openrouter_latency', 'bedrock_latency']].describe())

# Boxplot: overall latency distribution per API.
plt.figure(figsize=(12, 6))
sns.boxplot(data=df[['openrouter_latency', 'bedrock_latency']])
plt.title("Latency Comparison: OpenRouter vs Bedrock")
plt.ylabel("Latency (seconds)")
plt.show()

# Line plots: latency across repeats, one pair of lines per prompt.
plt.figure(figsize=(14, 6))
for prompt in df['prompt'].unique():
    prompt_data = df[df['prompt'] == prompt]
    sns.lineplot(x='repeat', y='openrouter_latency', data=prompt_data,
                 label=f'OpenRouter: {prompt}', marker='o')
    sns.lineplot(x='repeat', y='bedrock_latency', data=prompt_data,
                 label=f'Bedrock: {prompt}', marker='o')
plt.title("Latency Trends Per Prompt Repeat")
plt.xlabel("Repeat Number")
plt.ylabel("Latency (seconds)")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
The resulting plots surfaced the patterns summarized below.
Insights From the Data

OpenRouter: latency ranged from 2.32 to 7.28 seconds, averaging roughly 4.60 seconds. Variability was significant across prompts and repeats, indicating inconsistent performance under certain queries.
Bedrock: latency ranged from 2.00 to 3.24 seconds, averaging roughly 3.05 seconds. Variability was much lower than OpenRouter's, indicating more consistent performance.
Prompt-Specific Trends:
For Kubernetes explanation prompts, OpenRouter latency increased up to 7.28 seconds in the fourth repeat, while Bedrock remained under 3 seconds.
For code generation prompts, both APIs performed similarly in early repeats, but Bedrock consistently had faster responses.
For book summarization, Bedrock was faster and more stable, with lower standard deviation.
Takeaways
Consistency Matters: Bedrock is more predictable, making it preferable for real-time applications.
Measure Repeats: Single API calls can be misleading; repeated measurements reveal stability.
Latency vs. Prompt Complexity: Certain prompts can trigger spikes in OpenRouter latency, which developers should consider for production workloads.
Data-Driven Decision Making: Structured data collection enables informed API selection.
Conclusion
This experiment shows that Bedrock provides lower and more consistent latency across prompts and repeated queries compared to OpenRouter. Collecting and visualizing latency not only reveals performance differences but also helps developers make informed choices about which API to integrate for production systems.
By sharing both the data collection and visualization workflow, I hope to provide a practical template for evaluating LLM APIs for real-world projects.