February 14, 2025

Why Language Models Struggle with Numbers and Modifiers: A Case Study in Ambiguity

Introduction

Imagine asking an AI tool for the "cheapest 3 HDMI docking station," expecting a single device with three HDMI ports. Instead, it lists three separate docking stations ranked by price. This mismatch isn’t a flaw in logic; it’s a window into how large language models (LLMs) parse ambiguity, and why numbers and modifiers often trip them up.

This post explores the linguistic challenges behind such errors, using the example above to unpack why even advanced LLMs struggle with seemingly simple requests.

The Problem: A Tale of Two Interpretations

The query “cheapest 3 HDMI docking station” is syntactically ambiguous. Humans intuitively resolve this by applying real-world knowledge (e.g., docking stations rarely come in bundles of three). LLMs, however, rely on statistical patterns in language, leading to two conflicting parses:

  1. Three Products: “Show me the 3 cheapest HDMI docking stations.”
  2. One Product with Three Ports: “Show me the cheapest docking station with 3 HDMI ports.”
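To make the contrast concrete, here is a minimal sketch of the two readings as structured search queries. The field names (max_results, min_hdmi_ports) are hypothetical, invented for illustration rather than drawn from any real API:

  # Two structured readings of "cheapest 3 HDMI docking station".
  # Field names are hypothetical, chosen only to illustrate the contrast.

  # Reading 1: three products, ranked by price.
  three_products = {
      "category": "docking station",
      "sort": "price_asc",
      "max_results": 3,        # "3" scopes over the whole noun phrase
  }

  # Reading 2: one product that has three HDMI ports.
  one_product = {
      "category": "docking station",
      "sort": "price_asc",
      "max_results": 1,
      "min_hdmi_ports": 3,     # "3" scopes only over "HDMI ports"
  }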

Why does the first interpretation dominate? Let’s break it down.

Why Language Models Get Stuck

1. Syntactic Ambiguity: The Modifier Trap

Language models struggle with modifier scope—determining which words a number or adjective modifies. In English, numbers often precede nouns to indicate quantity (e.g., “3 laptops”), so “3 HDMI” is initially parsed as “3 products with HDMI.” A missing preposition (the “with” in “with 3 HDMI ports”) exacerbates the confusion.

Example:

  • Human intuition: “3 HDMI” → a feature of the product.
  • LLM default: “3 HDMI docking stations” → three products.
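You can watch the default parse happen with an off-the-shelf dependency parser. Below is a minimal sketch using spaCy, assuming the small English model is installed; exact labels may vary across model versions:

  import spacy

  # Setup: pip install spacy && python -m spacy download en_core_web_sm
  nlp = spacy.load("en_core_web_sm")
  doc = nlp("cheapest 3 HDMI docking station")

  # Print each token's dependency label and its head word. A parser
  # trained on everyday text will typically attach "3" as a numeric
  # modifier (nummod) of the head noun, i.e. a count of stations,
  # rather than scoping it narrowly over "HDMI".
  for token in doc:
      print(f"{token.text:10} {token.dep_:>8} -> {token.head.text}")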

2. Training Data Bias

LLMs learn from vast datasets dominated by everyday language. In e-commerce contexts, phrases like “cheapest 3 laptops” overwhelmingly refer to quantity, not features. Without explicit training on technical specs (e.g., “3 HDMI ports”), models default to the most frequent interpretation.

The Data Gap:

  • Common pattern: “3 [product]” = quantity.
  • Rare pattern: “3 [feature]” = spec (e.g., “3 HDMI ports”).
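A toy illustration of how this skew becomes a prior: counting readings over a hand-made corpus (the phrases and counts below are invented) shows why a frequency-driven model bets on the quantity reading:

  from collections import Counter

  # Invented corpus snippets; the imbalance mirrors the pattern above.
  corpus = (
      ["cheapest 3 laptops", "top 3 monitors", "best 3 keyboards"] * 100
      + ["dock with 3 HDMI ports", "hub with 3 USB ports"] * 3
  )

  readings = Counter(
      "feature" if " ports" in phrase else "quantity" for phrase in corpus
  )
  print(readings)  # Counter({'quantity': 300, 'feature': 6})

  # A model that maximizes likelihood over data like this will treat
  # "number + noun" as a count unless strong cues say otherwise.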

3. The Limits of Compositional Understanding

LLMs excel at stitching together words based on statistical co-occurrence but falter at compositional reasoning—combining modifiers (e.g., “cheapest”) and numbers in novel ways. For instance:

  • “Cheapest” (superlative) + “3 HDMI” requires the model to prioritize feature specificity over quantity, a step that demands deeper logic (sketched below).
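Stated as explicit logic, the intended reading decomposes cleanly: apply the numeric constraint first, then the superlative. A minimal sketch over hypothetical product records:

  # Hypothetical product records, invented for illustration.
  products = [
      {"name": "Dock A", "price": 89.99, "hdmi_ports": 1},
      {"name": "Dock B", "price": 129.99, "hdmi_ports": 3},
      {"name": "Dock C", "price": 149.99, "hdmi_ports": 3},
  ]

  # Step 1: the number is a feature filter, not a result count.
  with_three_hdmi = [p for p in products if p["hdmi_ports"] >= 3]

  # Step 2: "cheapest" is a superlative over the filtered set.
  cheapest = min(with_three_hdmi, key=lambda p: p["price"])
  print(cheapest["name"])  # Dock B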

4. Missing World Knowledge

Humans know docking stations often have multiple ports; LLMs lack this commonsense intuition unless explicitly trained on product specs. Without grounding in domain-specific knowledge, numbers remain abstract modifiers.

The Path Forward: Improving LLM Robustness

1. Disambiguate with Structure

  • User-side: Encourage precise phrasing (e.g., “cheapest docking station with 3 HDMI ports”).
  • Model-side: Train LLMs to recognize feature-focused numbers (e.g., “3x HDMI” or “3-port”).
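On the model side, even a lightweight preprocessing pass can catch the unambiguous spellings before they reach the model. A rough sketch with plain regular expressions (the pattern list is illustrative, not exhaustive):

  import re

  # "3x HDMI", "3-port", and "3 HDMI ports" all mark the number as a
  # feature count rather than a result count.
  FEATURE_NUMBER = re.compile(
      r"\b(\d+)\s*(?:x\s+\w+|-port|\s+\w+\s+ports?)\b", re.IGNORECASE
  )

  for query in ["cheapest 3x HDMI dock", "3-port hub", "cheapest 3 laptops"]:
      hit = FEATURE_NUMBER.search(query)
      print(query, "->", "feature count" if hit else "ambiguous/quantity")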

2. Domain-Specific Fine-Tuning

Expose models to technical language (e.g., product descriptions, spec sheets) where numbers describe features, not quantities. This helps reweight their priors for niche queries.
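One way to assemble such data is to pair spec-style queries with their intended structured reading, for example as JSON lines. The format and field names below are assumptions for illustration, not a standard:

  import json

  # Hypothetical fine-tuning pairs: spec-style query -> intended reading.
  examples = [
      {"query": "cheapest 3 HDMI docking station",
       "label": {"min_hdmi_ports": 3, "sort": "price_asc", "max_results": 1}},
      {"query": "monitor with 2 USB-C inputs under $300",
       "label": {"min_usb_c_inputs": 2, "max_price": 300}},
  ]

  with open("spec_queries.jsonl", "w") as f:
      for ex in examples:
          f.write(json.dumps(ex) + "\n")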

3. Syntax-Aware Parsing

Enhance models to prioritize modifier-noun relationships. For example:

  • Treat “3 HDMI” as a compound modifier (the way “4K” modifies “display”) rather than as a quantity; see the sketch below.
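spaCy's retokenizer makes this kind of rule easy to prototype. A sketch, again assuming the small English model; the whitelist of spec terms is a stand-in for a real feature lexicon:

  import spacy

  nlp = spacy.load("en_core_web_sm")
  SPEC_TERMS = {"hdmi", "usb", "displayport"}  # illustrative whitelist

  def merge_spec_compounds(doc):
      """Merge 'number + spec term' into one token, like '4K' in '4K display'."""
      with doc.retokenize() as retok:
          for i, token in enumerate(doc[:-1]):
              if token.like_num and doc[i + 1].lower_ in SPEC_TERMS:
                  retok.merge(doc[i:i + 2])
      return doc

  doc = merge_spec_compounds(nlp("cheapest 3 HDMI docking station"))
  print([t.text for t in doc])  # ['cheapest', '3 HDMI', 'docking', 'station']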

4. Interactive Clarification

Allow LLMs to ask follow-up questions (e.g., “Do you want 3 products or a product with 3 ports?”). This mimics human dialogue, reducing ambiguity.
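A toy heuristic shows the shape of this behavior: detect a bare "number + spec term" with no disambiguating "ports," and ask before answering. The trigger pattern below is a simplistic assumption, not a production rule:

  import re

  # Bare "number + ALL-CAPS term" (e.g. "3 HDMI") with no "ports" nearby
  # hints that the number's scope is unclear.
  AMBIGUOUS = re.compile(r"\b(\d+)\s+[A-Z]{2,}\b")

  def maybe_clarify(query: str) -> str | None:
      """Return a clarifying question if the number's scope is unclear."""
      match = AMBIGUOUS.search(query)
      if match and "ports" not in query.lower():
          n = match.group(1)
          return (f"Do you want {n} separate products, "
                  f"or one product with {n} ports?")
      return None

  print(maybe_clarify("cheapest 3 HDMI docking station"))
  # -> Do you want 3 separate products, or one product with 3 ports?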

Key Takeaways

  • Ambiguity is inevitable: Natural language is messy, and numbers amplify confusion.
  • Bias isn’t just societal: Training data distributions create “default” interpretations that don’t always align with user intent.
  • Progress is possible: Better parsing, domain adaptation, and interactive design can bridge the gap.

Conclusion

The “cheapest 3 HDMI docking station” quandary isn’t just a quirky bug—it’s a microcosm of the challenges LLMs face in resolving ambiguity. By combining linguistic insights, domain-specific training, and smarter interaction design, we can guide models toward human-like precision. Until then, a well-placed preposition (“with”) might just be your best search hack.

Call to Action
Next time you query an LLM, ask yourself: Could this be misinterpreted? A small tweak in phrasing might save you a page of irrelevant results—and teach the AI a little more about our world.
