Research Paper · May 2026 · Case Study 5.11.26

The AI Brain: How a Multi-Stage Language Engine Understands What Shoppers Mean

An empirical study of the natural-language understanding system behind InHouse America's search bar — its architecture, intent resolution, conversational memory, and the design choices that let it answer messy human queries with precision.

Authors: InHouse America Research · Published: May 11, 2026 · Version: v1.0 · 5 min read

Abstract

The AI Brain is the natural-language understanding (NLU) engine behind every search on InHouse America. Where a traditional search bar performs lexical matching against a product index, the AI Brain treats each query as a linguistic event — parsing intent, resolving ambiguity, binding context from prior turns, and routing the resolved meaning to the correct downstream subsystem (catalog search, Pricing Feature, content lookup, or guided navigation). Across 18,900 synthetic queries generated by our internal test harness, the Brain correctly classified user intent in 97.4% of cases, resolved follow-up references ("more like that," "cheaper ones") in 94.1%, and reduced zero-result sessions by 71% compared to a lexical baseline. This paper documents its architecture, design signals, and measured behavior.

97.4%
Intent classification accuracy
94.1%
Follow-up reference resolution
−71%
Zero-result sessions vs. baseline

1. Real product examples

Before the architecture, here is what the AI Brain actually looks like in production. Each example below is a real screenshot of the #SHOPsmall search bar taking a natural-language query, followed by the exact code path the Brain runs to parse the intent and route it to the right subsystem.

Example 1 · Ladies · Apparel
Single-category find
SHOPsmall search bar with the query 'ladies jeans'
ladies jeans

The simplest shape. The Brain locks the gender facet to ladies, identifies jeans as a single category token, and routes a clean Find intent at 0.99 confidence directly to catalog search — no memory binding required.

// 1. Parse
const intent = parseQuery("ladies jeans");
// → { intent:"find", gender:"ladies", items:["jeans"], confidence:0.991 }

// 2. Route
return brain.route(intent).dispatch(intent, session);
// → catalog_search({ gender:"ladies", category:"jeans" })
Example 2 · Men's · Footwear
Single-category find, gender flip
SHOPsmall search bar with the query 'mens sandals'
mens sandals

Same shape as Example 1 with the gender facet flipped. The catalog is denser in Men's footwear, so the same code path returns a larger candidate set with no change to the parser or router.

const intent = parseQuery("mens sandals");
// → { intent:"find", gender:"mens", items:["sandals"], confidence:0.993 }

const results = await brain.route(intent).dispatch(intent, session);
// → median 22 SKUs returned, latency 96 ms
Example 3 · Ladies · Accessories
Compound find with conjunction
SHOPsmall search bar with the query 'ladies jewelry and ladies bags'
ladies jewelry and ladies bags

The Brain recognizes the conjunction and, splits the query into two product intents, and unions the candidate sets. The duplicate-gender token is collapsed during normalization so the gender facet is set once.

const intent = parseQuery("ladies jewelry and ladies bags");
// → { intent:"find", gender:"ladies", items:["jewelry","bags"], confidence:0.978 }

// Each category is searched independently, then merged
const pools = await Promise.all(
  intent.items.map(cat =>
    db.products.find({ gender: intent.gender, category: cat })
  )
);

return rank(pools.flat(), intent).slice(0, 24);
Example 4 · Men's · Apparel + footwear
Compound find across categories
SHOPsmall search bar with the query 'mens suits and mens loafers'
mens suits and mens loafers

Two unrelated categories share one query. The Brain doesn't try to interpret intent across the conjunction — both candidate pools are retrieved against the same gender facet and merged, with the ranker preserving distinctness via a duplicate penalty.

const intent = parseQuery("mens suits and mens loafers");
// → { intent:"find", gender:"mens", items:["suits","loafers"], confidence:0.982 }

const candidates = await db.products.findMany({
  gender: intent.gender,
  categories: intent.items,        // ["suits","loafers"]
});

return rank(candidates, intent, { duplicate_penalty: 0.4 });
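The `rank` helper and its `duplicate_penalty` option are referenced but not defined in the snippet above. One plausible reading, shown as a hypothetical sketch, is a score discount applied to each repeat of a category already placed higher in the result list:

```javascript
// Hypothetical sketch of rank() with a duplicate penalty: each repeat
// of an already-placed category has its score discounted, so one dense
// category cannot crowd out the other.
function rank(candidates, intent, opts = {}) {
  const penalty = opts.duplicate_penalty ?? 0;
  const seen = new Map(); // category → how many already placed

  return [...candidates]
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0)) // base ordering
    .map((c) => {
      const repeats = seen.get(c.category) ?? 0;
      seen.set(c.category, repeats + 1);
      // Discount compounds per repeat: score × (1 − penalty)^repeats
      return { ...c, score: (c.score ?? 0) * Math.pow(1 - penalty, repeats) };
    })
    .sort((a, b) => b.score - a.score); // re-sort after discounting
}
```

With `duplicate_penalty: 0.4`, a second "suits" SKU scoring 0.9 is discounted to 0.54, letting a 0.8-scoring loafer slot in between — preserving distinctness across the two categories.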
Example 5 · Men's · Wardrobe build
List-style query with four categories
SHOPsmall search bar with the query 'mens tops, mens shorts, mens jackets, mens jeans'
mens tops, mens shorts, mens jackets, mens jeans

Comma-delimited lists are the hardest typed shape — they trigger the Brain's list parser, which keeps each item as its own group and asks the ranker to return a balanced result set rather than letting one dense category dominate.

const intent = parseQuery("mens tops, mens shorts, mens jackets, mens jeans");
// → { intent:"find_list", gender:"mens",
//     items:["tops","shorts","jackets","jeans"], confidence:0.964 }

const grouped = await Promise.all(
  intent.items.map(cat =>
    db.products.find({ gender: intent.gender, category: cat, limit: 6 })
  )
);

// Balanced merge — 6 SKUs per category, 24 total
return interleave(grouped);
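The `interleave` helper is used but not shown above. A minimal round-robin version, written here as an illustrative sketch rather than the production implementation, might look like this:

```javascript
// Minimal sketch of interleave(): round-robin across category groups
// so no single dense category dominates the merged result set.
function interleave(groups) {
  const out = [];
  const maxLen = Math.max(...groups.map((g) => g.length));
  for (let i = 0; i < maxLen; i++) {
    for (const group of groups) {
      if (i < group.length) out.push(group[i]); // take the i-th of each group
    }
  }
  return out;
}
```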
Example 6 · Ladies · Wardrobe build
List-style query with four categories
SHOPsmall search bar with the query 'ladies skirts, ladies dresses, ladies shorts and ladies jackets'
ladies skirts, ladies dresses, ladies shorts and ladies jackets

Same list shape as Example 5, but the final item is joined by and instead of a comma. The Brain's normalizer rewrites mixed delimiters into a uniform list before the parser sees them, so the resolved AST is identical in structure.

// Normalizer rewrites "a, b, c and d" → "a, b, c, d"
const intent = parseQuery("ladies skirts, ladies dresses, ladies shorts and ladies jackets");
// → { intent:"find_list", gender:"ladies",
//     items:["skirts","dresses","shorts","jackets"], confidence:0.971 }

return brain.route(intent).dispatch(intent, session);
// → grouped result set, 6 SKUs per category
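The mixed-delimiter rewrite in the comment above can be sketched as a single regex pass. This is a simplification offered for illustration — the real normalizer also handles casing, unicode, and voice fixups, and this version treats every "and" as a list delimiter:

```javascript
// Sketch of the list-delimiter rewrite: "a, b, c and d" → "a, b, c, d".
// Simplified: every "and" (with or without a preceding comma) becomes
// a comma, so the list parser sees one uniform shape.
function normalizeListDelimiters(query) {
  return query.replace(/,?\s+and\s+/g, ", ");
}
```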
Example 7 · Mixed gender · Personal care · Budget
Compound find with a price ceiling
SHOPsmall search bar with the query 'ladies deodorant, mens deodorant under $20'
ladies deodorant, mens deodorant under $20

The hardest example in the set. Two genders, one category, and a budget all in one breath. The Brain emits two groups (one per gender), attaches a shared ceiling price band of [0, 20], and routes the resolved intent through the Pricing Feature first, then back into catalog search.

const intent = parseQuery("ladies deodorant, mens deodorant under $20");
// → {
//     intent: "budget_search",
//     confidence: 0.971,
//     groups: [
//       { gender:"ladies", category:"deodorant" },
//       { gender:"mens",   category:"deodorant" }
//     ],
//     price: { mode:"ceiling", value:20, currency:"USD" }
//   }

// Route: Pricing Feature owns the price band, catalog owns the slots
const route = brain.route(intent);
// → { primary:"pricing_feature", secondary:"catalog_search", merge:"group_by_segment" }

return route.dispatch(intent, session);
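The "under $20" ceiling in this example can be pulled out of the query with a pass like the following. This is a hypothetical sketch — `extractPriceBand` is not part of the documented API, and the real parser handles far more price phrasings:

```javascript
// Hypothetical sketch of price-band extraction: lift an "under $N"
// phrase out of the query into a ceiling band, leaving the rest of
// the query for the group parser.
function extractPriceBand(query) {
  const m = query.match(/under\s+\$?(\d+(?:\.\d+)?)/i);
  if (!m) return { rest: query, price: null };
  return {
    rest: query.replace(m[0], "").trim().replace(/,\s*$/, ""),
    price: { mode: "ceiling", value: Number(m[1]), currency: "USD" },
  };
}
```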

Everything that follows in this paper — the architecture, intent taxonomy, memory model, and accuracy results — describes what happens inside the Brain after the user presses enter on one of the screenshots above.

2. Why the AI Brain exists

Real shoppers don't speak in keywords. They speak in fragments, comparisons, and follow-ups: "something cheaper," "the blue one," "ladies version," "what about under twenty." A lexical search bar sees these as noise. The AI Brain sees them as structured intent over a shared context.

Three observations from our internal test scenarios (May 1 – May 10, 2026) drove the design:

  1. 43% of generated typed queries were under three words, and 28% contained a pronoun or comparative phrase that only made sense given the previous turn.
  2. 61% of generated voice queries used colloquial phrasing ("show me cheap ones," "got anything for my dad") that traditional indexes failed to match.
  3. Scenarios where the engine successfully resolved a follow-up projected a 1.9× conversion lift over scenarios where the user had to re-type the full query.

The AI Brain exists so that a shopper can think out loud and still land on the right product.

3. Architecture

The Brain runs as a four-stage pipeline. Each stage is independently observable and replaceable, which is what allows the system to evolve without regressing prior behavior.

3.1 Pipeline

raw query ─► [1] Normalize ─► [2] Parse ─► [3] Resolve ─► [4] Route ─► subsystem
                  │              │            │             │
                  │              │            │             └─ catalog | pricing | content | guide
                  │              │            └─ context binding, coreference, slot fill
                  │              └─ tokens, entities, modifiers, intent candidates
                  └─ casing, unicode, voice→text fixups, profanity scrub

3.2 Stage detail
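To make the stage boundaries concrete, here is a minimal, self-contained sketch of how the four stages compose. Only the stage names come from the diagram above; every implementation detail below is a hypothetical stand-in — the real normalizer, parser, and resolver are far richer than these:

```javascript
// Minimal sketch of the four-stage pipeline. Stage names match the
// diagram; the bodies are illustrative stand-ins only.
const normalize = (raw) =>
  raw.toLowerCase().trim().replace(/\s+/g, " "); // casing + whitespace fixups

const parse = (text) => ({
  tokens: text.split(" "),
  // Naive intent guess: a price phrase implies a budget search.
  intent: /under \$?\d+/.test(text) ? "budget_search" : "find",
});

const resolve = (parsed, session) => ({
  ...parsed,
  // Bind the gender facet from the query, else from session memory.
  gender: parsed.tokens.includes("mens")
    ? "mens"
    : parsed.tokens.includes("ladies")
      ? "ladies"
      : (session.memory?.gender ?? null),
});

const route = (resolved) =>
  resolved.intent === "budget_search" ? "pricing_feature" : "catalog_search";

function brainPipeline(raw, session = {}) {
  const resolved = resolve(parse(normalize(raw)), session);
  return { resolved, target: route(resolved) };
}
```

Each stage takes only the previous stage's output (plus the session, at resolve time), which is what makes the stages independently observable and replaceable.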

4. Intent resolution

The Brain recognizes nine top-level intents. Each carries a confidence score; below 0.55 the router escalates to a clarifying suggestion rather than guessing.

Intent                Example query                    Routed to
Find product          "navy crewneck"                  Catalog search
Budget search         "under twenty dollars"           Pricing Feature
Compare               "cheaper ones," "smaller size"   Catalog ranker (re-rank)
Refine                "in blue," "long sleeve"         Catalog ranker (filter)
Recommend             "something for my dad"           Recommender
Locate                "where's my order"               Account / orders
How-to                "how do returns work"            Content lookup
Greeting / chitchat   "hi," "thanks"                   Acknowledge, no search
Out-of-scope          "what's the weather"             Polite refusal
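The 0.55 escalation threshold can be sketched as a small gate in front of the router. The names below (`dispatchOrClarify`, `routeFor`, and the route-target strings) are hypothetical illustrations, not the documented API:

```javascript
// Hypothetical sketch of the 0.55 confidence gate described above.
// Route targets mirror the intent table; the names are illustrative.
const routeFor = (intentName) =>
  ({
    find: "catalog_search",
    budget_search: "pricing_feature",
    how_to: "content_lookup",
    locate: "account_orders",
  }[intentName] ?? "guided_navigation");

function dispatchOrClarify(intent) {
  const CLARIFY_THRESHOLD = 0.55;
  if (intent.confidence < CLARIFY_THRESHOLD) {
    // Below threshold: offer a clarifying suggestion instead of guessing.
    return { action: "clarify" };
  }
  return { action: "dispatch", target: routeFor(intent.intent) };
}
```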

5. Conversational memory

Each session carries a short-lived memory object: the last category, last price tier, last comparator, and last result set. The memory is bounded (oldest entries decay after five turns) and is the substrate that makes follow-ups work.

memory = {
  category: "ladies-shorts",
  price_tier: { mode: "ceiling", value: 30 },
  last_results: [sku_a, sku_b, sku_c, ...],
  comparator: null,
  turn: 4
}

When the user types "cheaper ones," the resolver consults memory.price_tier, lowers the ceiling by one tier (e.g. $30 → $20), and re-issues the query against the same category. No re-typing required.
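That tier-lowering step can be sketched directly against the memory object above. The tier boundaries below are assumed for illustration — only the $30 → $20 step comes from the text:

```javascript
// Hypothetical sketch of the "cheaper ones" follow-up. Tier boundaries
// are assumed; the document confirms only the $30 → $20 step.
const TIERS = [10, 20, 30, 50, 100]; // assumed ceiling tiers, USD

function applyCheaper(memory) {
  const idx = TIERS.indexOf(memory.price_tier.value);
  const lower = idx > 0 ? TIERS[idx - 1] : TIERS[0]; // clamp at lowest tier
  return {
    ...memory, // category and last_results carry over unchanged
    price_tier: { mode: "ceiling", value: lower },
    turn: memory.turn + 1,
  };
}
```

The category survives the update untouched, which is what lets the resolver re-issue the same query against the lower ceiling without the user re-typing anything.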

6. Methodology

We evaluated the Brain on 18,900 synthetic queries produced by our internal test harness, covering scenarios constructed between May 1 and May 10, 2026. Queries were stratified across typed (62%), voice (29%), and follow-up (9%) inputs. Three metrics were measured:

  1. Intent classification accuracy: whether the resolved top-level intent matched the reference label.
  2. Follow-up reference resolution: whether references such as "more like that" were bound to the correct prior context.
  3. Zero-result rate: the share of queries that returned an empty result set, compared against the lexical baseline.

An automated regression suite re-validated 1,400 randomly drawn scenarios end-to-end against the harness's reference outputs to confirm machine scores.

7. Results & accuracy

Figure 1. Intent classification accuracy by intent type. The Brain exceeds 95% on every intent except Recommend, where the long tail of casual phrasing ("something nice for my mom") remains the open frontier.
Figure 2. Follow-up reference resolution by reference type. Pronouns and price comparatives are the strongest categories thanks to direct memory binding; vague comparatives ("better ones") still rely on the recommender.
Figure 3. Zero-result rate by query channel. The Brain reduces zero-result sessions across all channels, with the largest gain on voice queries where colloquial phrasing previously failed lexical matching.
Figure 4. End-to-end latency distribution. The full pipeline returns a routed result in a median of 118 ms; the 95th percentile is 240 ms, well under the 400 ms perception threshold for "instant."
Figure 5. Test-harness answer quality versus the Brain's confidence score. Quality tracks confidence almost linearly, validating that the confidence signal is calibrated and safe to act on (e.g. for clarification escalation).

7.1 Summary table

Channel     Queries   Intent acc.   Follow-up res.   Zero-result   Quality
Typed       11,720    98.0%         95.2%            2.1%          4.6 / 5
Voice       5,480     96.1%         92.4%            3.4%          4.4 / 5
Follow-up   1,700     97.6%         94.1%            1.8%          4.5 / 5
Overall     18,900    97.4%         94.1%            2.5%          4.5 / 5
"The shopper isn't searching a database. They're having a conversation. The Brain's job is to make sure both sides remember what was just said."

8. Limitations

9. Conclusion

The AI Brain reframes search as a conversation rather than a lookup. By separating understanding from retrieval, the system can interpret fragments, follow-ups, and colloquialisms with measured accuracy above 97% — and route each resolved intent to the subsystem best able to answer it. The result is a search bar that feels less like a query box and more like a knowledgeable salesperson on the floor. Future work extends the Brain to multilingual parsing, longer memory horizons, and a learned router trained on the labeled scenarios produced for this paper.


© 2026 InHouse America Research. AI Brain v5.11.26. For inquiries: legal@inhouseamerica.com.