
The Architecture of Taste: Engineering a Cold-Start Wine Recommender

Version 4.0 | Technical Documentation

This document provides the complete technical specification for Unwyned's recommendation engine. It includes mathematical formulations, psychophysical research citations, implementation details, and engineering rationale. For a simplified overview, see The Intel.

1. Introduction: The Digitization of Sensory Experience

Building a recommendation engine for wine presents a unique set of challenges that differ fundamentally from recommending movies, books, or electronics. While traditional recommender systems rely on Collaborative Filtering (predicting preference based on user similarity matrices), this approach fails in a "Cold-Start" scenario where a new user has zero interaction history. Furthermore, wine is a sensory product defined by chemical properties that do not map linearly to consumer language.

This document details the engineering architecture of our proprietary Cold-Start Recommendation Engine. It outlines how we solve the "Semantic Gap" utilizing a Center-Out Psychographic Quiz, handle the mathematical reality of "Impossible Profiles" via Manifold Projection, and utilize Adaptive Sigmoid Calibration to normalize scores across diverse user psychologies.

2. The Data Model: An 8-Dimensional Flavor Space

At the core of our system is a vector space representation of taste. Unlike simple tag-based systems (which treat "Fruity" as a binary boolean), we model both users (U) and wines (W) as continuous vectors in ℝ⁸.

Each dimension is normalized to a floating-point scale of [1.0, 5.0], representing the intensity of specific chemical compounds:

| Dimension | Scale (1.0 - 5.0) | Enological Correlate |
| --- | --- | --- |
| Body | Watery → Viscous | Alcohol by Volume (ABV), Glycerol, Dry Extract |
| Sweetness | Bone Dry → Dessert | Residual Sugar (g/L) |
| Acidity | Flat → Tart | pH, Total Acidity (Tartaric) |
| Tannin | Silky → Astringent | Phenolic content, Proanthocyanidins |
| Fruit Intensity | Savory → Fruit-Bomb | Ester concentration |
| Oak | Steel → Heavy Oak | Lactones, Vanillin (Time in Barrel) |
| Earthiness | Clean → Funky | Geosmin, Brettanomyces |
| Spice | Mild → Peppery | Rotundone concentration |
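As a concrete illustration, the vector space above can be sketched as follows. The names and example values are ours, not the production schema; each dimension lives on [1.0, 5.0]:

```python
from dataclasses import dataclass

# The 8 dimensions from the table above (illustrative identifiers).
DIMENSIONS = ["body", "sweetness", "acidity", "tannin",
              "fruit_intensity", "oak", "earthiness", "spice"]

@dataclass
class TasteVector:
    values: dict  # dimension name -> float in [1.0, 5.0]

    def as_list(self):
        """Return the vector in fixed dimension order, ready for distance math."""
        return [self.values[d] for d in DIMENSIONS]

# A full-bodied, oaky, low-sugar red might look like:
bold_red = TasteVector({"body": 4.2, "sweetness": 1.5, "acidity": 3.0,
                        "tannin": 4.0, "fruit_intensity": 3.5, "oak": 4.5,
                        "earthiness": 2.0, "spice": 2.5})
```

Both users and wines share this representation, which is what makes the distance computations in later sections well-defined.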

3. The Input Protocol: Center-Out Psychographics

Directly asking users "Do you like high tannins?" yields noisy data due to terminology conflation (users often confuse "Dry" with "Bitter"). To solve this, we employ a Proxy Profiling system rooted in psychophysics (Vinotype theory, PROP taster status).

3.1 Vector Initialization Strategy

To prevent "Profile Saturation" (where users hit the 5.0 ceiling too easily), we utilize a Center-Out Scoring Model: every dimension initializes at the scale midpoint (3.0), and each quiz answer applies a bounded shift up or down from there.

This ensures that a user only reaches an "Extreme" profile (e.g., Tannin=5.0) if they answer consistently across multiple correlated questions.

3.2 Phase I: The Primary Anchors (6 Questions)

These questions establish the broad direction of the user vector.

| Question | Proxy Target Dimension | Logic & Vector Shift | Psychophysical Basis |
| --- | --- | --- | --- |
| Coffee Style | tannin | Black → +1.2; Milk/Sugar → −1.2 | Bitterness tolerance correlates with PROP sensitivity |
| Lemon/Sour | acidity | Love it → +1.2; Too sharp → −1.2 | Acid-seeking behavior maps to low pH tolerance |
| Dessert | sweetness | Rich Cake → +1.2; Cheese/Savory → −1.2 | Direct hedonic preference for sucrose |
| Texture | body | Skim Milk → −1.2; Heavy Cream → +1.2 | Tactile sensitivity maps to viscosity/ABV |
| Steak Prep | earth, fruit | BBQ Glaze → Fruit +, Earth −; Mushrooms → Earth +1.2 | Umami vs. Sweet preference splitter |
| Scents | oak | Vanilla/Toast → +1.2; Citrus/Clean → −1.2 | Olfactory preference for lactones (oak) |
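A minimal sketch of Center-Out scoring using a subset of the shifts above: every dimension starts at the 3.0 midpoint, each answer applies its ±1.2 shift, and results are clamped to [1.0, 5.0]. The answer keys and helper names are illustrative:

```python
MIDPOINT, LO, HI = 3.0, 1.0, 5.0
ALL_DIMS = ["body", "sweetness", "acidity", "tannin",
            "fruit", "oak", "earth", "spice"]

SHIFTS = {                            # answer -> [(dimension, shift), ...]
    "coffee_black":   [("tannin", +1.2)],
    "coffee_milk":    [("tannin", -1.2)],
    "lemon_love":     [("acidity", +1.2)],
    "steak_mushroom": [("earth", +1.2)],
}

def score_quiz(answers):
    # Center-Out: initialize every dimension at the scale midpoint.
    profile = {d: MIDPOINT for d in ALL_DIMS}
    for answer in answers:
        for dim, delta in SHIFTS.get(answer, []):
            # Apply the shift, clamped to the valid [1.0, 5.0] range.
            profile[dim] = max(LO, min(HI, profile[dim] + delta))
    return profile
```

A user answering "black coffee" and "loves lemon" lands near 4.2 on tannin and acidity while every other dimension stays centered; only repeated, correlated answers can push a dimension to the 5.0 ceiling.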

3.3 Phase II: The Reinforcement Layer (2 Questions)

We triangulate "Risky" dimensions (Tannin and Acidity) where user self-reporting is least reliable.

3.4 Phase III: The Trade-Off Layer (The "Impossible" Fix)

To prevent the "All 5s" Paradox (a user who wants everything maxed out), we force a choice between conflicting chemical attributes.

Result: A user can reach a 5.0 in one dimension, but the trade-offs ensure they cannot be a 5.0 in all dimensions, keeping the vector within the realm of realistic wine chemistry.

4. Preprocessing: Manifold Projection

Even with trade-off questions, users may generate vectors that are chemically rare (e.g., High Acid + High Body + High Sugar). If we search for this vector [5, 5, 5, …] in our database, standard Euclidean distance places it far from every valid wine, so its nearest neighbors are uniformly poor matches.

4.1 Solution: Manifold Projection

We define K=5 Archetype Centroids representing valid wine clusters (e.g., BoldRed, CrispWhite, Dessert). When a profile is flagged as "Extreme" (Magnitude > Threshold), we project it toward the nearest valid centroid:

U_corrected = (1 − λ) · U_raw + λ · C_nearest

Parameter: λ = 0.3
Rationale: We shift the user 30% toward reality. This preserves their directional intent (e.g., "I want bold") while ensuring the search occurs within the valid chemical feature space.
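The projection step can be sketched as follows. The centroid coordinates are invented for illustration, and only 3 of the K = 5 archetypes are shown:

```python
import math

LAMBDA = 0.3  # shift the user 30% toward the nearest valid archetype

# Illustrative 8-dimensional archetype centroids (invented values).
CENTROIDS = {
    "BoldRed":    [4.5, 1.5, 3.0, 4.5, 3.5, 4.0, 3.0, 3.5],
    "CrispWhite": [2.0, 1.5, 4.5, 1.0, 3.5, 1.5, 1.5, 1.5],
    "Dessert":    [3.5, 5.0, 3.5, 1.5, 4.5, 2.0, 1.5, 1.5],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(u_raw):
    """U_corrected = (1 - λ)·U_raw + λ·C_nearest"""
    nearest = min(CENTROIDS.values(), key=lambda c: euclidean(u_raw, c))
    return [(1 - LAMBDA) * u + LAMBDA * c for u, c in zip(u_raw, nearest)]
```

An "everything maxed" profile of all 5.0s is pulled 30% of the way toward the nearest centroid (here, the BoldRed archetype), preserving its bold direction while landing inside realistic wine chemistry.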

5. Core Algorithm: Asymmetric Similarity

Standard distance metrics like Euclidean Distance assume symmetry: the penalty for a wine being "too bold" is the same as for it being "too light." In sensory science, this is false.

Penalty Multiplier = {
  1.4 if Δ > 0 (Overshoot / Deal-Breaker)
  0.8 if Δ ≤ 0 (Undershoot / Safe Miss)
}

where Δ = W_d − U_d: the wine's intensity minus the user's preferred intensity on dimension d.

This ensures that "offending" the palate is penalized nearly twice as heavily as simply "boring" the palate.

Note: The overshoot penalty is further modulated by the user's Sensitivity scalar (S) for harshness dimensions. See Section 9.1 for details.
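A sketch of the asymmetric metric with the Sensitivity modulation of Section 9.1 omitted (S = 0). The 1.4/0.8 multipliers come from the text; applying them as weights on each squared difference is our assumption, since the document does not state how they enter the distance:

```python
OVERSHOOT, UNDERSHOOT = 1.4, 0.8

def asymmetric_distance(user, wine):
    total = 0.0
    for u, w in zip(user, wine):
        delta = w - u                       # Δ > 0: the wine overshoots the user
        weight = OVERSHOOT if delta > 0 else UNDERSHOOT
        total += weight * delta ** 2        # assumed: weighted squared difference
    return total ** 0.5

# A wine that is "too bold" scores worse than one equally "too light":
bolder  = asymmetric_distance([3.0] * 8, [4.0] + [3.0] * 7)
lighter = asymmetric_distance([3.0] * 8, [2.0] + [3.0] * 7)
```

The same |Δ| thus produces a larger distance when it points toward "offending" the palate.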

6. Scoring Philosophy: Match vs. Confidence

A critical decision in our UX architecture is the presentation of the recommendation score. We explicitly utilize a "Match Score" rather than a "Confidence Score" or "Predicted Rating."

6.1 Why "Match Score"?

6.2 Sigmoid Normalization

Raw Euclidean distance is unintuitive. We use a Logistic Sigmoid Transformation to map distance to a percentage.

Score = 1 / (1 + e^(k(d − d₀)))

Where:

- d is the raw user-wine distance
- d₀ is the pivot distance that maps to exactly a 50% score (see Section 7)
- k controls the steepness of the curve around the pivot
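A minimal sketch of the transformation; the default k and d₀ are illustrative, not the production constants:

```python
import math

def match_score(d, d0=2.0, k=1.5):
    """Score = 1 / (1 + e^(k·(d - d0))); d = d0 maps to exactly 50%."""
    return 1.0 / (1.0 + math.exp(k * (d - d0)))
```

Small distances saturate toward 100%, large distances toward 0%, and the steepness k controls how sharply scores fall off around the pivot.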

7. Calibration: Solving the "Individual Cutscore"

Users have different internal baselines for satisfaction. To avoid forcing users to manually set filters (e.g., "Only show me 90% matches"), we employ Adaptive Pivot Scaling.

We dynamically adjust the pivot point d₀ in the scoring formula based on the user's rating history.

This calibration ensures that "90% Match" always means "Strong Recommendation", regardless of whether the user is a critic or an enthusiast.
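The document does not specify the pivot update rule, so the following is one plausible realization: nudge d₀ toward a target derived from the distances of wines the user rated highly, so that "distances this user likes" map to comfortably high Match Scores regardless of how harsh a grader they are. The `step` and `margin` parameters are assumptions:

```python
def adapt_pivot(d0, rated, step=0.1, margin=1.0):
    """rated: list of (distance, stars) pairs from the user's history."""
    liked = [dist for dist, stars in rated if stars >= 4]
    if not liked:
        return d0                        # no signal yet: keep the default pivot
    # Place the 50% pivot a margin above the typical well-liked distance,
    # then move toward it gradually (exponential moving average).
    target = sum(liked) / len(liked) + margin
    return d0 + step * (target - d0)
```

A critic whose 4-5★ wines sit at larger distances gets a larger d₀, so the same sigmoid yields comparably generous scores across user psychologies.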

8. Dynamic Evolution: Bounded Asymptotic Learning

As users rate wines, their profile must evolve. We employ Bounded Asymptotic Learning to update the user vector U.

U_new = U_old + η · dampening · (W − U_old)

Saturation Dampening: To prevent "lock-in" at the extremes (1.0 or 5.0), we apply a dampening factor with a hard floor of 0.25. This ensures that even if a user is currently at an extreme (e.g., Tannin=5.0), consistent negative feedback will eventually pull them back toward the center.
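A sketch of the per-dimension update. Only the 0.25 dampening floor is stated above; the linear-in-distance dampening shape and the learning rate η = 0.2 are our assumptions:

```python
ETA = 0.2      # illustrative learning rate
FLOOR = 0.25   # hard floor on dampening, per the text

def update_dim(u_old, w):
    # Dampening shrinks as u_old approaches the 1.0/5.0 extremes, but never
    # drops below the floor, so extreme profiles remain correctable.
    distance_from_extreme = min(u_old - 1.0, 5.0 - u_old) / 2.0  # 1.0 at center
    dampening = max(FLOOR, distance_from_extreme)
    new = u_old + ETA * dampening * (w - u_old)
    return max(1.0, min(5.0, new))      # keep the dimension on [1.0, 5.0]
```

A user pinned at Tannin = 5.0 who keeps rating lighter wines highly is still pulled back toward center, just slowly (at the floor rate) rather than not at all.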

9. User-Level Scalars: Sensitivity (S) and Exploration (E)

Beyond the 8-dimensional taste vector, each user possesses two additional scalar parameters that modulate how recommendations are generated and scored.

9.1 Sensitivity (S) ∈ [0, 1]

Purpose: Captures individual tolerance for sensory intensity in "harshness" dimensions (tannin, acidity, body, oak, spice). A sensitive user (S → 1) experiences bitter compounds more intensely than a tolerant user (S → 0).

Seeding from Quiz

S is initialized from the composite harshness profile derived from quiz answers:

S_initial = clamp((tannin + acidity + body + oak + spice) / 5 − 2.5, 0, 1)

A user whose harshness dimensions average 3.0 (the scale midpoint) seeds S = 0.5; extreme harshness preferences push toward 1.0.

Modulating Asymmetric Penalties

The base overshoot penalty (1.4×) is amplified for sensitive users on harshness dimensions:

effectivePenalty_d = 1.4 × (1 + α_d × S)

| Dimension | α (Amplification Factor) |
| --- | --- |
| Tannin | 0.8 |
| Acidity | 0.7 |
| Body | 0.5 |
| Oak | 0.4 |
| Spice | 0.3 |
| Other dimensions | 0.0 |

A user with S = 1 and a wine overshooting tannin by 1.0 pays: 1.4 × 1.8 = 2.52 penalty, compared to 1.4 for S = 0.
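The worked example above can be checked directly with a minimal sketch (α values copied from the table):

```python
ALPHA = {"tannin": 0.8, "acidity": 0.7, "body": 0.5, "oak": 0.4, "spice": 0.3}

def effective_penalty(dim, s):
    """effectivePenalty_d = 1.4 × (1 + α_d × S)"""
    return 1.4 * (1 + ALPHA.get(dim, 0.0) * s)

p_sensitive = effective_penalty("tannin", 1.0)   # 1.4 × 1.8 = 2.52
p_tolerant  = effective_penalty("tannin", 0.0)   # base overshoot penalty, 1.4
```

Non-harshness dimensions fall through to α = 0, leaving their overshoot penalty at the base 1.4 regardless of S.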

Learning S from Ratings

When a user rates a wine poorly and the wine overshoots their profile on harshness dimensions, S increases:

residual = (expectedRating − actualRating) / 4
S_new = clamp(S_old + γ × residual × weightedOvershoot, 0, 1)

Where γ = 0.08 is the learning rate (decays with confidence), and weightedOvershoot is the normalized overshoot across harshness dimensions.
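A sketch of the update, assuming 1-5 star ratings and a precomputed weighted overshoot in [0, 1]; the γ decay with confidence is omitted here:

```python
GAMMA = 0.08  # base learning rate from the text (decay omitted)

def clamp01(x):
    return max(0.0, min(1.0, x))

def update_sensitivity(s_old, expected, actual, weighted_overshoot):
    # Positive residual means the wine disappointed relative to expectation.
    residual = (expected - actual) / 4
    return clamp01(s_old + GAMMA * residual * weighted_overshoot)
```

A disappointing over-tannic wine nudges S upward; a wine that beats expectations despite overshooting nudges it back down.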

9.2 Exploration (E) ∈ [0, 1]

Purpose: Controls willingness to venture beyond established preferences. High-E users receive "wildcard" recommendations from outside their comfort zone.

Seeding from Quiz

E is initialized from quiz answers probing adventure-seeking behavior:

E_initial = (adventure_answer + novel_food_answer) / 2

Normalized to [0, 1] where 0.5 represents moderate exploration appetite.

Wildcard Injection

The number of wildcards injected into recommendations is determined by:

effectiveE = E × (1 - 0.5 × S) (Sensitive users get fewer wildcards)
wildcardCount = round(effectiveE × 2)

Wildcards are selected from wines with:

Learning E from Wildcard Ratings

When users rate wildcards, E adjusts based on whether they exceeded the baseline expectation (60%):

r = (rating − 1) / 4 (normalized to [0, 1])
E_new = clamp(E_old + γ × (r − 0.6) × noveltyFactor, 0, 1)

Where γ = 0.06 and noveltyFactor = min(noveltyDistance / 3.0, 1.0). Because the baseline is 0.6, 4-5★ ratings (r ≥ 0.75) push E up, while 3★ and below (r ≤ 0.5) pull it down.
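The update can be sketched as follows; names track the formulas above, a 1-5 star scale is assumed, and the γ decay with confidence is omitted:

```python
GAMMA_E = 0.06
BASELINE = 0.6   # wildcards are expected to land around a 60% outcome

def clamp01(x):
    return max(0.0, min(1.0, x))

def update_exploration(e_old, rating, novelty_distance):
    r = (rating - 1) / 4                        # normalize stars to [0, 1]
    novelty = min(novelty_distance / 3.0, 1.0)  # noveltyFactor
    return clamp01(e_old + GAMMA_E * (r - BASELINE) * novelty)
```

The novelty factor means a barely-novel wildcard teaches almost nothing about exploration appetite, while a genuinely far-afield one moves E at full strength.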

9.3 S+E Interaction

Sensitivity suppresses exploration to protect sensitive users from potentially unpleasant wildcard experiences. The formula effectiveE = E × (1 − 0.5 × S) ensures that a fully tolerant user (S = 0) keeps their entire exploration appetite, while a maximally sensitive user (S = 1) has it halved.

Implementation Note

S and E are stored locally with confidence counters (0-30) that decay learning rates over time. This prevents early ratings from locking users into extreme values while still allowing gradual refinement.
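The decay curve is not specified, so the following is one plausible linear schedule consistent with the 0-30 confidence counters described above:

```python
MAX_CONFIDENCE = 30  # counter ceiling from the text

def decayed_rate(base_rate, confidence):
    # Assumption: linear decay. Early ratings (confidence near 0) move S/E
    # at the full base rate; a fully confident profile stops moving.
    confidence = min(confidence, MAX_CONFIDENCE)
    return base_rate * (1 - confidence / MAX_CONFIDENCE)
```

Plugging this in for γ in the S and E updates yields the stated behavior: early ratings refine the scalars quickly without letting any single rating lock them at an extreme.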

References

  1. Cold-Start Problem in Recommender Systems. Schein et al., 2002.
  2. Vinotype & Sensory Segmentation. Hanni, 2012.
  3. Asymmetric Impact in Attribute Performance. Mikulic et al., 2008.
  4. Manifold Learning & Archetypes. Cutler et al., 1994.
  5. PROP Taster Status and Food Preference. Tepper et al., 2009.
  6. Chemical composition of wine. Waterhouse et al., 2016.


Questions or Feedback?

If you have technical questions about this implementation, want to discuss our methodology, or have suggestions for improvement, reach out to us at [email protected].