
The Architecture of Taste: Engineering a Cold-Start Wine Recommender

Version 4.0 | Technical Documentation

This document provides the complete technical specification for Unwyned's recommendation engine. It includes mathematical formulations, psychophysical research citations, implementation details, and engineering rationale. For a simplified overview, see The Intel.

1. Introduction: The Digitization of Sensory Experience

Building a recommendation engine for wine presents a unique set of challenges that differ fundamentally from recommending movies, books, or electronics. While traditional recommender systems rely on Collaborative Filtering (predicting preference based on user similarity matrices), this approach fails in a "Cold-Start" scenario where a new user has zero interaction history. Furthermore, wine is a sensory product defined by chemical properties that do not map linearly to consumer language.

This document details the engineering architecture of our proprietary Cold-Start Recommendation Engine. It outlines how we solve the "Semantic Gap" utilizing a Center-Out Psychographic Quiz, handle the mathematical reality of "Impossible Profiles" via Manifold Projection, and utilize Adaptive Sigmoid Calibration to normalize scores across diverse user psychologies.

2. The Data Model: An 8-Dimensional Flavor Space

At the core of our system is a vector space representation of taste. Unlike simple tag-based systems (which treat "Fruity" as a binary boolean), we model both users (U) and wines (W) as continuous vectors in ℝ⁸.

Each dimension is normalized to a floating-point scale of [1.0, 5.0], representing the intensity of specific chemical compounds:

| Dimension | Scale (1.0 - 5.0) | Enological Correlate |
| --- | --- | --- |
| Body | Watery → Viscous | Alcohol by Volume (ABV), Glycerol, Dry Extract |
| Sweetness | Bone Dry → Dessert | Residual Sugar (g/L) |
| Acidity | Flat → Tart | pH, Total Acidity (Tartaric) |
| Tannin | Silky → Astringent | Phenolic content, Proanthocyanidins |
| Fruit Intensity | Savory → Fruit-Bomb | Ester concentration |
| Oak | Steel → Heavy Oak | Lactones, Vanillin (Time in Barrel) |
| Earthiness | Clean → Funky | Geosmin, Brettanomyces |
| Spice | Mild → Peppery | Rotundone concentration |
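As a concrete illustration, the vector space above can be sketched as follows. The names and example values are ours, not the production schema; each dimension lives on [1.0, 5.0]:

```python
from dataclasses import dataclass

# The 8 dimensions from the table above (illustrative identifiers).
DIMENSIONS = ["body", "sweetness", "acidity", "tannin",
              "fruit_intensity", "oak", "earthiness", "spice"]

@dataclass
class TasteVector:
    values: dict  # dimension name -> float in [1.0, 5.0]

    def as_list(self):
        """Return the vector in fixed dimension order, ready for distance math."""
        return [self.values[d] for d in DIMENSIONS]

# A full-bodied, oaky, low-sugar red might look like:
bold_red = TasteVector({"body": 4.2, "sweetness": 1.5, "acidity": 3.0,
                        "tannin": 4.0, "fruit_intensity": 3.5, "oak": 4.5,
                        "earthiness": 2.0, "spice": 2.5})
```

Both users and wines share this representation, which is what makes the distance computations in later sections well-defined.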

3. The Input Protocol: Center-Out Psychographics

Directly asking users "Do you like high tannins?" yields noisy data due to terminology conflation (users often confuse "Dry" with "Bitter"). To solve this, we employ a Proxy Profiling system rooted in psychophysics (Vinotype theory, PROP taster status).

3.1 Vector Initialization Strategy

To prevent "Profile Saturation" (where users hit the 5.0 ceiling too easily), we utilize a Center-Out Scoring Model: every dimension initializes at the scale midpoint (3.0), and each quiz answer applies a bounded shift up or down from there.

This ensures that a user only reaches an "Extreme" profile (e.g., Tannin=5.0) if they answer consistently across multiple correlated questions.

3.2 Phase I: The Primary Anchors (6 Questions)

These questions establish the broad direction of the user vector.

| Question | Proxy Target Dimension | Logic & Vector Shift | Psychophysical Basis |
| --- | --- | --- | --- |
| Coffee Style | tannin | Black → +1.2; Milk/Sugar → −1.2 | Bitterness tolerance correlates with PROP sensitivity |
| Lemon/Sour | acidity | Love it → +1.2; Too sharp → −1.2 | Acid-seeking behavior maps to low pH tolerance |
| Dessert | sweetness | Rich Cake → +1.2; Cheese/Savory → −1.2 | Direct hedonic preference for sucrose |
| Texture | body | Skim Milk → −1.2; Heavy Cream → +1.2 | Tactile sensitivity maps to viscosity/ABV |
| Steak Prep | earth, fruit | BBQ Glaze → Fruit +, Earth −; Mushrooms → Earth +1.2 | Umami vs. Sweet preference splitter |
| Scents | oak | Vanilla/Toast → +1.2; Citrus/Clean → −1.2 | Olfactory preference for lactones (oak) |
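A minimal sketch of Center-Out scoring using a subset of the shifts above: every dimension starts at the 3.0 midpoint, each answer applies its ±1.2 shift, and results are clamped to [1.0, 5.0]. The answer keys and helper names are illustrative:

```python
MIDPOINT, LO, HI = 3.0, 1.0, 5.0
ALL_DIMS = ["body", "sweetness", "acidity", "tannin",
            "fruit", "oak", "earth", "spice"]

SHIFTS = {                            # answer -> [(dimension, shift), ...]
    "coffee_black":   [("tannin", +1.2)],
    "coffee_milk":    [("tannin", -1.2)],
    "lemon_love":     [("acidity", +1.2)],
    "steak_mushroom": [("earth", +1.2)],
}

def score_quiz(answers):
    # Center-Out: initialize every dimension at the scale midpoint.
    profile = {d: MIDPOINT for d in ALL_DIMS}
    for answer in answers:
        for dim, delta in SHIFTS.get(answer, []):
            # Apply the shift, clamped to the valid [1.0, 5.0] range.
            profile[dim] = max(LO, min(HI, profile[dim] + delta))
    return profile
```

A user answering "black coffee" and "loves lemon" lands near 4.2 on tannin and acidity while every other dimension stays centered; only repeated, correlated answers can push a dimension to the 5.0 ceiling.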

3.3 Phase II: The Reinforcement Layer (2 Questions)

We triangulate "Risky" dimensions (Tannin and Acidity) where user self-reporting is least reliable.

3.4 Phase III: The Trade-Off Layer (The "Impossible" Fix)

To prevent the "All 5s" Paradox (a user who wants everything maxed out), we force a choice between conflicting chemical attributes.

Result: A user can reach a 5.0 in one dimension, but the trade-offs ensure they cannot be a 5.0 in all dimensions, keeping the vector within the realm of realistic wine chemistry.

4. Preprocessing: Manifold Projection

Even with trade-off questions, users may generate vectors that are chemically rare (e.g., High Acid + High Body + High Sugar). If we search for this vector [5, 5, 5, …] in our database, standard Euclidean distance places it far from every valid wine, so its nearest neighbors are uniformly poor matches.

4.1 Solution: Manifold Projection

We define K=5 Archetype Centroids representing valid wine clusters (e.g., BoldRed, CrispWhite, Dessert). When a profile is flagged as "Extreme" (Magnitude > Threshold), we project it toward the nearest valid centroid:

U_corrected = (1 − λ) · U_raw + λ · C_nearest

Parameter: λ = 0.3
Rationale: We shift the user 30% toward reality. This preserves their directional intent (e.g., "I want bold") while ensuring the search occurs within the valid chemical feature space.
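The projection step can be sketched as follows. The centroid coordinates are invented for illustration, and only 3 of the K = 5 archetypes are shown:

```python
import math

LAMBDA = 0.3  # shift the user 30% toward the nearest valid archetype

# Illustrative 8-dimensional archetype centroids (invented values).
CENTROIDS = {
    "BoldRed":    [4.5, 1.5, 3.0, 4.5, 3.5, 4.0, 3.0, 3.5],
    "CrispWhite": [2.0, 1.5, 4.5, 1.0, 3.5, 1.5, 1.5, 1.5],
    "Dessert":    [3.5, 5.0, 3.5, 1.5, 4.5, 2.0, 1.5, 1.5],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(u_raw):
    """U_corrected = (1 - λ)·U_raw + λ·C_nearest"""
    nearest = min(CENTROIDS.values(), key=lambda c: euclidean(u_raw, c))
    return [(1 - LAMBDA) * u + LAMBDA * c for u, c in zip(u_raw, nearest)]
```

An "everything maxed" profile of all 5.0s is pulled 30% of the way toward the nearest centroid (here, the BoldRed archetype), preserving its bold direction while landing inside realistic wine chemistry.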

5. Core Algorithm: Asymmetric Similarity

Standard distance metrics like Euclidean Distance assume symmetry: the penalty for a wine being "too bold" is the same as for it being "too light." In sensory science, this is false.

Penalty Multiplier = {
  1.4 if Δ > 0 (Overshoot / Deal-Breaker)
  0.8 if Δ ≤ 0 (Undershoot / Safe Miss)
}

where Δ = W_d − U_d: the wine's intensity minus the user's preferred intensity on dimension d.

This ensures that "offending" the palate is penalized nearly twice as heavily as simply "boring" the palate.

Note: The overshoot penalty is further modulated by the user's Sensitivity scalar (S) for harshness dimensions. See Section 9.1 for details.
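A sketch of the asymmetric metric with the Sensitivity modulation of Section 9.1 omitted (S = 0). The 1.4/0.8 multipliers come from the text; applying them as weights on each squared difference is our assumption, since the document does not state how they enter the distance:

```python
OVERSHOOT, UNDERSHOOT = 1.4, 0.8

def asymmetric_distance(user, wine):
    total = 0.0
    for u, w in zip(user, wine):
        delta = w - u                       # Δ > 0: the wine overshoots the user
        weight = OVERSHOOT if delta > 0 else UNDERSHOOT
        total += weight * delta ** 2        # assumed: weighted squared difference
    return total ** 0.5

# A wine that is "too bold" scores worse than one equally "too light":
bolder  = asymmetric_distance([3.0] * 8, [4.0] + [3.0] * 7)
lighter = asymmetric_distance([3.0] * 8, [2.0] + [3.0] * 7)
```

The same |Δ| thus produces a larger distance when it points toward "offending" the palate.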

6. Scoring Philosophy: Match vs. Confidence

A critical decision in our UX architecture is the presentation of the recommendation score. We explicitly utilize a "Match Score" rather than a "Confidence Score" or "Predicted Rating."

6.1 Why "Match Score"?

6.2 Sigmoid Normalization

Raw Euclidean distance is unintuitive. We use a Logistic Sigmoid Transformation to map distance to a percentage.

Score = 1 / (1 + e^(k(d − d₀)))

Where:

- d is the raw user-wine distance
- d₀ is the pivot distance that maps to exactly a 50% score (see Section 7)
- k controls the steepness of the curve around the pivot
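A minimal sketch of the transformation; the default k and d₀ are illustrative, not the production constants:

```python
import math

def match_score(d, d0=2.0, k=1.5):
    """Score = 1 / (1 + e^(k·(d - d0))); d = d0 maps to exactly 50%."""
    return 1.0 / (1.0 + math.exp(k * (d - d0)))
```

Small distances saturate toward 100%, large distances toward 0%, and the steepness k controls how sharply scores fall off around the pivot.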

7. Calibration: Solving the "Individual Cutscore"

Users have different internal baselines for satisfaction. To avoid forcing users to manually set filters (e.g., "Only show me 90% matches"), we employ Adaptive Pivot Scaling.

We dynamically adjust the pivot point d₀ in the scoring formula based on the user's rating history.

This calibration ensures that "90% Match" always means "Strong Recommendation", regardless of whether the user is a critic or an enthusiast.
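The document does not specify the pivot update rule, so the following is one plausible realization: nudge d₀ toward a target derived from the distances of wines the user rated highly, so that "distances this user likes" map to comfortably high Match Scores regardless of how harsh a grader they are. The `step` and `margin` parameters are assumptions:

```python
def adapt_pivot(d0, rated, step=0.1, margin=1.0):
    """rated: list of (distance, stars) pairs from the user's history."""
    liked = [dist for dist, stars in rated if stars >= 4]
    if not liked:
        return d0                        # no signal yet: keep the default pivot
    # Place the 50% pivot a margin above the typical well-liked distance,
    # then move toward it gradually (exponential moving average).
    target = sum(liked) / len(liked) + margin
    return d0 + step * (target - d0)
```

A critic whose 4-5★ wines sit at larger distances gets a larger d₀, so the same sigmoid yields comparably generous scores across user psychologies.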

8. Dynamic Evolution: Bounded Asymptotic Learning

As users rate wines, their profile must evolve. We employ Bounded Asymptotic Learning to update the user vector U.

U_new = U_old + η · dampening · (W − U_old)

Saturation Dampening: To prevent "lock-in" at the extremes (1.0 or 5.0), we apply a dampening factor with a hard floor of 0.25. This ensures that even if a user is currently at an extreme (e.g., Tannin=5.0), consistent negative feedback will eventually pull them back toward the center.
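A sketch of the per-dimension update. Only the 0.25 dampening floor is stated above; the linear-in-distance dampening shape and the learning rate η = 0.2 are our assumptions:

```python
ETA = 0.2      # illustrative learning rate
FLOOR = 0.25   # hard floor on dampening, per the text

def update_dim(u_old, w):
    # Dampening shrinks as u_old approaches the 1.0/5.0 extremes, but never
    # drops below the floor, so extreme profiles remain correctable.
    distance_from_extreme = min(u_old - 1.0, 5.0 - u_old) / 2.0  # 1.0 at center
    dampening = max(FLOOR, distance_from_extreme)
    new = u_old + ETA * dampening * (w - u_old)
    return max(1.0, min(5.0, new))      # keep the dimension on [1.0, 5.0]
```

A user pinned at Tannin = 5.0 who keeps rating lighter wines highly is still pulled back toward center, just slowly (at the floor rate) rather than not at all.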

9. User-Level Scalars: Sensitivity (S) and Exploration (E)

Beyond the 8-dimensional taste vector, each user possesses two additional scalar parameters that modulate how recommendations are generated and scored.

9.1 Sensitivity (S) ∈ [0, 1]

Purpose: Captures individual tolerance for sensory intensity in "harshness" dimensions (tannin, acidity, body, oak, spice). A sensitive user (S → 1) experiences bitter compounds more intensely than a tolerant user (S → 0).

Seeding from Quiz

S is initialized from the composite harshness profile derived from quiz answers:

S_initial = clamp((tannin + acidity + body + oak + spice) / 5 − 2.5, 0, 1)

A user whose harshness dimensions average 3.0 (the scale midpoint) seeds S = 0.5; extreme harshness preferences push toward 1.0.

Modulating Asymmetric Penalties

The base overshoot penalty (1.4×) is amplified for sensitive users on harshness dimensions:

effectivePenalty_d = 1.4 × (1 + α_d × S)

| Dimension | α (Amplification Factor) |
| --- | --- |
| Tannin | 0.8 |
| Acidity | 0.7 |
| Body | 0.5 |
| Oak | 0.4 |
| Spice | 0.3 |
| Other dimensions | 0.0 |

A user with S = 1 and a wine overshooting tannin by 1.0 pays: 1.4 × 1.8 = 2.52 penalty, compared to 1.4 for S = 0.
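The worked example above can be checked directly with a minimal sketch (α values copied from the table):

```python
ALPHA = {"tannin": 0.8, "acidity": 0.7, "body": 0.5, "oak": 0.4, "spice": 0.3}

def effective_penalty(dim, s):
    """effectivePenalty_d = 1.4 × (1 + α_d × S)"""
    return 1.4 * (1 + ALPHA.get(dim, 0.0) * s)

p_sensitive = effective_penalty("tannin", 1.0)   # 1.4 × 1.8 = 2.52
p_tolerant  = effective_penalty("tannin", 0.0)   # base overshoot penalty, 1.4
```

Non-harshness dimensions fall through to α = 0, leaving their overshoot penalty at the base 1.4 regardless of S.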

Learning S from Ratings

When a user rates a wine poorly and the wine overshoots their profile on harshness dimensions, S increases:

residual = (expectedRating − actualRating) / 4
S_new = clamp(S_old + γ × residual × weightedOvershoot, 0, 1)

Where γ = 0.08 is the learning rate (decays with confidence), and weightedOvershoot is the normalized overshoot across harshness dimensions.
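A sketch of the update, assuming 1-5 star ratings and a precomputed weighted overshoot in [0, 1]; the γ decay with confidence is omitted here:

```python
GAMMA = 0.08  # base learning rate from the text (decay omitted)

def clamp01(x):
    return max(0.0, min(1.0, x))

def update_sensitivity(s_old, expected, actual, weighted_overshoot):
    # Positive residual means the wine disappointed relative to expectation.
    residual = (expected - actual) / 4
    return clamp01(s_old + GAMMA * residual * weighted_overshoot)
```

A disappointing over-tannic wine nudges S upward; a wine that beats expectations despite overshooting nudges it back down.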

9.2 Exploration (E) ∈ [0, 1]

Purpose: Controls willingness to venture beyond established preferences. High-E users receive "wildcard" recommendations from outside their comfort zone.

Seeding from Quiz

E is initialized from quiz answers probing adventure-seeking behavior:

E_initial = (adventure_answer + novel_food_answer) / 2

Normalized to [0, 1] where 0.5 represents moderate exploration appetite.

Wildcard Injection

The number of wildcards injected into recommendations is determined by:

effectiveE = E × (1 - 0.5 × S) (Sensitive users get fewer wildcards)
wildcardCount = round(effectiveE × 2)

Wildcards are selected from wines with:

Learning E from Wildcard Ratings

When users rate wildcards, E adjusts based on whether they exceeded the baseline expectation (60%):

r = (rating − 1) / 4 (normalized to [0, 1])
E_new = clamp(E_old + γ × (r − 0.6) × noveltyFactor, 0, 1)

Where γ = 0.06 and noveltyFactor = min(noveltyDistance / 3.0, 1.0). Because the baseline is 0.6, 4-5★ ratings (r ≥ 0.75) push E up, while 3★ and below (r ≤ 0.5) pull it down.
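The update can be sketched as follows; names track the formulas above, a 1-5 star scale is assumed, and the γ decay with confidence is omitted:

```python
GAMMA_E = 0.06
BASELINE = 0.6   # wildcards are expected to land around a 60% outcome

def clamp01(x):
    return max(0.0, min(1.0, x))

def update_exploration(e_old, rating, novelty_distance):
    r = (rating - 1) / 4                        # normalize stars to [0, 1]
    novelty = min(novelty_distance / 3.0, 1.0)  # noveltyFactor
    return clamp01(e_old + GAMMA_E * (r - BASELINE) * novelty)
```

The novelty factor means a barely-novel wildcard teaches almost nothing about exploration appetite, while a genuinely far-afield one moves E at full strength.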

9.3 S+E Interaction

Sensitivity suppresses exploration to protect sensitive users from potentially unpleasant wildcard experiences. The formula effectiveE = E × (1 − 0.5 × S) ensures that a fully tolerant user (S = 0) keeps their entire exploration appetite, while a maximally sensitive user (S = 1) has it halved.

Implementation Note

S and E are stored locally with confidence counters (0-30) that decay learning rates over time. This prevents early ratings from locking users into extreme values while still allowing gradual refinement.
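The decay curve is not specified, so the following is one plausible linear schedule consistent with the 0-30 confidence counters described above:

```python
MAX_CONFIDENCE = 30  # counter ceiling from the text

def decayed_rate(base_rate, confidence):
    # Assumption: linear decay. Early ratings (confidence near 0) move S/E
    # at the full base rate; a fully confident profile stops moving.
    confidence = min(confidence, MAX_CONFIDENCE)
    return base_rate * (1 - confidence / MAX_CONFIDENCE)
```

Plugging this in for γ in the S and E updates yields the stated behavior: early ratings refine the scalars quickly without letting any single rating lock them at an extreme.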

References

  1. Cold-Start Problem in Recommender Systems. Schein et al., 2002.
  2. Vinotype & Sensory Segmentation. Hanni, 2012.
  3. Asymmetric Impact in Attribute Performance. Mikulic et al., 2008.
  4. Manifold Learning & Archetypes. Cutler et al., 1994.
  5. PROP Taster Status and Food Preference. Tepper et al., 2009.
  6. Chemical composition of wine. Waterhouse et al., 2016.


Questions or Feedback?

If you have technical questions about this implementation, want to discuss our methodology, or have suggestions for improvement, reach out to us at [email protected].