Transform Palazzo’s heuristic recommendation system into a data-driven, continuously learning personalization engine.
You’ll refine our existing rule-based models for design cohesion, preference awareness, and pricing sensitivity into adaptive, empirically validated ML models that improve with every user interaction — driving measurable lifts in engagement, conversion, and customer delight.

Outcomes (What Success Looks Like in 12 Months)

Timeframe | Outcome | Success Metric
30 days | Understand current recommender stack (cohesion, preference, price systems) and define evaluation framework. | Offline evaluation harness running (NDCG@K, cohesion score, calibration curves).
60 days | Build dataset of past recommendations + interactions; derive labeled pairs and sets for training. | Training dataset established with ≥100K labeled examples (heuristics + user actions).
90 days | Train and evaluate learned cohesion calibration model that improves ranking over heuristic baseline. | +15–20% NDCG lift vs. heuristic on validation; feature importance documented.
4–6 months | Integrate model into recommendation pipeline; measure real-world impact. | +10–20% lift in CTR / Add-to-Cart in A/B vs. baseline.
6–9 months | Launch context embeddings (room + user preference vectors) and personalization layer. | Per-user personalization metrics (lift vs. global ranking; satisfaction >4.5/5).
12 months | Establish continuous learning loop (retraining + monitoring). | Retraining cadence operational; drift detection + automated dashboards live.

Responsibilities

🔍 Model Calibration & Learning

Fine-tune existing CLIP/SigLIP embeddings using weak supervision from heuristic scores and user feedback.
Learn weight calibration functions for cohesion scoring (style, color, material, budget).
Train and deploy lightweight ranking or regression models (XGBoost, MLPs) that learn from clickstream data.
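
As a rough illustration of what "learning weight calibration functions" could look like, the sketch below fits one weight per cohesion sub-score with plain logistic regression. It assumes (hypothetically) that each recommendation already carries heuristic sub-scores in [0, 1] for style, color, material, and budget, and that a binary user action such as a click serves as the label. The feature names, toy data, and hyperparameters are illustrative only, not Palazzo's actual pipeline.

```python
import math

# Hypothetical sub-score names; the real feature set may differ.
FEATURES = ("style", "color", "material", "budget")

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Calibrated probability that a recommendation gets a positive action."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_weights(rows, labels, lr=0.5, epochs=500):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(FEATURES)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy data (made up): only the style sub-score varies with the label.
rows = [
    (0.9, 0.5, 0.5, 0.5),
    (0.8, 0.5, 0.5, 0.5),
    (0.2, 0.5, 0.5, 0.5),
    (0.1, 0.5, 0.5, 0.5),
]
labels = [1, 1, 0, 0]

weights, bias = fit_weights(rows, labels)
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:.3f}")
```

In practice a gradient-boosted model or small MLP would likely replace this hand-rolled loop; the point is only that the heuristic weights become parameters learned from feedback.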

🧠 Personalization & Context Modeling

Create context embeddings for rooms and users from uploaded photos, design choices, and interactions.
Implement preference clustering or per-user fine-tuning of aesthetic weights.

🧩 Experimentation & Evaluation

Define offline ranking metrics (NDCG, coverage, novelty, diversity, cohesion score stability).
Design and interpret online A/B tests to quantify engagement and revenue lift.
Collaborate with PM/designer to align metrics with user experience goals.
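
For readers less familiar with the headline offline metric, here is a minimal sketch of NDCG@K using linear gain and a log2 position discount. The relevance lists are made-up examples (1 = the user engaged with the item, 0 = they did not), and the function names are illustrative rather than part of any existing harness.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (rank 1 has discount 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance of the same five items, listed in the order each ranker shows them.
heuristic_order = [0, 1, 0, 1, 0]
learned_order = [1, 1, 0, 0, 0]

baseline = ndcg_at_k(heuristic_order, 3)
candidate = ndcg_at_k(learned_order, 3)
relative_lift = (candidate - baseline) / baseline

print(f"heuristic NDCG@3 = {baseline:.3f}")
print(f"learned   NDCG@3 = {candidate:.3f}")
print(f"relative lift    = {relative_lift:+.1%}")
```

A real harness would average this over many sessions and report confidence intervals before comparing against a lift target.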

📊 Data & Feedback Systems

Work with data labelers and internal designers to curate labeled sets for compatibility and bundle cohesion.
Develop lightweight active learning loops that prioritize uncertain or low-confidence recommendations for review.
Build dashboards summarizing model health, drift, and performance over time.
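
One simple way to prioritize uncertain recommendations for review is uncertainty sampling: send labelers the items whose predicted compatibility is closest to 0.5. The sketch below assumes a model that outputs a probability per candidate bundle; the bundle IDs, scores, and review budget are hypothetical.

```python
def select_for_review(scores, budget):
    """Return the `budget` item IDs whose predicted probability is nearest 0.5.

    `scores` maps an item ID to a predicted compatibility probability in [0, 1].
    """
    ranked = sorted(scores, key=lambda item_id: abs(scores[item_id] - 0.5))
    return ranked[:budget]

# Made-up model outputs for four candidate bundles.
scores = {
    "bundle_a": 0.95,  # confident positive
    "bundle_b": 0.52,  # very uncertain
    "bundle_c": 0.10,  # confident negative
    "bundle_d": 0.45,  # uncertain
}

review_queue = select_for_review(scores, budget=2)
print(review_queue)
```

Other acquisition strategies (ensemble disagreement, margin sampling over ranked pairs) may suit bundle cohesion better; this is only the simplest baseline.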

Competencies (What Great Looks Like)

🧮 Technical Expertise

Strong in recommendation systems, metric learning, or ranking models.
Deep familiarity with PyTorch, Faiss, XGBoost / LightGBM, and the Python data stack.
Hands-on experience fine-tuning multimodal embeddings (CLIP, SigLIP, BLIP, etc.).
Solid understanding of offline evaluation metrics (AUC, NDCG, recall@K, diversity) and online experimentation (A/B tests, multi-armed bandits, significance testing).

🎨 Aesthetic and Spatial Intuition

Comfort reasoning about visual design cohesion — palette, texture, style, material — not just numeric similarity.
Ideally has experience in fashion, furniture, or lifestyle recommendation contexts.

🧠 Analytical & Communicative

Designs experiments that clearly link model changes to business KPIs.
Explains results clearly to product and design stakeholders — translates “model lift” into “conversion lift.”

⚙️ Startup Builder Mentality

Thrives in fast-paced, scrappy environments.
Works iteratively: prototypes → measures → deploys.
Can operate independently with limited engineering support.

Key Performance Indicators

Category | Metric
Offline performance | NDCG@K lift vs baseline ≥ +15%; AUC ≥ 0.75
Online performance | CTR / Add-to-Cart / AOV lift ≥ +10–20%
Data efficiency | Label utilization efficiency ≥ 80%; active learning coverage ≥ 70%
System reliability | Model drift < 10%; retrain cadence < 30 days
Documentation & communication | Clear, reproducible experiments; monthly insight reports

Profile

Attribute | Target
Experience | 4–8 years applied ML in recsys, personalization, or multimodal AI
Domain | E-commerce, fashion tech, home design, or digital advertising
Education | MS/PhD preferred in ML, CS, Applied Math, or related field
Stack | PyTorch, sklearn, Faiss, pandas, NumPy, FastAPI, SQL/BigQuery
Bonus | Graph ML, embeddings evaluation, A/B experimentation at scale
Soft skills | Empirical, pragmatic, visually literate, collaborative with design/PM

Team Context

You’ll work directly with:

CTO + Senior Engineer: to integrate trained models into our recommendation stack.
Product Manager (Raffi): to define success metrics and pilot experiments.
Labeling Team: to bootstrap and validate training datasets.

Compensation Benchmark (Seed-Stage / Early Hire)

Title: Applied ML Scientist, Personalization & Design Intelligence
Base Salary: $130K–$170K (U.S.)
Equity: 0.3–0.75%
Location: Remote / Hybrid (U.S. preferred)
Reporting to: CTO

Cultural Fit

You thrive on turning beautiful ideas into measurable, scientific results.
You believe that taste can be modeled — not just through data, but through understanding people and spaces.
You care about why an AI recommendation feels good, not just whether it scores well.
