Applied ML Scientist — Model Calibration & Personalization
Palazzo
🎯 Job Scorecard: Applied ML Scientist — Model Calibration & Personalization
(Palazzo.ai — AI-Powered Interior Design Platform)
Mission
Transform Palazzo’s heuristic recommendation system into a data-driven, continuously learning personalization engine.
You’ll refine our existing rule-based models for design cohesion, preference awareness, and pricing sensitivity into adaptive, empirically validated ML models that improve with every user interaction — driving measurable lifts in engagement, conversion, and customer delight.
Outcomes (What Success Looks Like in 12 Months)
Timeframe | Outcome | Success Metric
30 days | Understand current recommender stack (cohesion, preference, price systems) and define evaluation framework. | Offline evaluation harness running (NDCG@K, cohesion score, calibration curves).
60 days | Build dataset of past recommendations + interactions; derive labeled pairs and sets for training. | Training dataset established with ≥100K labeled examples (heuristics + user actions).
90 days | Train and evaluate learned cohesion calibration model that improves ranking over heuristic baseline. | +15–20% NDCG lift vs. heuristic on validation; feature importance documented.
4–6 months | Integrate model into recommendation pipeline; measure real-world impact. | +10–20% lift in CTR / Add-to-Cart in A/B vs. baseline.
6–9 months | Launch context embeddings (room + user preference vectors) and personalization layer. | Per-user personalization metrics (lift vs. global ranking; satisfaction >4.5/5).
12 months | Establish continuous learning loop (retraining + monitoring). | Retraining cadence operational; drift detection + automated dashboards live.
Responsibilities
🔍 Model Calibration & Learning
Fine-tune existing CLIP/SigLIP embeddings using weak supervision from heuristic scores and user feedback.
Learn weight calibration functions for cohesion scoring (style, color, material, budget).
Train and deploy lightweight ranking or regression models (XGBoost, MLPs) that learn from clickstream data.
🧠 Personalization & Context Modeling
Create context embeddings for rooms and users from uploaded photos, design choices, and interactions.
Implement preference clustering or per-user fine-tuning of aesthetic weights.
🧩 Experimentation & Evaluation
Define offline ranking metrics (NDCG, coverage, novelty, diversity, cohesion score stability).
Design and interpret online A/B tests to quantify engagement and revenue lift.
Collaborate with the PM and designers to align metrics with user experience goals.
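For candidates who want a concrete picture of the offline ranking metric named above, NDCG@K and its lift over a baseline can be sketched in a few lines. This is an illustrative example only, not Palazzo's evaluation harness: the function names, the linear gain (some teams use 2^rel - 1 instead), and the relevance grades are all assumptions made for the sketch.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_lift(candidate, baseline):
    """Relative improvement of candidate over baseline (0.15 means +15%)."""
    return (candidate - baseline) / baseline


# Hypothetical graded relevance for one session (e.g. 0 = ignored,
# 1 = clicked, 2 = added to cart), in the order each ranker showed the items.
heuristic_ranking = [0, 1, 0, 2, 0]
learned_ranking = [2, 1, 0, 0, 0]

baseline = ndcg_at_k(heuristic_ranking, k=5)
candidate = ndcg_at_k(learned_ranking, k=5)
lift = relative_lift(candidate, baseline)
print(f"baseline={baseline:.3f} candidate={candidate:.3f} lift={lift:+.1%}")
```

In practice the metric would be averaged over many sessions before computing lift; a single session is shown here only to keep the arithmetic visible.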
📊 Data & Feedback Systems
Work with data labelers and internal designers to curate labeled sets for compatibility and bundle cohesion.
Develop lightweight active learning loops that prioritize uncertain or low-confidence recommendations for review.
Build dashboards summarizing model health, drift, and performance over time.
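As an illustration of the active learning loop described above, one common starting point is uncertainty sampling: send the items whose predicted score is closest to the decision boundary to human review first. The sketch below assumes a binary "cohesive / not cohesive" probability per bundle; the function names, bundle ids, and scores are hypothetical.

```python
def uncertainty(p):
    """Uncertainty of a binary probability: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(p - 0.5) * 2.0


def select_for_review(scored_items, budget):
    """Return the ids of the `budget` items whose probability is closest to 0.5."""
    ranked = sorted(scored_items, key=lambda item: uncertainty(item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:budget]]


# Hypothetical (bundle id, predicted cohesion probability) pairs.
scored = [
    ("bundle_a", 0.97),
    ("bundle_b", 0.52),
    ("bundle_c", 0.10),
    ("bundle_d", 0.41),
]

review_queue = select_for_review(scored, budget=2)
print(review_queue)
```

The review budget would typically be set by labeler capacity, and the reviewed labels fed back into the next training run.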
Competencies (What Great Looks Like)
🧮 Technical Expertise
Strong in recommendation systems, metric learning, or ranking models.
Deep familiarity with PyTorch, Faiss, XGBoost / LightGBM, and the Python data stack.
Hands-on experience fine-tuning multimodal embeddings (CLIP, SigLIP, BLIP, etc.).
Solid understanding of offline evaluation metrics (AUC, NDCG, recall@K, diversity) and online experimentation (A/B, multi-armed bandits, significance).
🎨 Aesthetic and Spatial Intuition
Comfort reasoning about visual design cohesion — palette, texture, style, material — not just numeric similarity.
Ideally has experience in fashion, furniture, or lifestyle recommendation contexts.
🧠 Analytical & Communicative
Designs experiments that clearly link model changes to business KPIs.
Explains results clearly to product and design stakeholders — translates “model lift” into “conversion lift.”
⚙️ Startup Builder Mentality
Thrives in fast-paced, scrappy environments.
Works iteratively: prototypes → measures → deploys.
Can operate independently with limited engineering support.
Key Performance Indicators
Category | Metric
Offline performance | NDCG@K lift vs baseline ≥ +15%; AUC ≥ 0.75
Online performance | CTR / Add-to-Cart / AOV lift ≥ +10–20%
Data efficiency | Label utilization efficiency ≥ 80%; active learning coverage ≥ 70%
System reliability | Model drift < 10%; retrain cadence < 30 days
Documentation & communication | Clear, reproducible experiments; monthly insight reports
Profile
Attribute | Target
Experience | 4–8 years applied ML in recsys, personalization, or multimodal AI
Domain | E-commerce, fashion tech, home design, or digital advertising
Education | MS/PhD preferred in ML, CS, Applied Math, or related field
Stack | PyTorch, sklearn, Faiss, pandas, NumPy, FastAPI, SQL/BigQuery
Bonus | Graph ML, embeddings evaluation, A/B experimentation at scale
Soft skills | Empirical, pragmatic, visually literate, collaborative with design/PM
Team Context
You’ll work directly with:
CTO + Senior Engineer: to integrate trained models into our recommendation stack.
Product Manager (Raffi): to define success metrics and pilot experiments.
Labeling Team: to bootstrap and validate training datasets.
Compensation Benchmark (Seed-Stage / Early Hire)
Title: Applied ML Scientist, Personalization & Design Intelligence
Base Salary: $130K–$170K (U.S.)
Equity: 0.3–0.75%
Location: Remote / Hybrid (U.S. preferred)
Reporting to: CTO
Cultural Fit
You thrive on turning beautiful ideas into measurable, scientific results.
You believe that taste can be modeled — not just through data, but through understanding people and spaces.
You care about why an AI recommendation feels good, not just whether it scores well.