PropTech Jobs

Take a look at the exciting opportunities available in the MetaProp portfolio

Applied ML Scientist — Model Calibration & Personalization

Palazzo

Software Engineering, Data Science
International Falls, MN, USA
USD 130k-170k / year + Equity
Posted on Nov 26, 2025

🎯 Job Scorecard: Applied ML Scientist — Model Calibration & Personalization

(Palazzo.ai — AI-Powered Interior Design Platform)


Mission

Transform Palazzo’s heuristic recommendation system into a data-driven, continuously learning personalization engine.
You’ll refine our existing rule-based models for design cohesion, preference awareness, and pricing sensitivity into adaptive, empirically validated ML models that improve with every user interaction — driving measurable lifts in engagement, conversion, and customer delight.


Outcomes (What Success Looks Like in 12 Months)

Timeframe | Outcome | Success Metric
30 days | Understand current recommender stack (cohesion, preference, price systems) and define evaluation framework. | Offline evaluation harness running (NDCG@K, cohesion score, calibration curves).
60 days | Build dataset of past recommendations + interactions; derive labeled pairs and sets for training. | Training dataset established with ≥100K labeled examples (heuristics + user actions).
90 days | Train and evaluate learned cohesion calibration model that improves ranking over heuristic baseline. | +15–20% NDCG lift vs. heuristic on validation; feature importance documented.
4–6 months | Integrate model into recommendation pipeline; measure real-world impact. | +10–20% lift in CTR / Add-to-Cart in A/B vs. baseline.
6–9 months | Launch context embeddings (room + user preference vectors) and personalization layer. | Per-user personalization metrics (lift vs. global ranking; satisfaction >4.5/5).
12 months | Establish continuous learning loop (retraining + monitoring). | Retraining cadence operational; drift detection + automated dashboards live.


Responsibilities

🔍 Model Calibration & Learning

  • Fine-tune existing CLIP/SigLIP embeddings using weak supervision from heuristic scores and user feedback.

  • Learn weight calibration functions for cohesion scoring (style, color, material, budget).

  • Train and deploy lightweight ranking or regression models (XGBoost, MLPs) that learn from clickstream data.

🧠 Personalization & Context Modeling

  • Create context embeddings for rooms and users from uploaded photos, design choices, and interactions.

  • Implement preference clustering or per-user fine-tuning of aesthetic weights.

🧩 Experimentation & Evaluation

  • Define offline ranking metrics (NDCG, coverage, novelty, diversity, cohesion score stability).

  • Design and interpret online A/B tests to quantify engagement and revenue lift.

  • Collaborate with the product manager and designers to align metrics with user experience goals.
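As an illustration of the offline ranking metrics listed above, here is a minimal sketch of NDCG@K and the relative lift of one ranking over another. It assumes linear gain and graded (here binary) relevance labels; the function names are hypothetical and are not taken from Palazzo's stack.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k items, taken in ranked order."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_lift(candidate: float, baseline: float) -> float:
    """Relative improvement of candidate over baseline (0.15 means +15%)."""
    return (candidate - baseline) / baseline


# Relevance labels of the same five items, in the order each ranker shows them.
heuristic_order = [0, 1, 0, 1, 0]
learned_order = [1, 1, 0, 0, 0]

heuristic_ndcg = ndcg_at_k(heuristic_order, k=3)
learned_ndcg = ndcg_at_k(learned_order, k=3)
lift = relative_lift(learned_ndcg, heuristic_ndcg)
```

In practice the metric would be averaged over many queries or sessions before a lift figure is reported; a single list like this is only for intuition.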

📊 Data & Feedback Systems

  • Work with data labelers and internal designers to curate labeled sets for compatibility and bundle cohesion.

  • Develop lightweight active learning loops that prioritize uncertain or low-confidence recommendations for review.

  • Build dashboards summarizing model health, drift, and performance over time.
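One simple way to prioritize uncertain recommendations for review is uncertainty sampling: send the items whose predicted score sits closest to the decision boundary to reviewers first. The sketch below assumes the model outputs a probability in [0, 1] per recommendation and that 0.5 is the boundary; the function name and item ids are hypothetical.

```python
from typing import Dict, List


def select_for_review(scores: Dict[str, float], budget: int) -> List[str]:
    """Return up to `budget` item ids whose score is closest to 0.5."""
    # A smaller distance from 0.5 means the model is less confident either way.
    ranked = sorted(scores, key=lambda item_id: abs(scores[item_id] - 0.5))
    return ranked[:budget]


# Hypothetical predicted probabilities that a bundle is cohesive.
predicted = {"bundle_a": 0.95, "bundle_b": 0.52, "bundle_c": 0.10, "bundle_d": 0.45}
review_queue = select_for_review(predicted, budget=2)
```

Other uncertainty measures (entropy, ensemble disagreement) would slot into the same sort key.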


Competencies (What Great Looks Like)

🧮 Technical Expertise

  • Strong in recommendation systems, metric learning, or ranking models.

  • Deep familiarity with PyTorch, Faiss, XGBoost / LightGBM, and the Python data stack.

  • Hands-on experience fine-tuning multimodal embeddings (CLIP, SigLIP, BLIP, etc.).

  • Solid understanding of offline evaluation metrics (AUC, NDCG, recall@K, diversity) and online experimentation (A/B, multi-armed bandits, significance).

🎨 Aesthetic and Spatial Intuition

  • Comfort reasoning about visual design cohesion — palette, texture, style, material — not just numeric similarity.

  • Ideally has experience in fashion, furniture, or lifestyle recommendation contexts.

🧠 Analytical & Communicative

  • Designs experiments that clearly link model changes to business KPIs.

  • Explains results clearly to product and design stakeholders — translates “model lift” into “conversion lift.”

⚙️ Startup Builder Mentality

  • Thrives in fast-paced, scrappy environments.

  • Works iteratively: prototypes → measures → deploys.

  • Can operate independently with limited engineering support.


Key Performance Indicators

Category | Metric
Offline performance | NDCG@K lift vs baseline ≥ +15%; AUC ≥ 0.75
Online performance | CTR / Add-to-Cart / AOV lift ≥ +10–20%
Data efficiency | Label utilization efficiency ≥ 80%; active learning coverage ≥ 70%
System reliability | Model drift < 10%; retrain cadence < 30 days
Documentation & communication | Clear, reproducible experiments; monthly insight reports


Profile

Attribute | Target
Experience | 4–8 years applied ML in recsys, personalization, or multimodal AI
Domain | E-commerce, fashion tech, home design, or digital advertising
Education | MS/PhD preferred in ML, CS, Applied Math, or related field
Stack | PyTorch, sklearn, Faiss, pandas, NumPy, FastAPI, SQL/BigQuery
Bonus | Graph ML, embeddings evaluation, A/B experimentation at scale
Soft skills | Empirical, pragmatic, visually literate, collaborative with design/PM


Team Context

You’ll work directly with:

  • CTO + Senior Engineer: to integrate trained models into our recommendation stack.

  • Product Manager (Raffi): to define success metrics and pilot experiments.

  • Labeling Team: to bootstrap and validate training datasets.


Compensation Benchmark (Seed-Stage / Early Hire)

  • Title: Applied ML Scientist, Personalization & Design Intelligence

  • Base Salary: $130K–$170K (U.S.)

  • Equity: 0.3–0.75%

  • Location: Remote / Hybrid (U.S. preferred)

  • Reporting to: CTO


Cultural Fit

  • You thrive on turning beautiful ideas into measurable, scientific results.

  • You believe that taste can be modeled — not just through data, but through understanding people and spaces.

  • You care about why an AI recommendation feels good, not just whether it scores well.