Madrid Embedding Recommender
Overview
The Madrid Embedding Recommender enables fast, cold-start recommendations specifically tailored for Madrid without requiring local user reviews. This lightweight model uses text embeddings to match user preferences with relevant places, making it particularly valuable for new users or visitors to the city.
Technical Implementation
Text Embedding Approach
Our approach leverages modern natural language processing techniques:
- Preference Encoding: User preferences are converted into a pseudo-document format (a short code sketch follows this list):
  - Categories are repeated according to their preference score
  - Example: a user with a high preference for parks (4.5) and cafes (3.5) is represented as “parks parks parks parks cafes cafes cafes”
  - This weighted text representation captures preference strength
- Embedding Model: We use the all-MiniLM-L6-v2 SentenceTransformer model:
  - Produces 384-dimensional dense vector representations
  - Pre-trained on diverse text datasets
  - Optimized for semantic similarity tasks
- Place Embedding: Each venue is represented as a text description including:
  - Place name
  - Category information
  - Key features and attributes
  - These descriptions are embedded into the same vector space as the user preferences
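The encoding can be sketched as follows. This is a minimal illustration of the idea rather than the project's exact code: the `preferences_to_text` helper, the truncation of scores to whole repetitions, and the sample place description are assumptions made for the example.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

def preferences_to_text(user_prefs):
    """Repeat each category name according to its (truncated) preference score."""
    tokens = []
    for category, score in user_prefs.items():
        tokens.extend([category] * int(score))  # e.g. 4.5 -> 4 repetitions
    return ' '.join(tokens)

# Weighted pseudo-document for the user.
pseudo_doc = preferences_to_text({'parks': 4.5, 'cafes': 3.5})
# -> "parks parks parks parks cafes cafes cafes"
user_embedding = model.encode(pseudo_doc)            # 384-dimensional vector

# A place is embedded from its textual description (name, category, key features).
place_text = "El Retiro Park. Category: park. Large historic garden with a boating lake."
place_embedding = model.encode(place_text)

similarity = util.cos_sim(user_embedding, place_embedding)  # higher = better match
```

Because user preferences and place descriptions share one embedding space, matching a user to venues reduces to a nearest-neighbour search over the precomputed place vectors.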
Recommendation Pipeline
The recommendation process follows these steps:
- User Embedding Creation:
  - Extract category preferences from user input (via sliders or chatbot)
  - Generate the weighted category text representation
  - Compute the embedding vector using SentenceTransformer
- Similarity Computation:
  - Calculate cosine similarity between the user embedding and all precomputed place embeddings
  - Higher similarity indicates a better semantic match between user preferences and place attributes
- Filtering & Diversity:
  - Apply a distance filter (3 km radius from the user’s location)
  - Group results by category
  - Apply a diversity constraint: maximum of two recommendations per category
  - This ensures varied recommendations rather than many similar places
- Final Ranking (a condensed sketch of the full pipeline follows this list):
  - Sort by similarity score
  - Apply popularity and distance adjustments
  - Return the top-N recommendations
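Putting these steps together, a condensed pipeline might look like the sketch below. The data layout (`place_embeddings` as a NumPy matrix, `place_metadata` as a list of dicts with `lat`, `lon`, and `category` keys), the `haversine_km` helper, and the omission of the final popularity/distance adjustments are all simplifying assumptions for illustration.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (assumed helper)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def recommend(user_embedding, place_embeddings, place_metadata,
              user_lat, user_lon, num_recs=5, radius_km=3.0, max_per_category=2):
    # Cosine similarity between the user vector and every precomputed place vector.
    sims = place_embeddings @ user_embedding / (
        np.linalg.norm(place_embeddings, axis=1) * np.linalg.norm(user_embedding) + 1e-9
    )

    results, per_category = [], {}
    for idx in np.argsort(-sims):  # iterate from best to worst semantic match
        place = place_metadata[idx]
        # Geographic filter: keep only places within the 3 km radius.
        if haversine_km(user_lat, user_lon, place['lat'], place['lon']) > radius_km:
            continue
        # Diversity constraint: at most two recommendations per category.
        if per_category.get(place['category'], 0) >= max_per_category:
            continue
        per_category[place['category']] = per_category.get(place['category'], 0) + 1
        results.append((place, float(sims[idx])))
        if len(results) == num_recs:
            break
    return results
```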
Implementation in Code
The MadridTransferRecommender class implements this approach:
```python
from sentence_transformers import SentenceTransformer


class MadridTransferRecommender:
    def __init__(self, embedding_model_name='all-MiniLM-L6-v2',
                 embedding_path='models/madrid_place_embeddings.npz'):
        self.model = SentenceTransformer(embedding_model_name)
        self.place_embeddings = self._load_embeddings(embedding_path)
        self.place_metadata = self._load_place_metadata()

    def _load_embeddings(self, path):
        # Load precomputed place embeddings
        ...

    def _load_place_metadata(self):
        # Load place names, categories and coordinates (stub added for completeness)
        ...

    def _create_user_embedding(self, user_prefs):
        # Convert user preferences to text and embed
        ...

    def get_recommendations(self, user_lat, user_lon, user_prefs, num_recs=5):
        # Generate recommendations based on embedding similarity
        ...
```
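A hypothetical usage example, assuming the constructor defaults shown above and that the precomputed embedding file exists at the default path; the coordinates and preference values are illustrative:

```python
recommender = MadridTransferRecommender()

recommendations = recommender.get_recommendations(
    user_lat=40.4168, user_lon=-3.7038,       # central Madrid (Puerta del Sol area)
    user_prefs={'parks': 4.5, 'cafes': 3.5},  # slider or chatbot output
    num_recs=5,
)
```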
Advantages
- No User History Required: Perfect for new users with no previous interactions
- Fast Inference: Pre-computed embeddings enable real-time recommendations
- Language Understanding: Captures semantic relationships between preferences and places
- Diversity: Built-in constraints ensure varied recommendations
- Location Awareness: Geographic filtering ensures relevant local suggestions
Limitations and Future Improvements
- Limited Personalization: Less tailored to individual user history than collaborative filtering approaches
- Vocabulary Dependence: Performance affected by how places and preferences are described
- Future Work:
  - Fine-tune embeddings on domain-specific data
  - Integrate user feedback to improve embeddings over time
  - Experiment with more sophisticated text representations