Recommendations
Personalized Product Recommendations with Neo4j
Recommendations
Personalized product recommendations can increase conversions, improve sales rates and provide a better experice for users. In this Neo4j Browser guide, we’ll take a look at how you can generate graphbased realtime personalized product recommendations using a dataset of movies and movie ratings, but these techniques can be applied to many different types of products or content.
GraphBased Recommendations
Generating personalized recommendations is one of the most common use cases for a graph database. Some of the main benefits of using graphs to generate recommendations include:

Performance. Indexfree adjacency allows for calculating recommendations in real time, ensuring the recommendation is always relevant and reflecting uptodate information.

Data model. The labeled property graph model allows for easily combining datasets from multiple sources, allowing enterprises to unlock value from previously separated data silos.
Let’s get started! Click the arrow above to move to the next pane in this guide.
The Open Movie Graph Data Model
The Property Graph Model
The data model of graph databases is called the labeled property graph model.
Nodes: The entities in the data.
Labels: Each node can have one or more label that specifies the type of the node.
Relationships: Connect two nodes. They have a single direction and type.
Properties: Keyvalue pair properties can be stored on both nodes and relationships.
Eliminate Data Silos
In this use case, we are using graphs to combine data from multiple silos.
Product Catalog: Data describing movies comes from the product catalog silo.
User Purchases / Reviews: Data on user purchases and reviews comes from the user or transaction silo.
By combining these two in the graph, we are able to query across datasets to generate personalized product recommendations.
Nodes
Movie
, Actor
, Director
, User
, Genre
are the labels used in this example.
Relationships
ACTED_IN
, IN_GENRE
, DIRECTED
, RATED
are the relationships used in this example.
Properties
title
, name
, year
, rating
are some of the properties used in this example.
Intro To Cypher
In order to work with our labeled property graph, we need a query language for graphs.
Graph Patterns
Cypher is the query language for graphs and is centered around graph patterns. Graph patterns are expressed in Cypher using ASCIIart like syntax.
Nodes
Nodes are defined within parentheses ()
. Optionally, we can specify node label(s): (:Movie)
Relationships
Relationships are defined within square brackets []
. Optionally we can specify type and direction:
(:Movie)+←[:RATED]+(:User)
Aliases
Graph elements can be bound to aliases that can be referred to later in the query:
(m:Movie)+←+[r:RATED](u:User)
Predicates
Filters can be applied to these graph patterns to limit the matching paths. Boolean logic operators, regular expressions and string comparison operators can be used here.
Aggregations
There is an implicit group when using aggregation functions such as COUNT
.
Be sure to use the Cypher Refcard as a reference for learning Cypher syntax.
Dissecting a Cypher Statement
Let’s look at a Cypher query that answers the question "How many reviews does each Matrix movie have?". Don’t worry if this seems complex, we’ll build up our understanding of Cypher as we move along.
MATCH (m:Movie)<[:RATED](u:User)
WHERE m.title CONTAINS "Matrix"
WITH m.title AS movie, COUNT(*) AS reviews
RETURN movie, reviews
ORDER BY reviews DESC
LIMIT 5;
find 

Search for an existing graph pattern 
filter 

Filter matching paths to only those matching a predicate 
aggregate 

Count number of paths matched for each movie 
return 

Specify columns to be returned by the statement 
order 

Order by number of reviews, in descending order 
limit 

Only return first five records 
Personalized Recommendations
Now let’s start generating some recommendations. There are two basic approaches to recommendation algorithms.
ContentBased Filtering
Recommend items that are similar to those that a user is viewing, rated highly or purchased previously.
MATCH p=(m:Movie {title: "Net, The"})[:ACTED_ININ_GENREDIRECTED*2]()
RETURN p LIMIT 25
Collaborative Filtering
Use the preferences, ratings and actions of other users in the network to find items to recommend.
MATCH (m:Movie {title: "Crimson Tide"})<[:RATED](u:User)[:RATED]>(rec:Movie)
RETURN rec.title AS recommendation, COUNT(*) AS usersWhoAlsoWatched
ORDER BY usersWhoAlsoWatched DESC LIMIT 25
ContentBased Filtering
The goal of contentbased filtering is to find similar items, using attributes (or traits) of the item. Using our movie data, one way we could define similarlity is movies that have common genres.
Similarity Based on Common Genres
// Find similar movies by common genres
MATCH (m:Movie)[:IN_GENRE]>(g:Genre)<[:IN_GENRE](rec:Movie)
WHERE m.title = "Inception"
WITH rec, COLLECT(g.name) AS genres, COUNT(*) AS commonGenres
RETURN rec.title, genres, commonGenres
ORDER BY commonGenres DESC LIMIT 10;
Personalized Recommendations Based on Genres
If we know what movies a user has watched, we can use this information to recommend similar movies:
// Content recommendation by overlapping genres
MATCH (u:User {name: "Angelica Rodriguez"})[r:RATED]>(m:Movie),
(m)[:IN_GENRE]>(g:Genre)<[:IN_GENRE](rec:Movie)
WHERE NOT EXISTS( (u)[:RATED]>(rec) )
WITH rec, [g.name, COUNT(*)] AS scores
RETURN rec.title AS recommendation, rec.year AS year,
COLLECT(scores) AS scoreComponents,
REDUCE (s=0,x in COLLECT(scores)  s+x[1]) AS score
ORDER BY score DESC LIMIT 10
Weighted Content Algorithm
Of course there are many more traits in addition to just genre that we can consider to compute similarity, such actors and directors. Let’s use a weighted sum to score the recommendations based on the number of actors, genres and directors they have in common to boost the score:
// Find similar movies by common genres
MATCH (m:Movie) WHERE m.title = "Wizard of Oz, The"
MATCH (m)[:IN_GENRE]>(g:Genre)<[:IN_GENRE](rec:Movie)
WITH m, rec, COUNT(*) AS gs
OPTIONAL MATCH (m)<[:ACTED_IN](a:Actor)[:ACTED_IN]>(rec)
WITH m, rec, gs, COUNT(a) AS as
OPTIONAL MATCH (m)<[:DIRECTED](d:Director)[:DIRECTED]>(rec)
WITH m, rec, gs, as, COUNT(d) AS ds
RETURN rec.title AS recommendation, (5*gs)+(3*as)+(4*ds) AS score ORDER BY score DESC LIMIT 100
ContentBased Similarity Metrics
So far we’ve used the number of common traits as a way to score the relevance of our recommendations. Let’s now consider a more robust way to quantify similarity, using a similarity metric. Similarity metrics are an important component used in generating personalized recommendations that allow us to quantify how similar two items (or as we’ll see later, how similar two users preferences) are.
Jaccard Index
The Jaccard index is a number between 0 and 1 that indicates how similar two sets are. The Jaccard index of two identical sets is 1. If two sets do not have a common element, then the Jaccard index is 0. The Jaccard is calculated by dividing the size of the intersection of two sets by the union of the two sets.
We can calculate the Jaccard index for sets of movie genres to determine how similar two movies are.
MATCH (m:Movie {title: "Inception"})[:IN_GENRE]>(g:Genre)<[:IN_GENRE](other:Movie)
WITH m, other, COUNT(g) AS intersection, COLLECT(g.name) AS i
MATCH (m)[:IN_GENRE]>(mg:Genre)
WITH m,other, intersection,i, COLLECT(mg.name) AS s1
MATCH (other)[:IN_GENRE]>(og:Genre)
WITH m,other,intersection,i, s1, COLLECT(og.name) AS s2
WITH m,other,intersection,s1,s2
WITH m,other,intersection,s1+[x IN s2 WHERE NOT x IN s1] AS union, s1, s2
RETURN m.title, other.title, s1,s2,((1.0*intersection)/SIZE(union)) AS jaccard ORDER BY jaccard DESC LIMIT 100
We can apply this same approach to all "traits" of the movie (genre, actors, directors, etc.):
MATCH (m:Movie {title: "Inception"})[:IN_GENREACTED_INDIRECTED](t)[:IN_GENREACTED_INDIRECTED](other:Movie)
WITH m, other, COUNT(t) AS intersection, COLLECT(t.name) AS i
MATCH (m)[:IN_GENREACTED_INDIRECTED](mt)
WITH m,other, intersection,i, COLLECT(mt.name) AS s1
MATCH (other)[:IN_GENREACTED_INDIRECTED](ot)
WITH m,other,intersection,i, s1, COLLECT(ot.name) AS s2
WITH m,other,intersection,s1,s2
WITH m,other,intersection,s1+[x IN s2 WHERE NOT x IN s1] AS union, s1, s2
RETURN m.title, other.title, s1,s2,((1.0*intersection)/SIZE(union)) AS jaccard ORDER BY jaccard DESC LIMIT 100
Collaborative Filtering – Leveraging Movie Ratings
Notice that we have usermovie ratings in our graph. The collaborative filtering approach is going to make use of this information to find relevant recommendations.
Steps:

Find similar users in the network.

Assuming that similar users have similar preferences, what are the movies those similar users like?
// Show all ratings by Misty Williams
MATCH (u:User {name: "Misty Williams"})
MATCH (u)[r:RATED]>(m:Movie)
RETURN *;
// Show all ratings by Misty Williams
MATCH (u:User {name: "Misty Williams"})
MATCH (u)[r:RATED]>(m:Movie)
RETURN avg(r.rating) AS average;
// What are the movies that Misty liked more than average?
MATCH (u:User {name: "Misty Williams"})
MATCH (u)[r:RATED]>(m:Movie)
WITH u, avg(r.rating) AS average
MATCH (u)[r:RATED]>(m:Movie)
WHERE r.rating > average
RETURN *;
Collaborative Filtering – The Wisdom of Crowds
Simple Collaborative Filtering
MATCH (u:User {name: "Cynthia Freeman"})[:RATED]>(:Movie)<[:RATED](o:User)
MATCH (o)[:RATED]>(rec:Movie)
WHERE NOT EXISTS( (u)[:RATED]>(rec) )
RETURN rec.title, rec.year, rec.plot
LIMIT 25
Of course this is just a simple appraoch, there are many problems with this query, such as not normalizing based on popularity or not taking ratings into consideration. In the next section, we will see how we can improve this approach using the kNN method.
Only Consider Genres Liked by the User
Many recommender systems are a blend of collaborative filtering and contentbased approaches:
MATCH (u:User {name: "Andrew Freeman"})[r:RATED]>(m:Movie)
WITH u, avg(r.rating) AS mean
MATCH (u)[r:RATED]>(m:Movie)[:IN_GENRE]>(g:Genre)
WHERE r.rating > mean
WITH u, g, COUNT(*) AS score
MATCH (g)<[:IN_GENRE](rec:Movie)
WHERE NOT EXISTS((u)[:RATED]>(rec))
RETURN rec.title AS recommendation, rec.year AS year, COLLECT(DISTINCT g.name) AS genres, SUM(score) AS sscore
ORDER BY sscore DESC LIMIT 10
Collaborative Filtering – Similarity Metrics
We use similarity metrics to quantify how similar two users or two items are. We’ve already seen Jaccard similarity used in the context of contentbased filtering. Now, we’ll see how similarity metrics are used with collaborative filtering.
Cosine Distance
Jaccard similarity was useful for comparing movies and is essentially comparing two sets (groups of genres, actors, directors, etc.). However, with movie ratings each relationship has a weight that we can consider as well.
Cosine Similarity
The cosine similarity of two users will tell us how similar two users' preferences for movies are. Users with a high cosine similarity will have similar preferences.
See this GraphGist for another example of using cosine similarity for recommendations.
// Most similar users using Cosine similarity
MATCH (p1:User {name: "Cynthia Freeman"})[x:RATED]>(m:Movie)<[y:RATED](p2:User)
WITH COUNT(m) AS numbermovies, SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating)  xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating)  yDot + b^2)) AS yLength,
p1, p2 WHERE numbermovies > 10
RETURN p1.name, p2.name, xyDotProduct / (xLength * yLength) AS sim
ORDER BY sim DESC
LIMIT 100;
We can also compute this measure using the Cosine Similarity algorithm in the Neo4j Graph Algorithms Library.
MATCH (p1:User {name: 'Cynthia Freeman'})[x:RATED]>(movie)<[x2:RATED](p2:User)
WHERE p2 <> p1
WITH p1, p2, collect(x.rating) AS p1Ratings, collect(x2.rating) AS p2Ratings
WHERE size(p1Ratings) > 10
RETURN p1.name AS from,
p2.name AS to,
gds.alpha.similarity.cosine(p1Ratings, p2Ratings) AS similarity
ORDER BY similarity DESC
Collaborative Filtering – Similarity Metrics
Pearson Similarity
Pearson similarity, or Pearson correlation, is another similarity metric we can use. This is particularly wellsuited for product recommendations because it takes into account the fact that different users will have different mean ratings: on average some users will tend to give higher ratings than others. Since Pearson similarity considers differences about the mean, this metric will account for these discrepancies.
MATCH (u1:User {name:"Cynthia Freeman"})[r:RATED]>(m:Movie)
WITH u1, avg(r.rating) AS u1_mean
MATCH (u1)[r1:RATED]>(m:Movie)<[r2:RATED](u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10
MATCH (u2)[r:RATED]>(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings
UNWIND ratings AS r
WITH sum( (r.r1.ratingu1_mean) * (r.r2.ratingu2_mean) ) AS nom,
sqrt( sum( (r.r1.rating  u1_mean)^2) * sum( (r.r2.rating  u2_mean) ^2)) AS denom,
u1, u2 WHERE denom <> 0
RETURN u1.name, u2.name, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 100
We can also compute this measure using the Pearson Similarity algorithm in the Neo4j Graph Algorithms Library.
MATCH (p1:User {name: 'Cynthia Freeman'})[x:RATED]>(movie:Movie)
WITH p1, gds.alpha.similarity.asVector(movie, x.rating) AS p1Vector
MATCH (p2:User)[x2:RATED]>(movie:Movie) WHERE p2 <> p1
WITH p1, p2, p1Vector, gds.alpha.similarity.asVector(movie, x2.rating) AS p2Vector
WHERE size(apoc.coll.intersection([v in p1Vector  v.category], [v in p2Vector  v.category])) > 10
RETURN p1.name AS from,
p2.name AS to,
gds.alpha.similarity.pearson(p1Vector, p2Vector, {vectorType: "maps"}) AS similarity
ORDER BY similarity DESC
LIMIT 100
Collaborative Filtering – NeighborhoodBased Recommendations
kNN – k Nearest Neighbors
Now that we have a method for finding similar users based on preferences, the next step is to allow each of the k most similar users to vote for what items should be recommended.
Essentially:
"Who are the 10 users with tastes in movies most similar to mine? What movies have they rated highly that I haven’t seen yet?"
MATCH (u1:User {name:"Cynthia Freeman"})[r:RATED]>(m:Movie)
WITH u1, avg(r.rating) AS u1_mean
MATCH (u1)[r1:RATED]>(m:Movie)<[r2:RATED](u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10
MATCH (u2)[r:RATED]>(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings
UNWIND ratings AS r
WITH sum( (r.r1.ratingu1_mean) * (r.r2.ratingu2_mean) ) AS nom,
sqrt( sum( (r.r1.rating  u1_mean)^2) * sum( (r.r2.rating  u2_mean) ^2)) AS denom,
u1, u2 WHERE denom <> 0
WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10
MATCH (u2)[r:RATED]>(m:Movie) WHERE NOT EXISTS( (u1)[:RATED]>(m) )
RETURN m.title, SUM( pearson * r.rating) AS score
ORDER BY score DESC LIMIT 25
MATCH (u1:User {name: 'Cynthia Freeman'})[x:RATED]>(movie:Movie)
WITH u1, gds.alpha.similarity.asVector(movie, x.rating) AS u1Vector
MATCH (u2:User)[x2:RATED]>(movie:Movie) WHERE u2 <> u1
WITH u1, u2, u1Vector, gds.alpha.similarity.asVector(movie, x2.rating) AS u2Vector
WHERE size(apoc.coll.intersection([v in u1Vector  v.category], [v in u2Vector  v.category])) > 10
WITH u1, u2, gds.alpha.similarity.pearson(u1Vector, u2Vector, {vectorType: "maps"}) AS similarity
ORDER BY similarity DESC
LIMIT 10
MATCH (u2)[r:RATED]>(m:Movie) WHERE NOT EXISTS( (u1)[:RATED]>(m) )
RETURN m.title, SUM( similarity * r.rating) AS score
ORDER BY score DESC LIMIT 25
Further Work
Resources

Web Cypher Refcard

Blog Post Collaborative Filtering: Creating the Best Teams Ever

Video Building a recommendation engine with Python and Neo4j

Book (free download) Graph Databases: New Opportunities for Connected Data
Exercises
Extend these queries:

Temporal component: Preferences change over time, use the rating timestamp to consider how more recent ratings might be used to find more relevant recommendations.

Keyword extraction: Enhance the traits available using the plot description. How would you model extracted keywords for movies?

Image recognition using posters: There are several libraries and APIs that offer image recognition and tagging. Since we have movie poster images for each movie, how could we use these to enhance our recommendations?