Recommendation System Algorithms | Accelerate AI Speed & Accuracy

On the Features of Recommendation System Algorithms

One of the most unusual ways I’ve used neural networks is for discovering new music. I keep a chat with a neural network that opens with an initial prompt and a list of my favorite genres, artists, and tracks. When I want to listen to something new, I simply send a command: “morning, working, find me something fresh,” or “recommend music similar to A.E.S. Dana but more energetic,” or “suggest three new genres I might like.” The AI provides recommendations, I listen, and I always note what worked and what didn’t, so the recommendations improve over time. It’s very convenient: I just specify my mood or activity and whether I want new artists or familiar ones, and I receive suitable suggestions.

Popular platforms are moving in the same direction. Spotify has a “radio” feature based on an artist or track, and Yandex Music offers a “My Wave” stream that can be customized by mood, activity type, genre, and track language. Algorithms analyze the user’s listening history, look for connections between preferences, and even account for seasonal changes.

However, the task is not that simple. My chat with a neural network is one thing: everything is individual and limited to me. Recommendation systems at large services, by contrast, work with millions of items (music, movies, books) that must be matched to millions of users. Moreover, new releases appear almost daily, so these systems require continuous retraining and updates, which is often a source of difficulty for developers.

Recent work by researchers from a university in Amsterdam addresses exactly this problem: they developed methods that make training such systems tens of times faster. The object of study was the SEATER model, a recommendation system proposed in 2024 by Chinese researchers. It is general-purpose: suitable not only for music but also for online shopping or entertainment. Unlike traditional methods that iterate over all objects one by one, SEATER uses a hierarchical catalog structure similar to folders on a computer: when music in a certain genre or mood is requested, the system goes straight to the relevant sections instead of searching through the entire list.
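To make the folder analogy concrete, here is a toy Python sketch. It is not SEATER’s actual implementation: the two-level catalog, the genre and mood labels, the track names, and both function names are invented for illustration. It only contrasts scanning every item with following a path through a hierarchy.

```python
# Toy illustration only: a hypothetical two-level "folder" catalog.
catalog = {
    "ambient": {"calm": ["track_a", "track_b"], "energetic": ["track_c"]},
    "techno": {"calm": ["track_d"], "energetic": ["track_e", "track_f"]},
}


def flat_search(catalog, genre, mood):
    """Check every track one by one; work grows with the catalog size."""
    checked = 0
    hits = []
    for g, moods in catalog.items():
        for m, tracks in moods.items():
            for track in tracks:
                checked += 1
                if g == genre and m == mood:
                    hits.append(track)
    return hits, checked


def tree_search(catalog, genre, mood):
    """Follow the path genre -> mood; work is two lookups plus the leaf size."""
    tracks = catalog.get(genre, {}).get(mood, [])
    return list(tracks), 2 + len(tracks)


print(flat_search(catalog, "techno", "energetic"))
print(tree_search(catalog, "techno", "energetic"))
```

On six tracks the difference is tiny, but the flat scan grows with the whole catalog, while the tree lookup depends only on the depth of the hierarchy and the size of one folder.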

This approach speeds up search and improves recommendation accuracy. But the researchers faced a new challenge: the folder tree has to be rebuilt from scratch before each retraining. They proposed two solutions. The first is the fastest and simplest way to distribute objects into folders without much tuning; the second is a more precise method that combines speed with internal refinement of groups within the tree. The algorithms were tested on several datasets: user reviews of businesses from Yelp, book recommendations from Amazon, and news clicks from Microsoft. On small datasets, the time savings were noticeable but not critical.

Why is that? The scale of data plays a key role. Companies with large volumes of user behavior data often do not share it publicly. Without access to massive datasets, it’s harder to implement innovative methods and observe their effects.

However, one very important dataset has emerged: Yambda, published openly by a major tech company last year. It contains over 5 billion anonymized events based on user activity on a music streaming platform. At this scale, the new methods reduced data preparation time from 82 minutes to 83 seconds, nearly 60 times faster! At the same time, the quality of recommendations remained almost unchanged.
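The “nearly 60 times” figure follows directly from the two timings quoted above. A quick check in Python (the variable names are only illustrative):

```python
# Timings quoted in the text: 82 minutes before, 83 seconds after.
before_seconds = 82 * 60
after_seconds = 83

speedup = before_seconds / after_seconds
print(round(speedup, 1))  # 59.3
```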

As a result, developers can now choose between ultra-fast processing of large catalogs and a balance of speed and accuracy, depending on the task. Users receive more relevant recommendations more often. This example clearly shows that the AI field needs knowledge sharing and collaboration: shared discoveries can lead to new ideas for improvements or to quick adoption of proven solutions.

A full description of work based on the SEATER model is available on arXiv, and open datasets like Yambda are accessible on platforms like Hugging Face for the wider community.
