![[beeFormer] Transformer Is Trained By Combining Text Information And Interaction Data In The Recommendation System](https://aisholar.s3.ap-northeast-1.amazonaws.com/media/September2024/beeformer.png) 
  [beeFormer] Transformer Is Trained By Combining Text Information And Interaction Data In The Recommendation System
3 main points
✔️ Propose a new method called beeFormer
✔️ Train a Transformer model by combining textual information and interaction data in a recommendation system
✔️ Enables knowledge transfer between different domains, especially in cold start and zero-shot scenarios
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems
written by Vojtěch Vančura, Pavel Kordík, Milan Straka
(Submitted on 16 Sep 2024)
Comments: Accepted to RecSys 2024
Subjects: Information Retrieval (cs.IR)
code:

The images used in this article are from the paper, the introductory slides, or were created based on them.
Background
In this paper, a new recommendation system, beeFormer, is proposed.
Traditional recommendation systems mainly use Collaborative Filtering (CF), which makes recommendations based on a user's past interactions (e.g., purchase history). However, CF faces "cold start" and "zero-shot" problems: it is particularly difficult to apply to new products and new users, for which little or no interaction data exists.
To compensate for this, approaches have been developed that use textual information (e.g., product descriptions and reviews), but these methods tend to focus only on "semantic similarity" and fail to adequately capture user behavior patterns.
Technique
To address both issues, beeFormer trains the Transformer model directly on text data and interaction data. The training method first converts text data such as product descriptions into vectors using the Transformer model, and these vectors are then combined with interaction data to train the model. This approach improves recommendation accuracy by exploiting not only the semantic similarity between products, but also the interaction patterns among the products that users actually selected.
In the training process, the Transformer model first encodes the item text to generate a matrix of item vectors.

The generated vectors are then used as item embeddings in a decoder model called "ELSA" (a scalable linear shallow autoencoder), which is trained against the user interaction matrix. In this process, techniques such as gradient checkpointing and negative sampling are used to reduce memory consumption during training and to train efficiently even on large datasets.
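As a rough illustration of the decoder step, the sketch below assumes the ELSA formulation in which predicted scores are computed as X(AAᵀ − I), where X is the user-item interaction matrix and A holds row-normalized item embeddings. The random matrix `A` is only a placeholder for the Transformer output, and the function names and the normalized squared-error loss are illustrative assumptions, not the authors' code.

```python
import numpy as np

def l2_normalize_rows(M, eps=1e-12):
    """Scale every row to unit Euclidean length (eps guards all-zero rows)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.maximum(norms, eps)

def elsa_scores(X, A):
    """Score items as X (A A^T - I) without building the items x items matrix."""
    A = l2_normalize_rows(A)
    return (X @ A) @ A.T - X

def reconstruction_loss(X, A):
    """Mean squared error between row-normalized prediction and target (assumed loss)."""
    pred = l2_normalize_rows(elsa_scores(X, A))
    target = l2_normalize_rows(X)
    return float(np.mean((pred - target) ** 2))

# Toy interaction matrix: 3 users x 4 items (1 = the user interacted with the item).
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)

# Placeholder for encoder(item_text): 4 items, 8-dimensional embeddings.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8))

scores = elsa_scores(X, A)
loss = reconstruction_loss(X, A)
print(scores.shape, loss)
```

Because the rows of A have unit length, the diagonal of AAᵀ − I is zero, so an item cannot simply predict itself. In an actual training loop the gradient of such a loss would flow back through A into the Transformer weights, which is where gradient checkpointing and negative sampling become relevant.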
The strength of beeFormer is its ability to transfer knowledge between different datasets. For example, by training on a combined movie and book dataset, the authors aim to build a universal, domain-independent recommendation model. As a result, models trained with beeFormer are reported to perform better than traditional collaborative filtering methods and other text-based methods, especially in new-product and zero-shot scenarios.
The introduction of beeFormer will enable more flexible and advanced recommendations that go beyond the limits of conventional recommendation systems. In the future, beeFormer is also expected to be applied to other multimodal fields, such as fashion and image recommendation, and will have a significant impact on the evolution of recommendation systems.
Experiment
In this paper, several experiments are conducted to evaluate the effectiveness of the proposed method, beeFormer.
First, three representative datasets, MovieLens20M, Goodbooks-10k, and Amazon Books, were used to test how beeFormer performs compared to other models. These datasets contain user ratings of movies (MovieLens20M) and books (Goodbooks-10k and Amazon Books). Recall@20, Recall@50, and NDCG@100 were used as evaluation metrics.
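For readers unfamiliar with these metrics, here is a minimal sketch of Recall@k and NDCG@k with binary relevance. It uses the common textbook definitions; the paper's exact variant (for example, how recall is normalized) may differ, and the item IDs below are made up for illustration.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the held-out relevant items that appear in the top k."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG: discounted gain divided by the best achievable gain."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg

# Hypothetical ranking produced by a model, and the user's held-out items.
ranked = ["i3", "i7", "i1", "i9", "i4"]
held_out = {"i7", "i4", "i8"}

r = recall_at_k(ranked, held_out, k=5)
n = ndcg_at_k(ranked, held_out, k=5)
print(r, n)
```

Recall only counts how many held-out items were retrieved, while NDCG also rewards placing them near the top of the list, which is why the two are usually reported together.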

In the experiment, the model was evaluated in two scenarios, "zero-shot" and "cold start," to see whether beeFormer can transfer knowledge across domains. The zero-shot scenario assumes that the model was trained in a different domain from the dataset being evaluated. For example, the authors examine whether a model trained on a book dataset can perform well on the movie dataset. As a result, beeFormer outperformed other baseline models in the zero-shot scenario.

The cold start scenario, on the other hand, evaluates performance in situations involving unseen items and new users. beeFormer leverages text-based information to improve recommendation accuracy for new items. In this experiment, beeFormer outperformed both the traditional Heater model and the best performing Sentence Transformer model. In particular, beeFormer maintained high performance in knowledge transfer across different datasets, significantly outperforming traditional methods.
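To see why text embeddings help with unseen items, consider the following sketch. It assumes ELSA-style scoring, in which a user's history is projected into the embedding space and compared with item embeddings, so an item with no interactions can still be scored from its text alone. The random matrices are placeholders for embeddings produced by the trained Transformer, and all names here are hypothetical.

```python
import numpy as np

def unit_rows(M):
    """Normalize each row to unit Euclidean length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

rng = np.random.default_rng(1)

# Placeholders for encoder(item_text): 5 items with interaction history,
# and 2 brand-new items for which only a text description exists.
A_known = unit_rows(rng.normal(size=(5, 16)))
A_new = unit_rows(rng.normal(size=(2, 16)))

# One user's history over the 5 known items (1 = interacted).
x = np.array([1, 0, 1, 0, 0], dtype=float)

# Project the history into the embedding space, then score the new items.
user_vec = x @ A_known
new_scores = user_vec @ A_new.T
best_new = int(np.argmax(new_scores))
print(new_scores, best_new)
```

No row of the interaction matrix is needed for the new items: their scores depend only on their text embeddings and on the histories of existing users.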

In addition, a "time-split" experiment was conducted, in which the data were split in chronological order to evaluate how accurately a model trained on historical data could predict the most recent interactions. In this setting, beeFormer outperformed collaborative filtering (CF) methods, particularly on the Amazon Books dataset. The results indicate that beeFormer goes beyond mere textual similarity and is able to make recommendations based on actual user behavior patterns.
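A time-ordered split can be sketched as follows. The cutoff-timestamp approach, the tuple layout, and the sample data are assumptions made for illustration; the paper's exact splitting protocol may differ.

```python
def time_split(interactions, cutoff):
    """Split (user, item, timestamp) tuples into past (train) and recent (test)."""
    train = [row for row in interactions if row[2] < cutoff]
    test = [row for row in interactions if row[2] >= cutoff]
    return train, test

# Hypothetical interaction log with integer timestamps.
log = [
    ("u1", "book_a", 100),
    ("u2", "book_b", 250),
    ("u1", "book_c", 300),
    ("u3", "book_a", 120),
    ("u2", "book_d", 410),
]

train, test = time_split(log, cutoff=300)
print(len(train), len(test))
```

Unlike a random split, this keeps every test interaction later than every training interaction, so the model is never evaluated on events that precede what it has already seen.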

Overall, these experimental results demonstrate beeFormer's strong knowledge transfer capability and high recommendation accuracy, and mark a promising step toward a universal recommendation system that works across different domains.
Summary
The paper concludes that the proposed beeFormer methodology represents a significant advance over traditional recommendation systems. beeFormer combines textual information with user interaction data to train a Transformer model, which has the advantage of learning not only semantic similarities but also hidden patterns of user behavior. This has allowed beeFormer to outperform existing methods, especially in "cold start" and "zero-shot" scenarios.
In conclusion, the implementation of beeFormer could play an important role in the design of future recommendation systems. In particular, the ability to provide consistently high performance in scenarios spanning different data sources and multiple domains is highlighted. Furthermore, beeFormer's high compatibility with existing tools and ease of implementation into real-world operations make its application in real business environments a promising prospect.
This method could be applied to other areas in the future, such as fashion and image recommendation, and shows new possibilities for the evolution of recommendation systems. beeFormer's future development includes training on larger multi-domain datasets and the development of multimodal recommendation models that integrate visual information. If these efforts are successful, even higher-performance and more flexible recommendation systems are expected to become a reality.