TY - GEN
T1 - Synerise Monad
T2 - 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023
AU - Rychalska, Barbara
AU - Łukasik, Szymon
AU - Dąbrowski, Jacek
N1 - Publisher Copyright:
© 2023 Copyright held by the owner/author(s).
PY - 2023/7/19
Y1 - 2023/7/19
AB - The complexity of industry-grade event-based datalakes grows with every passing hour. Companies actively gather behavioral information on their customers, recording multiple types of events, such as clicks, likes, page views, card transactions, add-to-basket, or purchase events. In response, the Synerise Monad platform has been proposed. The primary focus of Monad is to produce Universal Behavioral Representations (UBRs): large vectors encapsulating the behavioral patterns of each user. Unlike aggregated features or averaged embeddings, UBRs do not lose knowledge about individual events. They are based on award-winning algorithms developed at Synerise (Cleora and EMDE) and make it possible to process real-life datasets composed of billions of events in record time. In this paper, we introduce a new aspect of Monad: private foundation models for behavioral data, trained on top of UBRs. The foundation models are trained in a purely self-supervised manner and make it possible to exploit general knowledge about human behavior, which proves especially useful when multiple downstream models must be trained under tight time constraints, or when labeled data is scarce. Experimental results show that the Monad foundation models can halve training time and require 3x less data to reach optimal results, often achieving state-of-the-art performance.
KW - behavioral modeling
KW - big data
KW - foundation models
KW - graph learning
KW - representation learning
UR - http://www.scopus.com/inward/record.url?scp=85168697531&partnerID=8YFLogxK
U2 - 10.1145/3539618.3591851
DO - 10.1145/3539618.3591851
M3 - Conference contribution
AN - SCOPUS:85168697531
T3 - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 3344
EP - 3348
BT - SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
Y2 - 23 July 2023 through 27 July 2023
ER -