TY - GEN
T1 - On the Unreasonable Effectiveness of Centroids in Image Retrieval
AU - Wieczorek, Mikołaj
AU - Rychalska, Barbara
AU - Dąbrowski, Jacek
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - The image retrieval task consists of finding images similar to a query image in a set of gallery (database) images. Such systems are used in various applications, e.g. person re-identification (ReID) or visual product search. Despite active development of retrieval models, it remains a challenging task, mainly due to large intra-class variance caused by changes in view angle, lighting, background clutter or occlusion, while inter-class variance may be relatively low. A large portion of current research focuses on creating more robust features and modifying objective functions, usually based on Triplet Loss. Some works experiment with using a centroid/proxy representation of a class to alleviate problems with computing speed and the hard sample mining used with Triplet Loss. However, these approaches are used for training alone and discarded during the retrieval stage. In this paper we propose to use the mean centroid representation during both training and retrieval. Such an aggregated representation is more robust to outliers and ensures more stable features. As each class is represented by a single embedding - the class centroid - both retrieval time and storage requirements are reduced significantly. Aggregating multiple embeddings significantly reduces the search space by lowering the number of candidate target vectors, which makes the method especially suitable for production deployments. Comprehensive experiments conducted on two ReID and Fashion Retrieval datasets demonstrate the effectiveness of our method, which outperforms the current state of the art. We propose centroid training and retrieval as a viable method for both Fashion Retrieval and ReID applications. Our code is available at https://github.com/mikwieczorek/centroids-reid.
KW - Centroid triplet loss
KW - Clothes retrieval
KW - Deep learning in fashion
KW - Fashion retrieval
KW - Person re-identification
UR - http://www.scopus.com/inward/record.url?scp=85121930766&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-92273-3_18
DO - 10.1007/978-3-030-92273-3_18
M3 - Conference contribution
AN - SCOPUS:85121930766
SN - 9783030922726
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 223
BT - Neural Information Processing - 28th International Conference, ICONIP 2021, Proceedings
A2 - Mantoro, Teddy
A2 - Lee, Minho
A2 - Ayu, Media Anugerah
A2 - Wong, Kok Wai
A2 - Hidayanto, Achmad Nizar
PB - Springer Science and Business Media Deutschland GmbH
T2 - 28th International Conference on Neural Information Processing, ICONIP 2021
Y2 - 8 December 2021 through 12 December 2021
ER -