Disentangling Neuron Representations with Concept Vectors

Laura O'Mahony, Vincent Andrearczyk, Henning Müller, Mara Graziani

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Abstract

    Mechanistic interpretability aims to understand how models store representations by breaking down neural networks into interpretable units. However, the occurrence of polysemantic neurons, or neurons that respond to multiple unrelated features, makes interpreting individual neurons challenging. This has led to the search for meaningful vectors, known as concept vectors, in activation space instead of individual neurons. The main contribution of this paper is a method to disentangle polysemantic neurons into concept vectors encapsulating distinct features. Our method can search for fine-grained concepts according to the user's desired level of concept separation. The analysis shows that polysemantic neurons can be disentangled into directions consisting of linear combinations of neurons. Our evaluations show that the concept vectors found encode coherent, human-understandable features.

    Original language: English
    Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
    Publisher: IEEE Computer Society
    Pages: 3770-3775
    Number of pages: 6
    ISBN (Electronic): 9798350302493
    Publication status: Published - 2023
    Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023 - Vancouver, Canada
    Duration: 18 Jun 2023 to 22 Jun 2023

    Publication series

    Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
    Volume: 2023-June
    ISSN (Print): 2160-7508
    ISSN (Electronic): 2160-7516

    Conference

    Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
    Country/Territory: Canada
    City: Vancouver
    Period: 18/06/23 to 22/06/23
