Scalable Channel Mixer for Vision Transformers

Proposed SCHEME channel mixer. The channel mixer of the standard transformer consists of two MLP layers that expand and then reduce the channel dimensionality by a factor of $E$. SCHEME combines a block diagonal MLP (BD-MLP), which reduces the complexity of the MLP layers by using block diagonal weights, with a channel covariance attention (CCA) mechanism, which enables communication across feature groups through feature-based attention. The CCA branch, however, is only needed for training: its weights $(1-\alpha)$ decay to zero as training converges, so CCA can be discarded during inference, as shown on the right. Experiments show that CCA helps learn better feature clusters but is not needed once these are formed.
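To make the caption concrete, here is a minimal NumPy sketch of the idea, not the authors' implementation. Several details are assumptions made only for illustration: the ReLU activation, the exact form of `cca` (a softmax over the channel-by-channel covariance), the fusion rule `alpha * BD-MLP + (1 - alpha) * CCA`, and the toy sizes `d`, `E`, `g`. All function names are hypothetical.

```python
import numpy as np

def bd_linear(x, blocks):
    """Block diagonal linear map: each channel group has its own dense weight."""
    # x: (tokens, d_in); blocks: list of g arrays, each (d_in / g, d_out / g).
    parts = np.split(x, len(blocks), axis=-1)
    return np.concatenate([p @ w for p, w in zip(parts, blocks)], axis=-1)

def bd_mlp(x, w1, w2):
    """Expand by E, then reduce by E, using block diagonal weights only."""
    hidden = np.maximum(bd_linear(x, w1), 0.0)  # ReLU is a stand-in (assumption)
    return bd_linear(hidden, w2)

def cca(x):
    """Assumed CCA form: attention weights from the channel covariance."""
    cov = x.T @ x / x.shape[0]                   # (d, d) channel covariance
    cov = cov - cov.max(axis=-1, keepdims=True)  # for numerical stability
    attn = np.exp(cov)
    attn /= attn.sum(axis=-1, keepdims=True)     # row-wise softmax
    return x @ attn.T                            # mixes channels across groups

def scheme_mixer(x, w1, w2, alpha, training=True):
    """Assumed fusion: the CCA branch is weighted by (1 - alpha) in training only."""
    out = bd_mlp(x, w1, w2)
    if training:
        out = alpha * out + (1.0 - alpha) * cca(x)
    return out

# Toy sizes (hypothetical): d channels, expansion factor E, g groups.
rng = np.random.default_rng(0)
d, E, g, tokens = 8, 4, 2, 5
w1 = [rng.standard_normal((d // g, E * d // g)) for _ in range(g)]
w2 = [rng.standard_normal((E * d // g, d // g)) for _ in range(g)]
x = rng.standard_normal((tokens, d))

bd_params = sum(w.size for w in w1 + w2)  # weights in the block diagonal MLP
dense_params = 2 * E * d * d              # weights in the standard dense MLP
train_out = scheme_mixer(x, w1, w2, alpha=1.0, training=True)
infer_out = scheme_mixer(x, w1, w2, alpha=1.0, training=False)
```

With `g` groups, the block diagonal layers hold `1/g` of the weights of the dense MLP. Once `alpha` reaches 1, the training-time and inference-time outputs coincide under this fusion rule, which is why the CCA branch can be dropped without changing the result.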

