BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

1The University of Hong Kong   2Hong Kong University of Science and Technology
3Intellifusion   4The Chinese University of Hong Kong
*Project lead     †Corresponding authors

TL;DR: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities.

BiGR can perform visual generation, discrimination, editing, and more.

Advantages of BiGR

  • Uniformity: BiGR is the first conditional image generation model that unifies generative and discriminative tasks within the same model. By modeling compact binary latent codes, BiGR delivers strong performance in both tasks compared to existing models.
  • Efficiency: BiGR generates images at a low time cost, attributed to the small number of sampling steps required in the iterative unmasking process, while still maintaining high generation quality.
  • Flexibility: BiGR can be flexibly employed for various vision applications, such as inpainting, outpainting, editing, interpolation, and enrichment in a zero-shot manner, without the need for task-specific structural changes or parameter fine-tuning.
  • Scalability: BiGR demonstrates scalability in both generative and discriminative tasks, as evidenced by comprehensive evaluations of generation quality and linear-probe performance.
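The Efficiency point above rests on revealing all masked tokens within a small number of sampling steps. As a rough illustration only, the sketch below splits a token budget across steps using a cosine schedule. The cosine schedule is an assumption borrowed from common masked-generation practice, not something this page confirms for BiGR, and the function name `unmask_schedule` is hypothetical.

```python
import math


def unmask_schedule(num_tokens, num_steps):
    """Return how many tokens to reveal at each step.

    Assumption: a cosine masking schedule, where the fraction of tokens
    still masked after step k of K is cos(pi/2 * k / K).
    """
    counts = []
    revealed = 0
    for step in range(1, num_steps + 1):
        # Fraction of tokens that should remain masked after this step.
        mask_ratio = math.cos(math.pi / 2 * step / num_steps)
        target = num_tokens - int(math.floor(mask_ratio * num_tokens))
        if step == num_steps:
            # Guard against floating-point residue on the final step.
            target = num_tokens
        counts.append(target - revealed)
        revealed = target
    return counts


if __name__ == "__main__":
    # Example: a 16x16 token grid revealed in 10 steps.
    print(unmask_schedule(256, 10))
```

With a schedule of this shape, early steps reveal only a few tokens and later steps reveal many, so the total number of forward passes equals the number of steps rather than the number of tokens.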

Method


BiGR is built upon a Llama backbone and incorporates masked-token prediction and a binary transcoder. It is trained with a weighted binary cross-entropy (wBCE) loss to reconstruct masked tokens. For image generation, we design an entropy-ordered sampling method. For visual representation, we simply apply average pooling to the features of intermediate layers.
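The three ingredients named above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the page does not say how the wBCE weights are chosen, so `weights` is simply a caller-supplied per-token weight; "entropy order" is assumed to mean unmasking the tokens whose predicted bits have the lowest total binary entropy first; and all function names are hypothetical.

```python
import math


def weighted_bce(probs, targets, masked, weights, eps=1e-7):
    """Weighted BCE over the bits of masked tokens only.

    probs/targets: one list of bit probabilities / 0-1 bits per token.
    masked: True where the token was masked during training.
    weights: per-token weights (assumption: supplied by the caller).
    """
    total, norm = 0.0, 0.0
    for p_tok, t_tok, is_masked, w in zip(probs, targets, masked, weights):
        if not is_masked:
            continue  # unmasked tokens do not contribute to the loss
        for p, t in zip(p_tok, t_tok):
            p = min(max(p, eps), 1.0 - eps)  # clamp for numerical safety
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
        norm += w * len(t_tok)
    return total / norm if norm > 0 else 0.0


def bit_entropy(p, eps=1e-7):
    """Binary entropy (in nats) of a Bernoulli probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_order(probs, masked_positions):
    """Sort masked positions from most to least confident (lowest entropy first)."""
    return sorted(masked_positions,
                  key=lambda i: sum(bit_entropy(p) for p in probs[i]))


def average_pool(hidden):
    """Average token features from one layer into a single vector."""
    n = len(hidden)
    return [sum(column) / n for column in zip(*hidden)]


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.99, 0.01], [0.9, 0.2]]
    print(entropy_order(probs, [0, 1, 2]))
    print(average_pool([[1.0, 2.0], [3.0, 4.0]]))
```

In a sampling loop, `entropy_order` would pick which masked positions to commit at each step, while `average_pool` would be applied to an intermediate layer's token features before a linear probe.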

Results


Quantitative Comparison



Image Generation



Zero-shot Generalized Applications


BiGR supports diverse zero-shot applications, without requiring task-specific structural changes or parameter fine-tuning.
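One plausible reading of how a mask-prediction model handles such tasks without structural changes is that each task only changes which tokens start out masked. The sketch below builds an initial mask for inpainting over a token grid. It is an assumption made for illustration, not a description of BiGR's actual pipeline, and `region_mask` is a hypothetical helper.

```python
def region_mask(grid_h, grid_w, top, left, height, width):
    """Build a boolean mask over a token grid.

    True  = token to regenerate (inside the rectangle).
    False = token kept from the input image.
    """
    return [
        [top <= r < top + height and left <= c < left + width
         for c in range(grid_w)]
        for r in range(grid_h)
    ]


if __name__ == "__main__":
    # Regenerate the central 2x2 block of a 4x4 token grid.
    for row in region_mask(4, 4, 1, 1, 2, 2):
        print(["M" if m else "." for m in row])
```

Under the same assumption, outpainting would invert the roles (keep the center, mask the border), so the model and its weights stay unchanged across tasks.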


Try out BiGR yourself in Colab!

BibTeX

If you find this project useful for your research, please cite the following:

@misc{hao2024bigr,
    title={Bi{GR}: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities}, 
    author={Shaozhe Hao and Xuantong Liu and Xianbiao Qi and Shihao Zhao and Bojia Zi and Rong Xiao and Kai Han and Kwan-Yee~K. Wong},
    year={2024},
}