BiGR is built upon the Llama backbone, incorporating masked-token prediction and a binary transcoder. BiGR is trained with a weighted binary cross-entropy (wBCE) loss for reconstructing masked tokens. For image generation, we design entropy-ordered sampling. For visual representation, we simply apply average pooling over features from the intermediate layers.
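For reference, below is a minimal sketch of what a weighted BCE over masked binary codes could look like. This is not the repository's implementation: the function name, tensor shapes, mask convention, and the positive-class weighting are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def weighted_bce_masked(logits, target_bits, mask, pos_weight=1.0):
    """Hypothetical sketch: weighted BCE over masked binary codes.

    logits:      (B, N, D) predicted logits for each binary code dimension
    target_bits: (B, N, D) ground-truth binary codes in {0, 1}
    mask:        (B, N)    True where a token was masked (loss applied there)
    pos_weight:  scalar weight on the positive (bit = 1) class
    """
    # Per-bit BCE with a weight on positive bits (the "weighted" part).
    loss = F.binary_cross_entropy_with_logits(
        logits, target_bits.float(),
        pos_weight=torch.tensor(pos_weight, device=logits.device),
        reduction="none",
    )                                    # (B, N, D)
    loss = loss.mean(dim=-1)             # average over code dimensions -> (B, N)
    # Only masked token positions contribute to the objective.
    return (loss * mask.float()).sum() / mask.float().sum().clamp(min=1)

# Toy usage with random tensors.
B, N, D = 2, 16, 24
logits = torch.randn(B, N, D)
bits = torch.randint(0, 2, (B, N, D))
mask = torch.rand(B, N) < 0.5
print(weighted_bce_masked(logits, bits, mask, pos_weight=2.0))
```

The sketch only averages the per-bit loss at masked positions; see the paper for the actual weighting scheme and training details.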
BiGR supports diverse zero-shot applications without requiring task-specific structural changes or parameter fine-tuning.
Try out BiGR yourself on Colab!
If you find this project useful for your research, please cite the following:
```bibtex
@misc{hao2024bigr,
  title={Bi{GR}: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities},
  author={Shaozhe Hao and Xuantong Liu and Xianbiao Qi and Shihao Zhao and Bojia Zi and Rong Xiao and Kai Han and Kwan-Yee~K. Wong},
  year={2024},
}
```