HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin. HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification. In Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, M. Shamim Hossain, editors, Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October–3 November 2023, pages 4768–4777. ACM, 2023. doi:10.1145/3581783.3612159

@inproceedings{OuyangWNBXX00L23,
  title = {{HSVLT}: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification},
  author = {Shuyi Ouyang and Hongyi Wang and Ziwei Niu and Zhenjia Bai and Shiao Xie and Yingying Xu and Ruofeng Tong and Yen-Wei Chen and Lanfen Lin},
  year = {2023},
  doi = {10.1145/3581783.3612159},
  url = {https://doi.org/10.1145/3581783.3612159},
  researchr = {https://researchr.org/publication/OuyangWNBXX00L23},
  pages = {4768--4777},
  booktitle = {Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October--3 November 2023},
  editor = {Abdulmotaleb El-Saddik and Tao Mei and Rita Cucchiara and Marco Bertini and Diana Patricia Tobon Vallejo and Pradeep K. Atrey and M. Shamim Hossain},
  publisher = {ACM},
}