Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition

Zhanzhou Feng, Jiaming Xu, Lei Ma 0008, Shiliang Zhang. Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition. TOMCCAP, 20(4), April 2024.
