Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition

Zhanzhou Feng, Jiaming Xu, Lei Ma, Shiliang Zhang. Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition. TOMCCAP, 20(4), April 2024.
