A new approach to content-based file type detection

Mehdi Chehel Amirani, Mohsen Toorani, Ali Asghar Beheshti Shirazi. A new approach to content-based file type detection. In Proceedings of the 13th IEEE Symposium on Computers and Communications (ISCC 2008), July 6-9, Marrakech, Morocco, pages 1103-1108. IEEE, 2008.

Abstract

File type identification and file type clustering can be difficult tasks, and they are increasingly important in the field of computer and network security. Classical methods of file type detection, such as checking file extensions and magic bytes, can be easily spoofed. Content-based file type detection is a newer approach that has recently received attention. In this paper, a new content-based method for file type detection and file type clustering is proposed, based on PCA and neural networks. The proposed method has good accuracy and is fast enough.
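
As a rough illustration of why magic bytes are easy to spoof while content statistics are harder to fake, the following minimal sketch compares a signature check with a whole-file byte histogram. It is not the method of the paper: the abstract does not specify the features used, so the byte histogram is an assumption chosen as one common content-based feature, and the PCA and neural network stages are not reproduced here. The signature table and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical, deliberately tiny signature table (illustration only).
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",
}


def detect_by_magic(data: bytes) -> str:
    """Classical check: look only at the leading bytes of the file."""
    for signature, name in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"


def byte_histogram(data: bytes) -> list[float]:
    """Assumed content-based feature: relative frequency of each byte value."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / total for value in range(256)]


def histogram_distance(a: list[float], b: list[float]) -> float:
    """L1 distance between two histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Plain text, and the same text with a PDF signature prepended.
text = b"The quick brown fox jumps over the lazy dog. " * 50
spoofed = b"%PDF-" + text

print(detect_by_magic(text))     # signature check on the original text
print(detect_by_magic(spoofed))  # signature check on the spoofed copy
print(histogram_distance(byte_histogram(text), byte_histogram(spoofed)))
```

Five prepended bytes are enough to change the signature-based answer, while the byte histogram of the spoofed file stays almost identical to that of the original text. A content-based detector would feed a feature vector of this kind into later stages, such as the dimensionality reduction and classification that the abstract mentions.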