- Optical character recognition with transformers and CTC. Israel Campiotti, Roberto de Alencar Lotufo.
- Scholarly big data quality assessment: a case study of document linking and conflation with S2ORC. Jian Wu, Ryan Hiltabrand, Dominik Soós, C. Lee Giles.
- How did Dennis Ritchie produce his PhD thesis?: a typographical mystery. David F. Brailsford, Brian W. Kernighan, William A. Ritchie.
- SeNMFk-SPLIT: large corpora topic modeling by semantic non-negative matrix factorization with automatic model selection. Maksim Ekin Eren, Nick Solovyev, Manish Bhattarai, Kim Ø. Rasmussen, Charles Nicholas, Boian S. Alexandrov.
- Modifying PDF sewing patterns for use with projectors. Charlotte Curtis.
- Optical character recognition guided image super resolution. Philipp Hildebrandt, Maximilian Schulze, Sarel Cohen, Vanja Doskoc, Raid Saabni, Tobias Friedrich.
- Chinese public procurement document harvesting pipeline. Danrun Cao, Oussama Ahmia, Nicolas Béchet, Pierre-François Marteau.
- Tab this folder of documents: page stream segmentation of business documents. Thisanaporn Mungmeeprued, Yuxin Ma, Nisarg Mehta, Aldo Lipani.
- A cascaded approach for page-object detection in scientific papers. Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina.
- Long-term lifecycle-related management of digital building documents: towards a holistic and standard-based concept for a technical and organizational solution in building authorities. Uwe M. Borghoff, Eberhard Pfeiffer, Peter Rödig.
- Anonymizing and obfuscating PDF content while preserving document structure. Charlotte Curtis.
- Binarization of photographed documents image quality, processing time and size assessment. Rafael Dueire Lins, Rodrigo Barros Bernardino, Ricardo da Silva Barboza, Steven J. Simske.
- Detecting malware using text documents extracted from spam email through machine learning. Luis Ángel Redondo-Gutierrez, Francisco Jáñez-Martino, Eduardo Fidalgo, Enrique Alegre, Víctor González-Castro, Rocío Alaíz-Rodríguez.
- From print to online newspapers on small displays: a layout generation approach aimed at preserving entry points. Sebastián Gallardo Díaz, Dorian Mazauric, Pierre Kornprobst.
- Academic writing and publishing beyond documents. Cerstin Mahlow, Michael Piotrowski.
- Downstream transformer generation of question-answer pairs with preprocessing and postprocessing pipelines. Cheng Zhang, Hao Zhang, Yicheng Sun, Jie Wang.
- Graphical document representation for French newsletters analysis. Alexis Blandin, Farida Saïd, Jeanne Villaneau, Pierre-François Marteau.
- Theory entity extraction for social and behavioral sciences papers using distant supervision. Xin Wei, Lamia Salsabil, Jian Wu.
- Triplet transformer network for multi-label document classification. Johannes Werner Melsbach, Sven Stahlmann, Stefan Hirschmeier, Detlef Schoder.