- GenomicsBench: A Benchmark Suite for Genomics. Arun Subramaniyan 0001, Yufeng Gu, Timothy Dunn, Somnath Paul, Md. Vasimuddin, Sanchit Misra, David T. Blaauw, Satish Narayanasamy, Reetuparna Das. 1-12
- GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs. Trinayan Baruah, Kaustubh Shivdikar, Shi Dong, Yifan Sun, Saiful A. Mojumder, Kihoon Jung, José L. Abellán, Yash Ukidave, Ajay Joshi, John Kim, David R. Kaeli. 13-23
- AIBench Training: Balanced Industry-Standard AI Training Benchmarking. Fei Tang, Wanling Gao, Jianfeng Zhan, Chuanxin Lan, Xu Wen, Lei Wang, Chunjie Luo, Zheng Cao, Xingwang Xiong, Zihan Jiang, Tianshu Hao, Fanda Fan, Fan Zhang, Yunyou Huang, Jianan Chen 0003, Mengjia Du, Rui Ren, Chen Zheng, Daoyi Zheng, Haoning Tang, Kunlin Zhan, Biao Wang, Defei Kong, Minghe Yu, Chongkang Tan, Huan Li, Xinhui Tian, Yatao Li, Junchao Shao, Zhenyu Wang, Xiaoyu Wang, Jiahui Dai, Hainan Ye. 24-35
- CoCoPeLia: Communication-Computation Overlap Prediction for Efficient Linear Algebra on GPUs. Petros Anastasiadis, Nikela Papadopoulou, Georgios I. Goumas, Nectarios Koziris. 36-47
- Learning Sparse Matrix Row Permutations for Efficient SpMM on GPU Architectures. Atefeh Mehrabi, Donghyuk Lee, Niladrish Chatterjee, Daniel J. Sorin, Benjamin C. Lee, Mike O'Connor. 48-58
- Analyzing Secure Memory Architecture for GPUs. Shougang Yuan, Ardhi Wiratama Baskara Yudha, Yan Solihin, Huiyang Zhou. 59-69
- MicroGrad: A Centralized Framework for Workload Cloning and Stress Testing. Gokul Subramanian Ravi, Ramon Bertran, Pradip Bose, Mikko H. Lipasti. 70-72
- ViStA: Video Streaming and Analytics Benchmark. Navneet Raju, Rahul M. Koushik, Hari Om, Subramaniam Kalambur. 73-75
- Analysis of Factors Affecting Power Consumption and Energy Efficiency of SGEMM on the Low-Power Myriad-2 VPU. Suyash Bakshi, Lennart Johnsson. 76-78
- A Defense-Inspired Benchmark Suite. Pete Ehrett, Nathan Block, Bing Schaefer, Adrian Berding, John Paul Koenig, Pranav Srinivasan, Valeria Bertacco, Todd M. Austin. 79-80
- An Automated Traffic Generation Framework for Performance Evaluation of Networks-on-Chip for Real World Use Cases. Sri Harsha Gade, Anup Gangwar, Ambica Prasad, Nitin Kumar Agarwal, Ravishankar Sreedharan. 81-83
- How Do Graph Relabeling Algorithms Improve Memory Locality? Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck. 84-86
- Designing GPU Architecture for Memory Bandwidth Reservation. Emir C. Marangoz, Kyoung-Don Kang, Seunghee Shin. 87-89
- Reducing BERT Computation by Padding Removal and Curriculum Learning. Wei Zhang, Wei Wei, Wen Wang, Lingling Jin, Zheng Cao. 90-92
- Efficient Split Counter Mode Encryption for NVM. Qi Pei, Seunghee Shin. 93-95
- AI Tax in Mobile SoCs: End-to-end Performance Analysis of Machine Learning in Smartphones. Michael Buch, Zahra Azad, Ajay Joshi, Vijay Janapa Reddi. 96-106
- Performance Characterization of .NET Benchmarks. Aniket Deshmukh, Ruihao Li, Rathijit Sen, Robert R. Henry, Monica Beckwith, Gagan Gupta. 107-117
- Performance Analysis of Graph Neural Network Frameworks. Junwei Wu, Jingwei Sun, Hao Sun, Guangzhong Sun. 118-127
- Loopapalooza: Investigating Limits of Loop-Level Parallelism with a Compiler-Driven Approach. Ali Mustafa Zaidi, Konstantinos Iordanou, Mikel Luján, Giacomo Gabrielli. 128-138
- Real-Time Characterization of Data Access Correlations. Bryan Harris, Michael Marzullo, Nihat Altiparmak. 139-150
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction. Tarek Ramadan, Tanzima Zerin Islam, Chase Phelps, Nathan Pinnow, Jayaraman J. Thiagarajan. 151-161
- Understanding Capacity-Driven Scale-Out Neural Recommendation Inference. Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead. 162-171
- Re-establishing Fetch-Directed Instruction Prefetching: An Industry Perspective. Yasuo Ishii, Jaekyu Lee, Krishnendra Nathella, Dam Sunwoo. 172-182
- Enabling Reproducible and Agile Full-System Simulation. Bobby R. Bruce, Ayaz Akram, Hoa Nguyen, Kyle Roarty, Mahyar Samani, Marjan Fariborz, Trivikram Reddy, Matthew D. Sinclair, Jason Lowe-Power. 183-193
- A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems. Mark Hildebrand, Julian T. Angeles, Jason Lowe-Power, Venkatesh Akella. 194-204
- Characterizing Massively Parallel Polymorphism. Mengchi Zhang, Ahmad Alawneh, Timothy G. Rogers. 205-216
- Pinpointing the Memory Behaviors of DNN Training. Jiansong Li, Xiao Dong, Guangli Li, Peng Zhao, Xueying Wang, Xiaobing Chen, Xianzhi Yu, Yongxin Yang, Zihan Jiang, Wei Cao, Lei Liu 0030, Xiaobing Feng 0002. 217-219
- Thermal-Aware Overclocking for Smartphones. Guru Prasad Srinivasa, David Werner, Mark Hempstead, Geoffrey Challen. 220-222
- The Impact of SoC Integration and OS Deployment on the Reliability of Arm Processors. Pablo Bodmann, George Papadimitriou 0001, Dimitris Gizopoulos, Paolo Rech. 223-225
- Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms. Jingyi Xu, Sehoon Kim, Borivoje Nikolic, Yakun Sophia Shao. 226-228
- Architecture-Level Energy Estimation for Heterogeneous Computing Systems. Francis Wang, Yannan Nellie Wu, Matthew Woicik, Joel S. Emer, Vivienne Sze. 229-231
- Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators. Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer. 232-234
- Splash-4: Improving Scalability with Lock-Free Constructs. Eduardo José Gómez-Hernández, Ruixiang Shao, Christos Sakalis, Stefanos Kaxiras, Alberto Ros. 235-236
- Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization. Wonkyung Jung, Eojin Lee, Sangpyo Kim, Namhoon Kim, Keewoo Lee, Chohong Min, Jung Hee Cheon, Jung Ho Ahn. 237-239
- Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators. Subhankar Pal, Swagath Venkataramani, Viji Srinivasan, Kailash Gopalakrishnan. 240-242
- Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads? Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi. 243-253
- TPUPoint: Automatic Characterization of Hardware-Accelerated Machine-Learning Behavior for Cloud Computing. Abenezer Wudenhe, Hung-Wei Tseng 0001. 254-264
- Pitfalls of InfiniBand with On-Demand Paging. Takuya Fukuoka, Shigeyuki Sato, Kenjiro Taura. 265-275
- Analyzing the Interplay Between Random Shuffling and Storage Devices for Efficient Machine Learning. Zhi-Lin Ke, Hsiang-Yun Cheng, Chia-Lin Yang, Han-wei Huang. 276-287
- E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device. Sheng-Chun Kao, Tushar Krishna. 288-298
- FireMarshal: Making HW/SW Co-Design Reproducible and Reliable. Nathan Pemberton, Alon Amid. 299-309
- COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors. Jerry Zhao, Abraham Gonzalez, Alon Amid, Sagar Karandikar, Krste Asanovic. 310-320