- TaskStream: accelerating task-parallel workloads by recovering program structure. Vidushi Dadu, Tony Nowatzki. 1-13
- DOTA: detect and omit weak attentions for scalable transformer acceleration. Zheng Qu, Liu Liu 0017, Fengbin Tu, Zhaodong Chen, Yufei Ding, Yuan Xie 0001. 14-26
- A full-stack search technique for domain optimized deep learning accelerators. Dan Zhang, Safeen Huda, Ebrahim M. Songhori, Kartik Prabhu, Quoc V. Le, Anna Goldie, Azalia Mirhoseini. 27-42
- FINGERS: exploiting fine-grained parallelism in graph mining accelerators. Qihang Chen, Boyu Tian, Mingyu Gao. 43-55
- BiSon-e: a lightweight and high-performance accelerator for narrow integer linear algebra computing on the edge. Enrico Reggiani, Cristóbal Ramírez Lazo, Roger Figueras Bagué, Adrián Cristal, Mauro Olivieri, Osman Sabri Unsal. 56-69
- Software-defined address mapping: a case on 3D memory. Jialiang Zhang, Michael Swift, Jing Jane Li. 70-83
- Parallel virtualized memory translation with nested elastic cuckoo page tables. Jovan Stojkovic, Dimitrios Skarlatos 0002, Apostolos Kokolis, Tianyin Xu, Josep Torrellas. 84-97
- CARAT CAKE: replacing paging via compiler/kernel cooperation. Brian Suchy, Souradip Ghosh, Drew Kersnar, Siyuan Chai, Zhen Huang, Aaron Nelson, Michael Cuevas, Alex Bernat, Gaurav Chaudhary, Nikos Hardavellas, Simone Campanoni, Peter A. Dinda. 98-114
- NVAlloc: rethinking heap metadata management in persistent memory allocators. Zheng Dang, Shuibing He, Peiyi Hong, Zhenxin Li, Xuechen Zhang, Xian-He Sun, Gang Chen. 115-127
- Every walk's a hit: making page walks single-access cache hits. Chang Hyun Park 0001, Ilias Vougioukas, Andreas Sandberg, David Black-Schaffer. 128-141
- GPM: leveraging persistent memory from a GPU. Shweta Pandey, Aditya K. Kamath, Arkaprava Basu. 142-156
- GPUReplay: a 50-KB GPU stack for client ML. Heejin Park, Felix Xiaozhu Lin. 157-170
- ValueExpert: exploring value patterns in GPU-accelerated applications. Keren Zhou, Yueming Hao, John M. Mellor-Crummey, Xiaozhu Meng, Xu Liu 0001. 171-185
- SparseCore: stream ISA and processor specialization for sparse computation. Gengyu Rao, Jingji Chen, Jason Yik, Xuehai Qian. 186-199
- JSONSki: streaming semi-structured data with bit-parallel fast-forwarding. Lin Jiang, Zhijia Zhao 0001. 200-211
- MineSweeper: a "clean sweep" for drop-in use-after-free prevention. Márton Erdos, Sam Ainsworth, Timothy M. Jones 0001. 212-225
- Revizor: testing black-box CPUs against speculation contracts. Oleksii Oleksenko, Christof Fetzer, Boris Köpf, Mark Silberstein. 226-239
- Protecting adaptive sampling from information leakage on low-power sensors. Tejas Kannan, Henry Hoffmann. 240-254
- One size does not fit all: security hardening of MIPS embedded systems via static binary debloating for shared libraries. Haotian Zhang, Mengfei Ren, Yu Lei 0001, Jiang Ming 0002. 255-270
- ViK: practical mitigation of temporal memory safety violations through object ID inspection. Haehyun Cho, Jinbum Park, Adam Oest, Tiffany Bao, Ruoyu Wang 0001, Yan Shoshitaishvili, Adam Doupé, Gail-Joon Ahn. 271-284
- Eavesdropping user credentials via GPU side channels on smartphones. Boyuan Yang, Ruirong Chen, Kai Huang, Jun Yang, Wei Gao. 285-299
- CRISP: critical slice prefetching. Heiner Litz, Grant Ayers, Parthasarathy Ranganathan. 300-313
- Pinned loads: taming speculative loads in secure processors. Zirui Neil Zhao, Houxiang Ji, Adam Morrison 0001, Darko Marinov, Josep Torrellas. 314-328
- DAGguise: mitigating memory timing side channels. Peter W. Deutsch, Yuheng Yang, Thomas Bourgeat, Jules Drean, Joel S. Emer, Mengjia Yan. 329-343
- RecShard: statistical feature-based memory optimization for industry-scale neural recommendation. Geet Sethi, Bilge Acun, Niket Agarwal, Christos Kozyrakis, Caroline Trippel, Carole-Jean Wu. 344-358
- AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, Wei Lin 0016. 359-373
- NASPipe: high performance and reproducible pipeline parallel supernet training via causal synchronous parallelism. Shixiong Zhao, Fanxin Li, Xusheng Chen, Tianxiang Shen, Li Chen, Sen Wang, Nicholas Zhang, Cheng Li, Heming Cui. 374-387
- VELTAIR: towards high-performance multi-tenant deep learning services via adaptive compilation and scheduling. Zihan Liu, Jingwen Leng, Zhihui Zhang, Quan Chen, Chao Li, Minyi Guo. 388-401
- Breaking the computation and communication abstraction barrier in distributed machine learning workloads. Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi. 402-416
- Clio: a hardware-software co-designed disaggregated memory system. Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, Yiying Zhang. 417-433
- Enzian: an open, general, CPU/FPGA platform for systems software research. David Cock, Abishek Ramdas, Daniel Schwyn, Michael Giardino, Adam Turowski, Zhenhao He, Nora Hossle, Dario Korolija, Melissa Licciardello, Kristina Martsenko, Reto Achermann, Gustavo Alonso, Timothy Roscoe. 434-451
- Efficient and scalable core multiplexing with M³v. Nils Asmussen, Sebastian Haas, Carsten Weinhold, Till Miemietz, Michael Roitzsch. 452-466
- FlexOS: towards flexible OS isolation. Hugo Lefeuvre, Vlad-Andrei Badoiu, Alexander Jung, Stefan Lucian Teodorescu, Sebastian Rauch, Felipe Huici, Costin Raiciu, Pierre Olivier. 467-482
- Adelie: continuous address space layout re-randomization for Linux drivers. Ruslan Nikolaev 0001, Hassan Nadeem, Cathlyn Stone, Binoy Ravindran. 483-498
- Suppressing ZZ crosstalk of Quantum computers through pulse and scheduling co-optimization. Lei Xie, Jidong Zhai, Zhenxing Zhang, Jonathan Allcock, Shengyu Zhang, Yicong Zheng. 499-513
- QUEST: systematically approximating Quantum circuits for higher output fidelity. Tirthak Patel, Ed Younis, Costin Iancu, Wibe De Jong, Devesh Tiwari. 514-528
- HAMMER: boosting fidelity of noisy Quantum circuits by exploiting Hamming behavior of erroneous outcomes. Swamit S. Tannu, Poulami Das 0005, Ramin Ayanzadeh, Moinuddin K. Qureshi. 529-540
- LILLIPUT: a lightweight low-latency lookup-table decoder for near-term Quantum error correction. Poulami Das 0005, Aditya Locharla, Cody Jones. 541-553
- Paulihedral: a generalized block-wise compiler optimization framework for Quantum simulation kernels. Gushu Li, Anbang Wu, Yunong Shi, Ali Javadi-Abhari, Yufei Ding, Yuan Xie. 554-569
- Astraea: towards QoS-aware and resource-efficient multi-stage GPU services. Wei Zhang 0149, Quan Chen 0002, Kaihua Fu, Ningxin Zheng, Zhiyi Huang 0001, Jingwen Leng, Minyi Guo. 570-582
- Memory-harvesting VMs in cloud platforms. Alexander Fuerst, Stanko Novakovic, Iñigo Goiri, Gohar Irfan Chaudhry, Prateek Sharma, Kapil Arya, Kevin Broas, Eugene Bak, Mehmet Iyigun, Ricardo Bianchini. 583-594
- IOCost: block IO control for containers in datacenters. Tejun Heo, Dan Schatzberg, Andrew Newell, Song Liu, Saravanan Dhakshinamurthy, Iyswarya Narayanan, Josef Bacik, Chris Mason, Chunqiang Tang, Dimitrios Skarlatos 0002. 595-608
- TMO: transparent memory offloading in datacenters. Johannes Weiner, Niket Agarwal, Dan Schatzberg, Leon Yang, Hao Wang, Blaise Sanouillet, Bikash Sharma, Tejun Heo, Mayank Jain, Chunqiang Tang, Dimitrios Skarlatos 0002. 609-621
- SOL: safe on-node learning in cloud platforms. Yawen Wang, Daniel Crankshaw, Neeraja J. Yadwadkar, Daniel S. Berger, Christos Kozyrakis, Ricardo Bianchini. 622-634
- GenStore: a high-performance in-storage processing system for genome sequence analysis. Nika Mansouri-Ghiasi, Jisung Park 0001, Harun Mustafa, Jeremie S. Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu. 635-654
- ProSE: the architecture and design of a protein discovery engine. Eyes Robson, Ceyu Xu, Lisa Wu Wills. 655-668
- v ))-cost solution for parallel merge style operations on sorted key-value arrays. Bangyan Wang, Lei Deng, Fei Sun, Guohao Dai, Liu Liu, Yu Wang, Yuan Xie. 669-682
- Client-optimized algorithms and acceleration for encrypted compute offloading. McKenzie van der Hagen, Brandon Lucia. 683-696
- Finding missed optimizations through the lens of dead code elimination. Theodoros Theodoridis, Manuel Rigger, Zhendong Su 0001. 697-709
- A tree clock data structure for causal orderings in concurrent executions. Umang Mathur 0001, Andreas Pavlogiannis, Hünkar Can Tunç, Mahesh Viswanathan 0001. 710-725
- RSSD: defend against ransomware with hardware-isolated network-storage codesign and post-attack analysis. Benjamin Reidys, Peng Liu, Jian Huang 0006. 726-739
- Creating concise and efficient dynamic analyses with ALDA. Xiang Cheng, David Devecsery. 740-752
- IceBreaker: warming serverless functions better with heterogeneity. Rohan Basu Roy, Tirthak Patel, Devesh Tiwari. 753-767
- INFless: a native serverless system for low-latency, high-throughput inference. Yanan Yang, Laiping Zhao, Yiming Li, Huanyu Zhang, Jie Li, Mingyang Zhao, Xingzhen Chen, Keqiu Li. 768-781
- FaaSFlow: enable efficient workflow execution for function-as-a-service. Zijun Li, Yushi Liu, Linsong Guo, Quan Chen, Jiagan Cheng, Wenli Zheng, Minyi Guo. 782-796
- Serverless computing on heterogeneous computers. Dong Du, Qingyuan Liu, Xueqiang Jiang, Yubin Xia, Binyu Zang, Haibo Chen. 797-813
- CoolEdge: hotspot-relievable warm water cooling for energy-efficient edge datacenters. Qiangyu Pei, Shutong Chen, Qixia Zhang, Xinhui Zhu, Fangming Liu, Ziyang Jia, Yishuo Wang, Yongjie Yuan. 814-829
- Yashme: detecting persistency races. Hamed Gorjiara, Guoqing Harry Xu, Brian Demsky. 830-845
- EXAMINER: automatically locating inconsistent instructions between real devices and CPU emulators for ARM. Muhui Jiang, Tianyi Xu, Yajin Zhou, Yufeng Hu, Ming Zhong, Lei Wu, Xiapu Luo, Kui Ren 0001. 846-858
- Path-sensitive and alias-aware typestate analysis for detecting OS bugs. Tuo Li, Jia-Ju Bai, Yulei Sui, Shi-Min Hu 0001. 859-872
- Efficiently detecting concurrency bugs in persistent memory programs. Zhangyu Chen, Yu Hua 0001, Yongle Zhang, Luochangqi Ding. 873-887
- Who goes first? detecting go concurrency bugs via message reordering. Ziheng Liu, Shihao Xia, Yu Liang, Linhai Song, Hong Hu. 888-902
- CryoWire: wire-driven microarchitecture designs for cryogenic computing. Dongmoon Min, Yujin Chung, Ilkwon Byun, Junpyo Kim, Jangwoo Kim. 903-917
- REVAMP: a systematic framework for heterogeneous CGRA realization. Thilini Kaushalya Bandara, Dhananjaya Wijerathne, Tulika Mitra, Li-Shiuan Peh. 918-932
- PLD: fast FPGA compilation to make reconfigurable acceleration compatible with modern incremental refinement software development. Yuanlong Xiao, Eric Micallef, Andrew Butt, Matthew Hofmann, Marc Alston, Matthew Goldsmith, Andrew Merczynski-Hait, André DeHon. 933-945
- Debugging in the brave new world of reconfigurable hardware. Jiacheng Ma, Gefei Zuo, Kevin Loughlin, Haoyang Zhang, Andrew Quinn 0001, Baris Kasikci. 946-962
- Temporal and SFQ pulse-streams encoding for area-efficient superconducting accelerators. Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Darren Lyles, George Michelogiannakis. 963-976
- Understanding and exploiting optimal function inlining. Theodoros Theodoridis, Tobias Grosser, Zhendong Su 0003. 977-989
- CirFix: automatically repairing defects in hardware design code. Hammad Ahmad, Yu Huang 0015, Westley Weimer. 990-1003
- Vector instruction selection for digital signal processors using program synthesis. Maaz Bin Safeer Ahmad, Alexander J. Root, Andrew Adams, Shoaib Kamil, Alvin Cheung. 1004-1016
- HeteroGen: transpiling C to heterogeneous HLS code with automated test generation and program repair. Qian Zhang, Jiyuan Wang, Guoqing Harry Xu, Miryung Kim. 1017-1029
- Tree traversal synthesis using domain-specific symbolic compilation. Yanju Chen, Junrui Liu, Yu Feng, Rastislav Bodík. 1030-1042
- SRAM has no chill: exploiting power domain separation to steal on-chip secrets. Jubayer Mahmod, Matthew Hicks. 1043-1055
- Randomized row-swap: mitigating Row Hammer by breaking spatial correlation between aggressor and victim rows. Gururaj Saileshwar, Bolin Wang, Moinuddin K. Qureshi, Prashant J. Nair. 1056-1069
- ShEF: shielded enclaves for cloud FPGAs. Mark Zhao, Mingyu Gao 0001, Christos Kozyrakis. 1070-1085
- Invisible bits: hiding secret messages in SRAM's analog domain. Jubayer Mahmod, Matthew Hicks. 1086-1098
- Taurus: a data plane architecture for per-packet ML. Tushar Swamy, Alexander Rucker, Muhammad Shahbaz 0001, Ishan Gaur, Kunle Olukotun. 1099-1114
- FlexDriver: a network driver for your accelerator. Haggai Eran, Maxim Fudim, Gabi Malka, Gal Shalom, Noam Cohen, Amit Hermony, Dotan Levi, Liran Liss, Mark Silberstein. 1115-1129
- The benefits of general-purpose on-NIC memory. Boris Pismenny, Liran Liss, Adam Morrison 0001, Dan Tsafrir. 1130-1147
- Domain specific run time optimization for software data planes. Sebastiano Miano, Alireza Sanaee, Fulvio Risso, Gábor Rétvári, Gianni Antichi. 1148-1164