- The Algorithmics of Write Optimization. Michael A. Bender. 1
- MIDAS: Multilinear Detection at Scale. Saliya Ekanayake, Jose Cadena, Udayanga Wickramasinghe, Anil Vullikanti. 2-11
- Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling. Michael Sutton, Tal Ben-Nun, Amnon Barak. 12-21
- Parallel Algorithms Through Approximation: B-Edge Cover. S. M. Ferdous, Arif M. Khan, Alex Pothen. 22-33
- A Parallel Algorithm for Bayesian Network Inference Using Arithmetic Circuits. Md. Vasimuddin, Sriram P. Chockalingam, Srinivas Aluru. 34-43
- Cataloging the Visible Universe Through Bayesian Inference at Petascale. Jeffrey Regier, Kiran Pamnany, Keno Fischer, Andreas Noack, Maximilian Lam, Jarrett Revels, Steve Howard, Ryan Giordano, David Schlegel, Jon McAuliffe, Rollin C. Thomas, Prabhat. 44-53
- Efficient, Parallel At-scale Correlation Analysis for Atom Probe Tomography on Hybrid Architectures. Hao Lu, Sudip K. Seal, Gregory Muzyn, Wei Guo, Jonathan D. Poplawsky. 54-63
- A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction. Mert Hidayetoglu, Carl Pearson, Izzat El Hajj, Levent Gürel, Weng Cho Chew, Wen-mei W. Hwu. 64-74
- Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope. Hatem Ltaief, Ali Charara, Damien Gratadour, Nicolas Doucet, Bilel Hadri, Eric Gendron, Saber Feki, David E. Keyes. 75-84
- Performance Isolation of Data-Intensive Scale-out Applications in a Multi-tenant Cloud. Palden Lama, Shaoqi Wang, Xiaobo Zhou, Dazhao Cheng. 85-94
- QoS Support for Scientific Workflows Using Software-Defined Storage Resource Enclaves. Suman Karki, Bao Nguyen, Xuechen Zhang. 95-104
- Scalable Data Resilience for In-memory Data Staging. Shaohua Duan, Pradeep Subedi, Keita Teranishi, Philip E. Davis, Hemanth Kolla, Marc Gamell, Manish Parashar. 105-115
- Performance and Scalability of Lightweight Multi-kernel Based Operating Systems. Balazs Gerofi, Rolf Riesen, Masamichi Takagi, Taisuke Boku, Kengo Nakajima, Yutaka Ishikawa, Robert W. Wisniewski. 116-125
- Architectural Support for Unlimited Memory Versioning and Renaming. Eran Gilad, Tehila Mayzels, Elazar Raab, Mark Oskin, Yoav Etsion. 126-136
- CTA-Aware Prefetching and Scheduling for GPU. Gunjae Koo, Hyeran Jeon, Zhenhong Liu, Nam Sung Kim, Murali Annavaram. 137-148
- CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling for GPUs. Jie Zhang, Shuwen Gao, Nam Sung Kim, Myoungsoo Jung. 149-159
- Millipede: Die-Stacked Memory Optimizations for Big Data Machine Learning Analytics. Nitin, Mithuna Thottethodi, T. N. Vijaykumar. 160-171
- Scheduling Monotone Moldable Jobs in Linear Time. Klaus Jansen, Felix Land. 172-181
- The Power to Schedule a Parallel Program. Kunal Agrawal, Seth Gilbert. 182-193
- Scheduling Parallel Tasks under Multiple Resources: List Scheduling vs. Pack Scheduling. Hongyang Sun, Redouane Elghazi, Ana Gainaru, Guillaume Aupy, Padma Raghavan. 194-203
- Parallel Scheduling of DAGs under Memory Constraints. Loris Marchal, Hanna Nagy, Bertrand Simon, Frédéric Vivien. 204-213
- Evaluating Active Learning with Cost and Memory Awareness. Dmitry Duplyakin, Jed Brown, Donna Calhoun. 214-223
- Semantics-Preserving Parallelization of Stochastic Gradient Descent. Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz. 224-233
- Efficient Gradient Boosted Decision Tree Training on GPUs. Zeyi Wen, Bingsheng He, Ramamohanarao Kotagiri, Shengliang Lu, Jiashuai Shi. 234-243
- BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU. Yuwei Hu, Jidong Zhai, Dinghua Li, Yifan Gong, Yuhao Zhu, Wei Liu, Lei Su, Jiangming Jin. 244-253
- Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort. Michael Axtmann, Armin Wiebigke, Peter Sanders. 254-265
- Improving Network Throughput with Global Communication Reordering. Wim Lavrijsen, Costin Iancu, Xing Pan. 266-275
- Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs. Kaixi Hou, Hao Wang, Wu-chun Feng, Jeffrey S. Vetter, Seyong Lee. 276-285
- Development and Application of a Hybrid Programming Environment on an ARM/DSP System for High Performance Computing. Gaurav Mitra, Jonathan Bohmann, Ian Lintault, Alistair P. Rendell. 286-295
- GC-Aware Request Steering with Improved Performance and Reliability for SSD-Based RAIDs. Suzhen Wu, Weidong Zhu, Guixin Liu, Hong Jiang, Bo Mao. 296-305
- A Set-Aware Key-Value Store on Shingled Magnetic Recording Drives with Dynamic Band. Ting Yao, Zhi-hu Tan, Jiguang Wan, Ping Huang, Yiwen Zhang, Changsheng Xie, Xubin He. 306-315
- Software-Hardware Managed Last-level Cache Allocation Scheme for Large-Scale NVRAM-Based Multicores Executing Parallel Data Analytics Applications. Masab Ahmad, Halit Dogan, Fabio Checconi, Xinyu Que, Daniele Buono, Omer Khan. 316-325
- MOCA: Memory Object Classification and Allocation in Heterogeneous Memory Systems. Aditya Narayan, Tiansheng Zhang, Shaizeen Aga, Satish Narayanasamy, Ayse Kivilcim Coskun. 326-335
- Communication-Free Massively Distributed Graph Generation. Daniel Funke, Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, Moritz von Looz. 336-347
- Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data. Tao Lu, Qing Liu, Xubin He, Huizhang Luo, Eric Suchyta, Jong Choi, Norbert Podhorszki, Scott Klasky, Matthew Wolf, Tong Liu, Zhenbo Qiao. 348-357
- UBIS: Utilization-Aware Cluster Scheduling. Karthik Kambatla, Vamsee Yarlagadda, Iñigo Goiri, Ananth Grama. 358-367
- Hardware Transactional Memory Meets Memory Persistency. Daniel Castro, Paolo Romano, João Barreto. 368-377
- Empowering Flexible and Scalable High Performance Architectures with Embedded Photonics. Keren Bergman. 378
- Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems. Doru-Thom Popovici, Tze Meng Low, Franz Franchetti. 379-388
- Lattice H-Matrices on Distributed-Memory Systems. Akihiro Ida. 389-398
- Evaluating the Performance and Cost of Accelerating Seismic Processing with CUDA, OpenCL, OpenACC, and OpenMP. Tiago Lobato Gimenes, Flavia Pisani, Edson Borin. 399-408
- Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization. Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney. 409-418
- A Dynamic Hash Table for the GPU. Saman Ashkiani, Martin Farach-Colton, John D. Owens. 419-429
- GPU LSM: A Dynamic Dictionary Data Structure for the GPU. Saman Ashkiani, Shengren Li, Martin Farach-Colton, Nina Amenta, John D. Owens. 430-440
- WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes. Daniel Jünger, Christian Hundt, Bertil Schmidt. 441-450
- Quotient Filters: Approximate Membership Queries on the GPU. Afton Geil, Martin Farach-Colton, John D. Owens. 451-462
- BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization. Steve Petruzza, Sean Treichler, Valerio Pascucci, Peer-Timo Bremer. 463-473
- Online Tuning of Parallelism Degree in Parallel Nesting Transactional Memory. Jingna Zeng, Paolo Romano, João Barreto, Luís E. T. Rodrigues, Seif Haridi. 474-483
- Work-Stealing, Locality-Aware Actor Scheduling. Saman Barghi, Martin Karsten. 484-494
- Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction. Michael B. Driscoll, Benjamin Brock, Frank Ong, Jonathan I. Tamir, Hsiou-Yuan Liu, Michael Lustig, Armando Fox, Katherine A. Yelick. 495-504
- Swallow: Joint Online Scheduling and Coflow Compression in Datacenter Networks. Qihua Zhou, Peng Li, Kun Wang, Deze Zeng, Song Guo, Minyi Guo. 505-514
- Auto-tuning Streamed Applications on Intel Xeon Phi. Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang. 515-525
- Analyzing Resource Trade-offs in Hardware Overprovisioned Supercomputers. Ryuichi Sakamoto, Tapasya Patki, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Daniel A. Ellsworth, Barry Rountree, Martin Schulz. 526-535
- Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications. Vivek Balasubramanian, Matteo Turilli, Weiming Hu, Matthieu Lefebvre, Wenjie Lei, Ryan T. Modrak, Guido Cervone, Jeroen Tromp, Shantenu Jha. 536-545
- A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats. Peter Ahrens, Helen Xu, Nicholas Schiefer. 546-556
- Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product. Grey Ballard, Nicholas Knight, Kathryn Rouse. 557-567
- Blocking Optimization Techniques for Sparse Tensor Computation. Jee W. Choi, Xing Liu, Shaden Smith, Tyler Simon. 568-577
- TTLG - An Efficient Tensor Transposition Library for GPUs. Jyothi Vedurada, Arjun Suresh, Aravind Sukumaran-Rajam, Jinsung Kim, Changwan Hong, Ajay Panyala, Sriram Krishnamoorthy, V. Krishna Nandivada, Rohit Kumar Srivastava, P. Sadayappan. 578-588
- Do Developers Understand IEEE Floating Point? Peter A. Dinda, Conor Hetland. 589-598
- sDPF-RSA: Utilizing Floating-point Computing Power of GPUs for Massive Digital Signature Computations. Jiankuo Dong, Fangyu Zheng, Niall Emmart, Jingqiang Lin, Charles C. Weems. 599-609
- Rethinking large-scale Economic Modeling for Efficiency: Optimizations for GPU and Xeon Phi Clusters. Simon Scheidegger, Dmitry Mikushin, Felix Kubler, Olaf Schenk. 610-619
- A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements. Tsuyoshi Ichimura, Kohei Fujita, Masashi Horikoshi, Larry Meadows, Kengo Nakajima, Takuma Yamaguchi, Kentaro Koyama, Hikaru Inoue, Akira Naruse, Keisuke Katsushima, Muneo Hori, Lalith Maddegedara. 620-629
- Characterizing Scheduling Delay for Low-Latency Data Analytics Workloads. Wei Chen, Aidi Pi, Shaoqi Wang, Xiaobo Zhou. 630-639
- Runtime Scheduling Policies for Distributed Graph Algorithms. Jesun Sahariar Firoz, Marcin Zalewski, Andrew Lumsdaine, Martina Barnas. 640-649
- Communication Efficient Checking of Big Data Operations. Lorenz Hübschle-Schneider, Peter Sanders. 650-659
- What Size Should Your Buffers to Disks be? Guillaume Aupy, Olivier Beaumont, Lionel Eyraud-Dubois. 660-669
- THOR: THermal-aware Optimizations for extending ReRAM Lifetime. Majed Valad Beigi, Gokhan Memik. 670-679
- CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading. Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim. 680-689
- GreenSprint: Effective Computational Sprinting in Green Data Centers. Haoran Cai, Xu Zhou, Qiang Cao, Hong Jiang, Feng Sheng, Xiandong Qi, Jie Yao, Changsheng Xie, Liang Xiao, Liang Gu. 690-699
- Joint Server and Network Energy Saving in Data Centers for Latency-Sensitive Applications. Liang Zhou, Chih-Hsun Chou, Laxmi N. Bhuyan, K. K. Ramakrishnan, Daniel Wong. 700-709
- The Day After Tomorrow: The Looming Post-Exascale Crisis. Bruce Hendrickson. 710
- Implicit Decomposition for Write-Efficient Connectivity Algorithms. Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, Julian Shun. 711-722
- Distributed Symmetry Breaking in Graphs with Bounded Diversity. Leonid Barenboim, Tzalik Maimon. 723-732
- Complete Visitability for Autonomous Robots on Graphs. Aisha Aljohani, Pavan Poudel, Gokarna Sharma. 733-742
- Local Mixing Time: Distributed Computation and Applications. Anisur Rahaman Molla, Gopal Pandurangan. 743-752
- Roofline Guided Design and Analysis of a Multi-stencil CFD Solver for Multicore Performance. Bahareh Mostafazadeh, Ferran Marti, Feng Liu, Aparna Chandramowlishwaran. 753-762
- Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling. Shizhen Xu, Yuanchao Xu, Wei Xue, Xipeng Shen, Fang Zheng, Xiaomeng Huang, Guangwen Yang. 763-773
- Performance and Accuracy Trade-offs of HPC Application Modeling and Simulation. Zhou Tong, Xin Yuan, Scott Pakin, Michael Lang. 774-783
- PADDLE: Performance Analysis Using a Data-Driven Learning Environment. Jayaraman J. Thiagarajan, Rushil Anirudh, Bhavya Kailkhura, Nikhil Jain, Tanzima Islam, Abhinav Bhatele, Jae-Seung Yeom, Todd Gamblin. 784-793
- Efficient Solving of Scan Primitive on Multi-GPU Systems. Adrián Pérez Diéguez, Margarita Amor, Ramon Doallo, Akira Nukada, Satoshi Matsuoka. 794-803
- Quantifying the Performance and Energy-Efficiency Impact of Hardware Transactional Memory on Scientific Applications on Large-Scale NUMA Systems. Jinsu Park, Woongki Baek. 804-813
- GPU-Accelerated Large-Scale Genome Assembly. Sayan Goswami, Kisung Lee, Shayan Shams, Seung-Jong Park. 814-824
- GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method. Gregory Herschlag, Seyong Lee, Jeffrey S. Vetter, Amanda Randles. 825-834
- SLIMFAST: Reducing Metadata Redundancy in Sound and Complete Dynamic Data Race Detection. Yuanfeng Peng, Christian DeLozier, Ariel Eizenberg, William Mansky, Joseph Devietti. 835-844
- SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs. Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Ignacio Laguna, Gregory L. Lee, Dong H. Ahn. 845-854
- Unobtrusive Asynchronous Exception Handling with Standard Java Try/Catch Blocks. Mostafa Mehrabi, Nasser Giacaman, Oliver Sinnen. 855-864
- COMPI: Concolic Testing for MPI Applications. Hongbo Li, Sihuan Li, Zachary Benavides, Zizhong Chen, Rajiv Gupta. 865-874
- Experimental Design of Work Chunking for Graph Algorithms on High Bandwidth Memory Architectures. George M. Slota, Sivasankaran Rajamanickam. 875-884
- Distributed Louvain Algorithm for Graph Community Detection. Sayan Ghosh, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Hao Lu, Daniel G. Chavarría-Miranda, Arif Khan, Assefaw Hadish Gebremedhin. 885-895
- Application Codesign of Near-Data Processing for Similarity Search. Vincent T. Lee, Amrita Mazumdar, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin. 896-907
- A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices. Piyush Sao, Xiaoye Sherry Li, Richard W. Vuduc. 908-919
- A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems. Ernesto Dufrechou, Pablo Ezzatti. 920-929
- Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters. Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack J. Dongarra. 930-939
- Convergence Models and Surprising Results for the Asynchronous Jacobi Method. Jordi Wolfson-Pou, Edmond Chow. 940-949
- Overhead-Conscious Format Selection for SpMV-Based Applications. Yue Zhao, Weijie Zhou, Xipeng Shen, Graham Yiu. 950-959
- Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace. Michael A. Sevilla, Ivo Jimenez, Noah Watkins, Jeff LeFevre, Peter Alvaro, Shel Finkelstein, Patrick Donnelly, Carlos Maltzahn. 960-969
- SELECT: A Distributed Publish/Subscribe Notification System for Online Social Networks. Nuno Apolónia, Stefanos Antaris, Sarunas Girdzijauskas, George Pallis, Marios D. Dikaiakos. 970-979
- A Lightweight Communication Runtime for Distributed Graph Analytics. Hoang-Vu Dang, Roshan Dathathri, Gurbinder Gill, Alex Brooks, Nikoli Dryden, Andrew Lenharth, Loc Hoang, Keshav Pingali, Marc Snir. 980-989
- Intra-Cluster Coalescing to Reduce GPU NoC Pressure. Lu Wang, Xia Zhao, David Kaeli, Zhiying Wang, Lieven Eeckhout. 990-999
- HybridPass: Hybrid Scheduling for Mixed Flows in Datacenter Networks. Bo Peng, Jianguo Yao, Zhengwei Qi, Haibing Guan. 1000-1009
- Scalable Power-Efficient Kilo-Core Photonic-Wireless NoC Architectures. Avinash Kodi, Kyle Shifflet, Savas Kaya, Soumyasanta Laha, Ahmed Louri. 1010-1019
- Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda. 1020-1029
- Tiny Groups Tackle Byzantine Adversaries. Mercy O. Jaiyeola, Kyle Patron, Jared Saia, Maxwell Young, Qian M. Zhou. 1030-1039
- Skueue: A Scalable and Sequentially Consistent Distributed Queue. Michael Feldmann, Christian Scheideler, Alexander Setzer. 1040-1049
- Self-Stabilizing Supervised Publish-Subscribe Systems. Michael Feldmann, Christina Kolb, Christian Scheideler, Thim Strothmann. 1050-1059
- Spartan: A Framework For Sparse Robust Addressable Networks. John Augustine, Sumathi Sivasubramaniam. 1060-1069
- Beyond Binary Search: Parallel In-Place Construction of Implicit Search Tree Layouts. Kyle Berney, Henri Casanova, Alyssa Higuchi, Ben Karsin, Nodari Sitchinava. 1070-1079
- An Energy-Efficient Single-Source Shortest Path Algorithm. Sara Karamati, Jeffrey Young, Richard W. Vuduc. 1080-1089
- Scalable Breadth-First Search on a GPU Cluster. Yuechao Pan, Roger Pearce, John D. Owens. 1090-1101
- Chameleon: Online Clustering of MPI Program Traces. Amir Bahmani, Frank Mueller. 1102-1112
- Trade-Off Study of Localizing Communication and Balancing Network Traffic on a Dragonfly System. Xin Wang, Misbah Mubarak, Xu Yang, Robert B. Ross, Zhiling Lan. 1113-1122
- Level-Spread: A New Job Allocation Policy for Dragonfly Networks. Yijia Zhang, Ozan Tuncer, Fulya Kaplan, Katzalin Olcoz, Vitus J. Leung, Ayse Kivilcim Coskun. 1123-1132
- A Migratory Heterogeneity-Aware Data Layout Scheme for Parallel File Systems. Shuibing He, Xian-He Sun, Yang Wang, Chengzhong Xu. 1133-1142
- LALCA: Locality-Aware Lock Contention Avoidance for NVMe-Based Scale-out Storage System. Myoungwon Oh, Sejin Park, Jugwan Eom, Seungmin Kim, Sangjae Kim, Kang-Won Lee, Heon Y. Yeom. 1143-1152
- Mitigating Traffic-Based Side Channel Attacks in Bandwidth-Efficient Cloud Storage. Pengfei Zuo, Yu Hua, Cong Wang, Wen Xia, Shunde Cao, Yukun Zhou, Yuanyuan Sun. 1153-1162
- Chameleon: An Adaptive Wear Balancer for Flash Clusters. Nannan Zhao, Ali Anwar, Yue Cheng, Mohammed Salman, Da-ping Li, Jiguang Wan, Changsheng Xie, Xubin He, Feiyi Wang, Ali Raza Butt. 1163-1172