- Computational Challenges in Constructing the Tree of Life. Tandy J. Warnow. 1
- Monitoring Properties of Large, Distributed, Dynamic Graphs. Gal Yehuda, Daniel Keren, Islam Akaria. 2-11
- Parallel Construction of Suffix Trees and the All-Nearest-Smaller-Values Problem. Patrick Flick, Srinivas Aluru. 12-21
- The Reverse Cuthill-McKee Algorithm in Distributed-Memory. Ariful Azad, Mathias Jacquelin, Aydin Buluç, Esmond G. Ng. 22-31
- SlimSell: A Vectorizable Graph Representation for Breadth-First Search. Maciej Besta, Florian Marending, Edgar Solomonik, Torsten Hoefler. 32-41
- SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search. Haidong Lan, Weiguo Liu, Yongchao Liu, Bertil Schmidt. 42-51
- PUNAS: A Parallel Ungapped-Alignment-Featured Seed Verification Algorithm for Next-Generation Sequencing Read Alignment. Yuandong Chan, Kai Xu, Haidong Lan, Weiguo Liu, Yongchao Liu, Bertil Schmidt. 52-61
- Eliminating Irregularities of Protein Sequence Search on Multicore Architectures. Jing Zhang, Sanchit Misra, Hao Wang, Wu-chun Feng. 62-71
- Communication Optimization on GPU: A Case Study of Sequence Alignment Algorithms. Jie Wang, Xinfeng Xie, Jason Cong. 72-81
- Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management. Bingchao Li, Jizhou Sun, Murali Annavaram, Nam Sung Kim. 82-91
- Content-Aware Non-Volatile Cache Replacement. Qi Zeng, Jih-Kwon Peir. 92-101
- DEFT-Cache: A Cost-Effective and Highly Reliable SSD Cache for RAID Storage. Jiguang Wan, Wei Wu, Ling Zhan, Qing Yang, Xiaoyang Qu, Changsheng Xie. 102-111
- Adaptive Software Caching for Efficient NVRAM Data Persistence. Pengcheng Li, Dhruva R. Chakrabarti, Chen Ding, Liang Yuan. 112-122
- Container-Based Cloud Platform for Mobile Computation Offloading. Song Wu, Chao Niu, Jia Rao, Hai Jin, Xiaohai Dai. 123-132
- Enhancing Datacenter Resource Management through Temporal Logic Constraints. Hao He, Jiang Hu, Dilma Da Silva. 133-142
- High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters. Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda. 143-152
- Argo NodeOS: Toward Unified Resource Management for Exascale. Swann Perarnau, Judicael A. Zounmevo, Matthieu Dreher, Brian C. Van Essen, Roberto Gioiosa, Kamil Iskra, Maya B. Gokhale, Kazutomo Yoshii, Peter H. Beckman. 153-162
- Rational Fair Consensus in the Gossip Model. Andrea Clementi, Luciano Gualà, Guido Proietti, Giacomo Scornavacca. 163-171
- Leader Election in a Smartphone Peer-to-Peer Network. Calvin Newport. 172-181
- Leader Election in Asymmetric Labeled Unidirectional Rings. Karine Altisen, Ajoy K. Datta, Stéphane Devismes, Anaïs Durand, Lawrence L. Larmore. 182-191
- Tight Load Balancing Via Randomized Local Search. Petra Berenbrink, Peter Kling, Christopher Liaw, Abbas Mehrabian. 192-201
- Large Scale Manycore-Aware PIC Simulation with Efficient Particle Binning. Hiroshi Nakashima, Yoshiki Summura, Keisuke Kikura, Yohei Miyake. 202-212
- Optimization and Parallelization of B-Spline Based Orbital Evaluations in QMC on Multi/Many-Core Shared Memory Processors. Amrita Mathuriya, Ye Luo, Anouar Benali, Luke Shulenburger, Jeongnim Kim. 213-223
- One-Way Wave Equation Migration at Scale on GPUs Using Directive Based Programming. Kshitij Mehta, Maxime R. Hugues, Oscar R. Hernandez, David E. Bernholdt, Henri Calandra. 224-233
- Towards Highly scalable Ab Initio Molecular Dynamics (AIMD) Simulations on the Intel Knights Landing Manycore Processor. Mathias Jacquelin, Wibe De Jong, Eric J. Bylaska. 234-243
- General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models. Xubin Tan, Jaume Bosch, Miquel Vidal, Carlos Álvarez, Daniel Jiménez-González, Eduard Ayguadé, Mateo Valero. 244-253
- Accelerating Graph and Machine Learning Workloads Using a Shared Memory Multicore Architecture with Auxiliary Support for In-hardware Explicit Messaging. Halit Dogan, Farrukh Hijaz, Masab Ahmad, Brian Kahne, Peter Wilson, Omer Khan. 254-264
- Respin: Rethinking Near-Threshold Multiprocessor Design with Non-volatile Memory. Xiang Pan, Anys Bacha, Radu Teodorescu. 265-275
- MOCHA: Morphable Locality and Compression Aware Architecture for Convolutional Neural Networks. Syed Mohammad Asad Hassan Jafri, Ahmed Hemani, Kolin Paul, Naeem Abbas. 276-286
- Autotuning Stencil Computations with Structural Ordinal Regression Learning. Biagio Cosenza, Juan J. Durillo, Stefano Ermon, Ben H. H. Juurlink. 287-296
- Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL. Sabela Ramos, Torsten Hoefler. 297-306
- Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code. David Beckingsale, Olga Pearce, Ignacio Laguna, Todd Gamblin. 307-316
- Generating Performance Models for Irregular Applications. Ryan D. Friese, Nathan R. Tallent, Abhinav Vishnu, Darren J. Kerbyson, Adolfy Hoisie. 317-326
- Bounded Reordering Allows Efficient Reliable Message Transmission. Keishla D. Ortiz-Lopez, Jennifer L. Welch. 327-336
- Dynamic Adaptation in Wireless Networks Under Comprehensive Interference via Carrier Sense. Dongxiao Yu, Yuexuan Wang, Tigran Tonoyan, Magnús M. Halldórsson. 337-346
- Fault-Tolerant Online Packet Scheduling on Parallel Channels. Pawel Garncarek, Tomasz Jurdzinski, Krzysztof Lorys. 347-356
- Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems. Torsten Hoefler, Amnon Barak, Amnon Shiloh, Zvi Drezner. 357-366
- DR-BW: Identifying Bandwidth Contention in NUMA Architectures with Supervised Learning. Hao Xu, Shasha Wen, Alfredo Giménez, Todd Gamblin, Xu Liu. 367-376
- Data Centric Performance Measurement Techniques for Chapel Programs. Hui Zhang, Jeffrey K. Hollingsworth. 377-386
- A Parallel FastTrack Data Race Detector on Multi-core Systems. Young Wn Song, Yann-Hang Lee. 387-396
- Localized Fault Recovery for Nested Fork-Join Programs. Gokcen Kestor, Sriram Krishnamoorthy, Wenjing Ma. 397-408
- Exploring DataVortex Systems for Irregular Applications. Roberto Gioiosa, Antonino Tumeo, Jian Yin, Thomas Warfel, David J. Haglin, Santiago Betelú. 409-418
- 2-MTCP: Light-Weight Coding for Efficient Multi-Path Transmission in Data Center Network. Jiyan Sun, Yan Zhang, Xin Wang, Shihan Xiao, Zhen Xu, Hongjing Wu, Xin Chen, Yanni Han. 419-428
- A Scalable and Resilient Microarchitecture Based on Multiport Binding for High-Radix Router Design. Yi Dai, Kefei Wang, Gang Qu, Liquan Xiao, Dezun Dong, Xingyun Qi. 429-438
- Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference. Nikhil Jain, Abhinav Bhatele, Xiang Ni, Todd Gamblin, Laxmikant V. Kalé. 439-448
- Accelerating Spark Datasets by Inlining Deserialization. Jan Wroblewski, Kazuaki Ishizaki, Hiroshi Inoue, Moriyoshi Ohara. 449-458
- MRapid: An Efficient Short Job Optimizer on Hadoop. Hong Zhang, Hai Huang, Liqiang Wang. 459-468
- Accommodating Thread-Level Heterogeneity in Coupled Parallel Applications. Samuel K. Gutierrez, Kei Davis, Dorian C. Arnold, Randal S. Baker, Robert W. Robey, Patrick S. McCormick, Daniel Holladay, Jon A. Dahl, R. Joe Zerr, Florian Weik, Christoph Junghans. 469-478
- Multi-GPU Graph Analytics. Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens. 479-490
- NVIDIA Deep Learning Tutorial. Julie Bernauer. 491
- A Scalable System Architecture to Addressing the Next Generation of Predictive Simulation Workflows with Coupled Compute and Data Intensive Applications. Mark Seager. 492
- Fault-Tolerant Robot Gathering Problems on Graphs With Arbitrary Appearing Times. Sergio Rajsbaum, Armando Castañeda, David Flores-Peñaloza, Manuel Alcantara. 493-502
- Distributed Vehicle Routing Approximation. Akhil Krishnan, Mikhail Markov, Borzoo Bonakdarpour. 503-512
- O(log N)-Time Complete Visibility for Asynchronous Robots with Lights. Gokarna Sharma, Ramachandran Vaidyanathan, Jerry L. Trahan, Costas Busch, Suresh Rai. 513-522
- Similarity Search on Automata Processors. Vincent T. Lee, Justin Kotalik, Carlo C. del Mundo, Armin Alaghi, Luis Ceze, Mark Oskin. 523-534
- 26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight. Yulong Ao, Chao Yang, Xinliang Wang, Wei Xue, Haohuan Fu, Fangfang Liu, Lin Gan, Ping Xu, Wenjing Ma. 535-544
- Image-Domain Gridding on Graphics Processors. Bram Veenboer, Matthias Petschow, John W. Romein. 545-554
- Aces4: A Platform for Computational Chemistry Calculations with Extremely Large Block-Sparse Arrays. Beverly A. Sanders, Jason N. Byrd, Nakul Jindal, Victor F. Lotrich, Dmitry Lyakh, Ajith Perera, Rodney J. Bartlett. 555-564
- PhiOpenSSL: Using the Xeon Phi Coprocessor for Efficient Cryptographic Calculations. Shun Yao, Dantong Yu. 565-574
- Directive-Based Partitioning and Pipelining for Graphics Processing Units. Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-chun Feng. 575-584
- ScalaIOExtrap: Elastic I/O Tracing and Extrapolation. Xiaoqing Luo, Frank Mueller, Philip H. Carns, Jonathan Jenkins, Robert Latham, Robert B. Ross, Shane Snyder. 585-594
- SimProf: A Sampling Framework for Data Analytic Workloads. Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, Hyesoon Kim. 595-604
- PaPar: A Parallel Data Partitioning Framework for Big Data Applications. Hao Wang, Jing Zhang, Da Zhang, Sarunya Pumma, Wu-chun Feng. 605-614
- swDNN: A Library for Accelerating Deep Learning Applications on Sunway TaihuLight. Jiarui Fang, Haohuan Fu, Wenlai Zhao, Bingwei Chen, Weijie Zheng, Guangwen Yang. 615-624
- Community Detection on the GPU. Md. Naim, Fredrik Manne, Mahantesh Halappanavar, Antonino Tumeo. 625-634
- Scalable Graph Traversal on Sunway TaihuLight with Ten Million Cores. Heng Lin, Xiongchao Tang, Bowen Yu, Youwei Zhuo, Wenguang Chen, Jidong Zhai, Wanwang Yin, Weimin Zheng. 635-645
- Partitioning Trillion-Edge Graphs in Minutes. George M. Slota, Sivasankaran Rajamanickam, Karen D. Devine, Kamesh Madduri. 646-655
- Generating Families of Practical Fast Matrix Multiplication Algorithms. Jianyu Huang, Leslie Rice, Devin A. Matthews, Robert A. van de Geijn. 656-667
- Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation. Mathieu Faverge, Julien Langou, Yves Robert, Jack J. Dongarra. 668-677
- Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations. Tobias Wicky, Edgar Solomonik, Torsten Hoefler. 678-687
- A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm. Ariful Azad, Aydin Buluç. 688-697
- Power Efficient Sharing-Aware GPU Data Management. Abdulaziz Tabbakh, Murali Annavaram, Xuehai Qian. 698-707
- Fly-Over: A Light-Weight Distributed Power-Gating Mechanism for Energy-Efficient Networks-on-Chip. Rahul Boyapati, Jiayi Huang, Ningyuan Wang, Kyung-Hoon Kim, Ki Hwan Yum, Eun Jung Kim. 708-717
- RCube: A Power Efficient and Highly Available Network for Data Centers. Zhenhua Li, Yuanyuan Yang. 718-727
- Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems. Thang Cao, Wei Huang, Yuan He, Masaaki Kondo. 728-737
- Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling. Vincenzo Bonifaci, Gianlorenzo D'Angelo, Alberto Marchetti-Spaccamela. 738-747
- Efficient and Deterministic Scheduling for Parallel State Machine Replication. Odorico Machado Mendizabal, Ruda S. T. De Moura, Fernando Luís Dotti, Fernando Pedone. 748-757
- Dynamic Memory-Aware Task-Tree Scheduling. Guillaume Aupy, Clement Brasseur, Loris Marchal. 758-767
- Approximation Proofs of a Fast and Efficient List Scheduling Algorithm for Task-Based Runtime Systems on Multicores and GPUs. Olivier Beaumont, Lionel Eyraud-Dubois, Suraj Kumar. 768-777
- Automatic Collapsing of Non-Rectangular Loops. Philippe Clauss, Ervin Altintas, Matthieu Kuhn. 778-787
- HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems. YongHong Yan, Jiawen Liu, Kirk W. Cameron, Mariam Umar. 788-798
- Multigrain Parallelism: Bridging Coarse-Grain Parallel Programs and Fine-Grain Event-Driven Multithreading. Jaime Arteaga Molina, Stéphane Zuckerman, Guang R. Gao. 799-808
- Improving the Integration of Task Nesting and Dependencies in OpenMP. Josep M. Pérez, Vicenç Beltran, Jesús Labarta, Eduard Ayguadé. 809-818
- Runtime Aware Architectures. Mateo Valero. 819
- Reducing Pagerank Communication via Propagation Blocking. Scott Beamer, Krste Asanovic, David A. Patterson. 820-831
- Clustering Throughput Optimization on the GPU. Michael G. Gowanlock, Cody M. Rude, David M. Blair, Justin D. Li, Victor Pankratius. 832-841
- FlexVC: Flexible Virtual Channel Management in Low-Diameter Networks. Pablo Fuentes, Enrique Vallejo, Ramón Beivide, Cyriel Minkenberg, Mateo Valero. 842-854
- Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors. Benjamin Klenk, Holger Fröning, Hans Eberle, Larry Dennison. 855-865
- The SEPO Model of Computation to Enable Larger-Than-Memory Hash Tables for GPU-Accelerated Big Data Analytics. Reza Mokhtari, Michael Stumm. 866-875
- Elastic Consistent Hashing for Distributed Storage Systems. Wei Xie, Yong Chen. 876-885
- An N log N Parallel Fast Direct Solver for Kernel Matrices. Chenhan D. Yu, William B. March, George Biros. 886-896
- A Robust Parallel Preconditioner for Indefinite Systems Using Hierarchical Matrices and Randomized Sampling. Pieter Ghysels, Xiaoye Sherry Li, Christopher Gorman, François-Henry Rouet. 897-906
- FFQ: A Fast Single-Producer/Multiple-Consumer Concurrent FIFO Queue. Sergei Arnautov, Pascal Felber, Christof Fetzer, Bohdan Trach. 907-916
- Scalable Lock-Free Vector with Combining. Ivan Walulya, Philippas Tsigas. 917-926
- Automatic-Signal Monitors with Multi-object Synchronization. Wei-Lun Hung, Vijay K. Garg. 927-936
- Optimal Algorithms for a Mesh-Connected Computer with Limited Additional Global Bandwidth. Yujie An, Quentin F. Stout. 937-946
- An Adaptive Core-Specific Runtime for Energy Efficiency. Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, Jan F. Prins. 947-956
- Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework. Ryuichi Sakamoto, Thang Cao, Masaaki Kondo, Koji Inoue, Masatsugu Ueda, Tapasya Patki, Daniel A. Ellsworth, Barry Rountree, Martin Schulz. 957-966
- Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems. Qi Zhu, Bo Wu, Xipeng Shen, Li Shen, Zhiying Wang. 967-977
- Characterizing and Modeling Power and Energy for Extreme-Scale In-Situ Visualization. Vignesh Adhinarayanan, Wu-chun Feng, David H. Rogers, James P. Ahrens, Scott Pakin. 978-987
- Application Level Reordering of Remote Direct Memory Access Operations. Wim Lavrijsen, Costin Iancu. 988-997
- Toucan - A Translator for Communication Tolerant MPI Applications. Sergio M. Martin, Marsha J. Berger, Scott B. Baden. 998-1007
- Memory Compression Techniques for Network Address Management in MPI. Yanfei Guo, Charles J. Archer, Michael Blocksome, Scott Parker, Wesley Bland, Ken Raffenetti, Pavan Balaji. 1008-1017
- Transparent Caching for RMA Systems. Salvatore Di Girolamo, Flavio Vella, Torsten Hoefler. 1018-1027
- When Neurons Fail. El Mahdi El Mhamdi, Rachid Guerraoui. 1028-1037
- On Optimizing Distributed Tucker Decomposition for Dense Tensors. Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal, Dheeraj Sreedhar. 1038-1047
- Model-Driven Sparse CP Decomposition for Higher-Order Tensors. Jiajia Li, Jee Choi, Ioakeim Perros, Jimeng Sun, Richard W. Vuduc. 1048-1057
- Sparse Tensor Factorization on Many-Core Processors with High-Bandwidth Memory. Shaden Smith, JongSoo Park, George Karypis. 1058-1067
- Proximity-Aware Balanced Allocations in Cache Networks. Ali Pourmiri, Mahdi Jafari Siavoshani, Seyed Pooya Shariatpanahi. 1068-1077
- Addressing Performance Heterogeneity in MapReduce Clusters with Elastic Tasks. Wei Chen, Jia Rao, Xiaobo Zhou. 1078-1087
- Autonomic Resource Management for Program Orchestration in Large-Scale Data Analysis. Masahiro Tanaka, Kenjiro Taura, Kentaro Torisawa. 1088-1097
- Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. Tao Gao, Yanfei Guo, Boyu Zhang, Pietro Cicotti, Yutong Lu, Pavan Balaji, Michela Taufer. 1098-1108
- Elastic Data Compression with Improved Performance and Space Efficiency for Flash-Based Storage Systems. Bo Mao, Hong Jiang, Suzhen Wu, Yaodong Yang, Zaifa Xi. 1109-1118
- E^2MC: Entropy Encoding Based Memory Compression for GPUs. Sohan Lal, Jan Lucas, Ben H. H. Juurlink. 1119-1128
- Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization. Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello. 1129-1139
- ATM: Approximate Task Memoization in the Runtime System. Iulian Brumar, Marc Casas, Miquel Moretó, Mateo Valero, Gurindar S. Sohi. 1140-1150
- Design and Implementation of Papyrus: Parallel Aggregate Persistent Storage. Jungwon Kim, Kittisak Sajjapongse, Seyong Lee, Jeffrey S. Vetter. 1151-1162
- Language-Based Optimizations for Persistence on Nonvolatile Main Memory Systems. Joel Edward Denny, Seyong Lee, Jeffrey S. Vetter. 1163-1173
- MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers. Teng Wang, Adam Moody, Yue Zhu, Kathryn Mohror, Kento Sato, Tanzima Islam, Weikuan Yu. 1174-1183
- Parallelism and Garbage Collection Aware I/O Scheduler with Improved SSD Performance. Jiayang Guo, Yiming Hu, Bo Mao, Suzhen Wu. 1184-1193