- ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning. Daniel Maier, Biagio Cosenza, Ben H. H. Juurlink. 3-18
- Automatic Low-Overhead Load-Imbalance Detection in MPI Applications. Peter Arzt, Yannic Fischler, Jan-Patrick Lehr, Christian Bischof. 19-34
- Trace-Based Workload Generation and Execution. Yannis Sfakianakis, Eleni Kanellou, Manolis Marazakis, Angelos Bilas. 37-54
- Update on the Asymptotic Optimality of LPT. Anne Benoit, Louis-Claude Canon, Redouane Elghazi, Pierre-Cyrille Héam. 55-69
- E2EWatch: An End-to-End Anomaly Diagnosis Framework for Production HPC Systems. Burak Aksar, Benjamin Schwaller, Omar Aaziz, Vitus J. Leung, Jim M. Brandt, Manuel Egele, Ayse K. Coskun. 70-85
- Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing. Zhuoran Ji, Cho-Li Wang. 89-104
- A Fixed-Parameter Algorithm for Scheduling Unit Dependent Tasks with Unit Communication Delays. Ning Tang, Alix Munier Kordon. 105-119
- Plan-Based Job Scheduling for Supercomputers with Shared Burst Buffers. Jan Kopanski, Krzysztof Rzadca. 120-135
- Taming Tail Latency in Key-Value Stores: A Scheduling Perspective. Sonia Ben Mokhtar, Louis-Claude Canon, Anthony Dugois, Loris Marchal, Etienne Rivière. 136-150
- A Log-Linear (2 + 5/6)-Approximation Algorithm for Parallel Machine Scheduling with a Single Orthogonal Resource. Adrian Naruszko, Bartlomiej Przybylski, Krzysztof Rzadca. 151-166
- An MPI-based Algorithm for Mapping Complex Networks onto Hierarchical Architectures. Maria Predari, Charilaos Tzovas, Christian Schulz, Henning Meyerhenke. 167-182
- Pipelined Model Parallelism: Complexity Results and Memory Considerations. Olivier Beaumont, Lionel Eyraud-Dubois, Alena Shilova. 183-198
- Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization. Haoran Wang, Chong Li, Thibaut Tachon, Hongxing Wang, Sheng Yang, Sébastien Limet, Sophie Robert. 201-216
- A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks. Kyusik Choi, Hoeseok Yang. 217-231
- Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs. Zhongyi Lin, Evangelos Georganas, John D. Owens. 232-248
- Smart Distributed DataSets for Stream Processing. Tiago Lopes, Miguel E. Coimbra, Luís Veiga. 249-265
- Colony: Parallel Functions as a Service on the Cloud-Edge Continuum. Francesc Lordan, Daniele Lezzi, Rosa M. Badia. 269-284
- Horizontal Scaling in Cloud Using Contextual Bandits. David Delande, Patricia Stolf, Raphaël Féraud, Jean-Marc Pierson, André Bottaro. 285-300
- Geo-distribute Cloud Applications at the Edge. Ronan-Alexandre Cherrueau, Marie Delavergne, Adrien Lèbre. 301-316
- A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-Based Spot GPU Instances. Rafaela C. Brum, Walisson P. Sousa, Alba C. M. A. Melo, Cristiana Bentes, Maria Clicia Stelling de Castro, Lúcia Maria de A. Drummond. 317-333
- Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach. Sophie Cerf, Raphaël Bleuse, Valentin Reis, Swann Perarnau, Éric Rutten. 334-349
- Algorithm Design for Tensor Units. Rezaul Chowdhury, Francesco Silvestri, Flavio Vella. 353-367
- A Scalable Approximation Algorithm for Weighted Longest Common Subsequence. Jeremy Buhler, Thomas Lavastida, Kefu Lu, Benjamin Moseley. 368-384
- TSLQueue: An Efficient Lock-Free Design for Priority Queues. Adones Rukundo, Philippas Tsigas. 385-401
- G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU. Bryan Rowe, Rajiv Gupta. 402-417
- Accelerating Graph Applications Using Phased Transactional Memory. Catalina Munoz Morales, Rafael Murari, Joao P. L. de Carvalho, Bruno Chinelato Honorio, Alexandro Baldassin, Guido Araujo. 421-434
- Efficient GPU Computation Using Task Graph Parallelism. Dian-Lun Lin, Tsung-Wei Huang. 435-450
- Towards High Performance Resilience Using Performance Portable Abstractions. Nicolas Morales, Keita Teranishi, Bogdan Nicolae, Christian Trott, Franck Cappello. 451-465
- Enhancing Load-Balancing of MPI Applications with Workshare. Thomas Dionisi, Stéphane Bouhrour, Julien Jaeger, Patrick Carribault, Marc Pérache. 466-481
- Particle-In-Cell Simulation Using Asynchronous Tasking. Nicolas Guidotti, Pedro Ceyrat, João Barreto, José Monteiro, Rodrigo Rodrigues, Ricardo Fonseca, Xavier Martorell, Antonio J. Peña. 482-498
- Exploiting Co-execution with OneAPI: Heterogeneity from a Modern Perspective. Raúl Nozal, José Luis Bosque. 501-516
- Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems. Yuankun Fu, Fengguang Song. 519-535
- Fault-Tolerant LU Factorization Is Low Cost. Camille Coti, Laure Petrucci, Daniel Alberto Torres González. 536-549
- Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs. Fritz Göbel, Thomas Grützmacher, Tobias Ribizel, Hartwig Anzt. 550-564
- Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method. Yuxi Hong, El Houcine Bergou, Nicolas Doucet, Hao Zhang, Jesse Cranney, Hatem Ltaief, Damien Gratadour, François Rigaut, David E. Keyes. 565-579
- GPU-Accelerated Mahalanobis-Average Hierarchical Clustering Analysis. Adam Smelko, Miroslav Kratochvíl, Martin Krulis, Tomás Sieger. 580-595
- PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy. Vladimir Dimic, Miquel Moretó, Marc Casas, Mateo Valero. 599-615
- Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware. Alberto Zeni, Kenneth O'Brien, Michaela Blott, Marco D. Santambrogio. 616-630