- Scalable Instruction-Level Parallelism Through Tree-Instructions. Jaime H. Moreno, Mayan Moudgill. 1-11
- Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs. David López, Mateo Valero, Josep Llosa, Eduard Ayguadé. 12-19
- Implementation of Collective I/O in the Intel Paragon Parallel File System: Initial Experiences. Rajesh Bordawekar. 20-27
- Optimizing Collective I/O Performance on Parallel Computers: A Multisystem Study. Ying Chen, Jarek Nieplocha, Ian T. Foster, Marianne Winslett. 28-35
- Performance Improvement through Overhead Analysis: A Case Study in Molecular Dynamics. Graham D. Riley, J. Mark Bull, John R. Gurd. 36-43
- Generalized Cannon's Algorithm for Parallel Matrix Multiplication. Hyuk-Jae Lee, James P. Robertson, José A. B. Fortes. 44-51
- A Highly Accurate Fast Solver for Helmholtz Equations. Xian-He Sun, Yu Zhuang. 52-59
- Data Caches for Superscalar Processors. Toni Juan, Juan J. Navarro, Olivier Temam. 60-67
- Improving Data Cache Performance by Pre-Executing Instructions Under a Cache Miss. James Dundas, Trevor N. Mudge. 68-75
- Eliminating Cache Conflict Misses through XOR-Based Placement Functions. Antonio González, Mateo Valero, Nigel P. Topham, Joan-Manuel Parcerisa. 76-83
- CP-PACS: A Massively Parallel Processor for Large Scale Scientific Calculations. Taisuke Boku, Ken'ichi Itakura, Hiroshi Nakamura, Kisaburo Nakazawa. 108-115
- A Methodology for Specifying Data Distribution Using Only Standard Object-Oriented Features. Naohito Sato, Satoshi Matsuoka, Jean-Marc Jézéquel, Akinori Yonezawa. 116-123
- HPC++: Experiments with the Parallel Standard Template Library. Elizabeth Johnson, Dennis Gannon. 124-131
- Impact of Selection Functions on Routing Algorithm Performance in Multicomputer Networks. Wu-chang Feng, Kang G. Shin. 132-139
- Performance Benefits of Virtual Channels and Adaptive Routing: An Application-Driven Study. Aniruddha S. Vaidya, Anand Sivasubramaniam, Chita R. Das. 140-147
- Distributed Shared Memory Systems with Improved Barrier Synchronization and Data Transfer. Nian-Feng Tzeng, Angkul Kongmunvattana. 148-155
- A Graph Based Approach to Barrier Synchronisation Minimisation. Elena Stöhr, Michael F. P. O'Boyle. 156-163
- Performance Evaluation of Message-Driven Parallel VLSI CAD Applications on General Purpose Multiprocessors. John G. Holm, John A. Chandy, Steven Parkes, Sumit Roy, Venkatram Krishnaswamy, Gagan Hasteer, Prithviraj Banerjee. 172-179
- Incorporating Application Dependent Information in an Automatic Code Generating Environment. Robert van Engelen, Ilja Heitlager, Lex Wolters, Gerard Cats. 180-187
- Sparse Code Generation for Imperfectly Nested Loops with Dependences. Vladimir Kotlyar, Keshav Pingali. 188-195
- Speculative Execution via Address Prediction and Data Prefetching. José González, Antonio González. 196-203
- Adaptive Data Prefetching Using Cache Information. Ando Ki, Alan E. Knowles. 204-212
- Performance considerations in software multicasts. Jörg Cordsen, Hans Werner Pohl, Wolfgang Schröder-Preikschat. 213-220
- Iteration Space Slicing and Its Application to Communication Optimization. William Pugh, Evan Rosser. 221-228
- Compiler and Run-Time Support for Semi-Structured Applications. Nikos Chrisochoides, Induprakas Kodukula, Keshav Pingali. 229-236
- Conflict-Free Template Access in k-ary and Binomial Trees. Maria Cristina Pinotti, Sajal K. Das, Falguni Sarkar. 237-244
- Design and Performance of the Shasta Distributed Shared Memory Protocol. Daniel J. Scales, Kourosh Gharachorloo. 245-252
- An I/O Network Architecture of the Distributed Shared-Memory Massively Parallel Computer JUMP-1. Hironori Nakajo, Satoshi Ohtani, Takashi Matsumoto, Masadi Kohata, Kei Hiraki, Yukio Kaneda. 253-260
- Symbolic Evaluation for Parallelizing Compilers. Thomas Fahringer, Bernhard Scholz. 261-268
- A Compiler Algorithm for Optimizing Locality in Loop Nests. Mahmut T. Kandemir, J. Ramanujam, Alok N. Choudhary. 269-276
- Compile-Time Minimisation of Load Imbalance in Loop Nests. Rizos Sakellariou, John R. Gurd. 277-284
- Implementation and Analysis of Path History in Dynamic Branch Prediction Schemes. Shlomo Reches, Shlomo Weiss. 285-292
- A Victim Cache for Vector Registers. Roger Espasa, Mateo Valero. 293-300
- Performance Analysis of Tree VLIW Architecture for Exploiting Branch ILP in Non-Numerical Code. Soo-Mook Moon, Kemal Ebcioglu. 301-308
- Non-Singular Data Transformations: Definition, Validity and Applications. Michael F. P. O'Boyle, Peter M. W. Knijnenburg. 309-316
- Cache Miss Equations: An Analytical Representation of Cache Misses. Somnath Ghosh, Margaret Martonosi, Sharad Malik. 317-324
- Adaptive Migratory Scheme for Distributed Shared Memory. Jai-Hoon Kim, Nitin H. Vaidya. 325-332
- Developing Architecture Adaptive Algorithms Using Simulation with MISS-PVM for Performance Prediction. Dieter F. Kvasnicka, Christoph W. Ueberhuber. 333-339
- Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology. Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, James Demmel. 340-347