- Adding a vector unit to a superscalar processor. Francisca Quintana, Jesús Corbal, Roger Espasa, Mateo Valero. 1-10
- Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology. Huy Nguyen, Lizy Kurian John. 11-20
- Improving the performance of speculatively parallel applications on the Hydra CMP. Kunle Olukotun, Lance Hammond, Mark Willey. 21-30
- The pool of subsectors cache design. Jeffrey B. Rothman, Alan Jay Smith. 31-42
- Symmetry and performance in consistency protocols. Peter J. Keleher. 43-50
- A locality sensitive multi-module cache with explicit management. F. Jesús Sánchez, Antonio González. 51-59
- A new quad-tree-based sub-system allocation technique for mesh-connected parallel machines. Jeeraporn Srisawat, Nikitas A. Alexandridis. 60-67
- On the complexity of list scheduling algorithms for distributed-memory systems. Andrei Radulescu, Arjan J. C. van Gemund. 68-75
- Communication conscious radix sort. Daniel Jiménez-González, Josep-Lluis Larriba-Pey, Juan J. Navarro. 76-82
- Eliminating synchronization bottlenecks in object-based programs using adaptive replication. Martin C. Rinard, Pedro C. Diniz. 83-92
- Responsiveness without interrupts. Dejan Perkovic, Peter J. Keleher. 101-108
- Reducing branch misprediction penalties via dynamic control independence detection. Yuan C. Chou, Jason Fung, John Paul Shen. 109-118
- Software trace cache. Alex Ramírez, Josep-Lluis Larriba-Pey, Carlos Navarro, Josep Torrellas, Mateo Valero. 119-126
- Cyclic dependence based data reference prediction. Chi-Hung Chi, Jun-Li Yuan, Chin-Ming Cheung. 127-134
- CACHET: an adaptive cache coherence protocol for distributed shared-memory systems. Xiaowei Shen, Arvind, Larry Rudolph. 135-144
- Adapting cache line size to application behavior. Alexander V. Veidenbaum, Weiyu Tang, Rajesh K. Gupta, Alexandru Nicolau, Xiaomei Ji. 145-154
- Reducing cache misses using hardware and software page placement. Timothy Sherwood, Brad Calder, Joel S. Emer. 155-164
- Application scaling under shared virtual memory on a cluster of SMPs. Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev Kumar, Angelos Bilas, Jaswinder Pal Singh. 165-174
- Shared virtual memory with automatic update support. Liviu Iftode, Matthias A. Blumrich, Cezary Dubnicki, David L. Oppenheimer, Jaswinder Pal Singh, Kai Li. 175-183
- Realizing the performance potential of the virtual interface architecture. Evan Speight, Hazim Abdel-Shafi, John K. Bennett. 184-192
- Low-level router design and its impact on supercomputer system performance. Valentin Puente, José A. Gregorio, Cruz Izu, Ramón Beivide, Fernando Vallejo. 193-201
- Improving the performance of bristled CC-NUMA systems using virtual channels and adaptivity. José F. Martínez, Josep Torrellas, José Duato. 202-209
- A new method to make communication latency uniform: distributed routing balancing. D. Franco, I. Garcés, Emilio Luque. 210-219
- An affine partitioning algorithm to maximize parallelism and minimize communication. Amy W. Lim, Gerald I. Cheong, Monica S. Lam. 228-237
- A graphic parallelizing environment for user-compiler interaction. Claudia Roberta Calidonna, Maurizio Giordano, Mario Mango Furnari. 238-245
- Dynamic remote memory acquisition for parallel data mining on ATM-connected PC cluster. Masato Oguchi, Masaru Kitsuregawa. 246-252
- Parallel I/O for scientific applications on heterogeneous clusters: a resource-utilization approach. Yong E. Cho, Marianne Winslett, Szu-Wen Kuo, Jonghyun Lee, Ying Chen. 253-259
- The design and evaluation of high performance communication using a Gigabit Ethernet. Shinji Sumimoto, Hiroshi Tezuka, Atsushi Hori, Hiroshi Harada, Toshiyuki Takahashi, Yutaka Ishikawa. 260-267
- The scalability of multigrain systems. Donald Yeung. 268-277
- A comparative analysis of four parallelisation schemes. Nandini Mukherjee, John R. Gurd. 278-285
- A design analysis of a hybrid technology multithreaded architecture for petaflops scale computation. Thomas L. Sterling, Larry A. Bergman. 286-293
- Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. Xavier Martorell, Eduard Ayguadé, Nacho Navarro, Julita Corbalán, Marc González, Jesús Labarta. 294-301
- SMARTS: exploiting temporal locality and parallelism through vertical execution. Suvas Vajracharya, Steve Karmesin, Peter H. Beckman, James Crotinger, Allen D. Malony, Sameer Shende, R. R. Oldehoeft, Stephen Smith. 302-310
- Problem space promotion and its evaluation as a technique for efficient parallel computation. Bradford L. Chamberlain, E. Christopher Lewis, Lawrence Snyder. 311-318
- A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000. Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou. 319-328
- A comparison of MPI, SHMEM and cache-coherent shared address space programming models on the SGI Origin2000. Hongzhang Shan, Jaswinder Pal Singh. 329-338
- Comparing the memory system performance of the HP V-class and SGI Origin 2000 multiprocessors using microbenchmarks and scientific applications. Ravi R. Iyer, Nancy M. Amato, Lawrence Rauchwerger, Laxmi N. Bhuyan. 339-347
- Increasing effective IPC by exploiting distant parallelism. Ivan Martel, Daniel Ortega, Eduard Ayguadé, Mateo Valero. 348-355
- Clustered speculative multithreaded processors. Pedro Marcuello, Antonio González. 365-372
- Fast cluster failover using virtual memory-mapped communication. Yuanyuan Zhou, Peter M. Chen, Kai Li. 373-382
- Performance impact of proxies in data intensive client-server applications. Michael D. Beynon, Alan Sussman, Joel H. Saltz. 383-390
- A comparison of two approaches for independent scaling up of processing and communication capacities in multicomputer networks. A. Ferre-Vilaplana, José M. Bernabéu-Aubán. 391-398
- Reorganizing global schedules for register allocation. Gang Chen, Michael D. Smith. 408-416
- Improving memory hierarchy performance for irregular applications. John M. Mellor-Crummey, David B. Whalley, Ken Kennedy. 425-433
- High-level semantic optimization of numerical codes. Vijay Menon, Keshav Pingali. 434-443
- Nonlinear array layouts for hierarchical memory systems. Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Lebeck, Shyam Mundhra, Mithuna Thottethodi. 444-453
- Microservers: a new memory semantics for massively parallel computing. Jay B. Brockman, Peter M. Kogge, Thomas L. Sterling, Vincent W. Freeh, Shannon K. Kuntz. 454-463
- Efficient management of memory hierarchies in embedded DRAM systems. Ashley Saulsbury, Su-Jaen Huang, Fredrik Dahlgren. 464-473
- Dynamic removal of redundant computations. Carlos Molina, Antonio González, Jordi Tubella. 474-481
- A tile selection algorithm for data locality and cache interference. Jacqueline Chame, Sungdo Moon. 492-499
- An integer linear programming approach for optimizing cache locality. Mahmut T. Kandemir, Prithviraj Banerjee, Alok N. Choudhary, J. Ramanujam, Eduard Ayguadé. 500-509