Journal: CCF Trans. High Perform. Comput.

Volume 6, Issue 6

549 -- 565 Jie Liu, Yizhuo Wang, Jianhua Gao, Weixing Ji. pSpMv: precision-based sparse matrix partition and SpMV optimization
566 -- 587 Xinjie Wang, Guanghao Ma, Jiaying Song, Mingyao Geng, Wenhui Hu, Xi Duan, Zhigang Wang, Jiali Xu, Xiaogang Jin, Fang Li, Dexun Chen, Maoxue Yu. Heterogeneous many-core optimization for Monte Carlo path-tracing on new generation Sunway HPC system
588 -- 607 Jianfeng Liu, Jianbin Fang, Ting Wang 0009, Jing Xie, Chun Huang, Zheng Wang 0001. Efficient compiler optimization by modeling passes dependence
608 -- 631 Zhangyu Liu, Jinqiu Wang, Huijun Wu, Qingzhen Ma, Lin Peng, Zhanyong Tang. Auto-tuning for HPC storage stack: an optimization perspective
632 -- 645 Zhenshan Bao, Mengyuan Wang, Wei Bai, Wenbo Zhang 0003. Multi-index federated aggregation algorithm based on trusted verification
646 -- 664 Zheng Liu, Meng Hao, Weizhe Zhang, Gangzhao Lu, Xueyang Tian, Siyu Yang, Mingdong Xie, Jie Dai, Chenyu Yuan, Desheng Wang 0002, Hongwei Yang. Optimizing depthwise separable convolution on DCU

Volume 6, Issue 5

459 -- 471 Wenxiang Yang, Jie Yu 0006. Trade-off topology design for hierarchical network based on job characteristics
472 -- 487 D. Sirisha, S. Sambhu Prasad. CPTF-a new heuristic based branch and bound algorithm for workflow scheduling in heterogeneous distributed computing systems
488 -- 502 Yu Hu, Ziteng Li, Jianfeng Li, Junbo Tie, Lei Wang 0011. A security JPEG image system accelerated by NEON technology based on FT-2000/4
503 -- 518 Mouzhi Yang, Peng Zhang 0061, Jianbin Fang, Weifeng Liu 0002, Chun Huang. thSORT: an efficient parallel sorting algorithm on multi-core DSPs
519 -- 532 Jie Jia, Xinyuan Lin, Fang Lin, Yi Liu 0013. DCU-CHK: checkpointing for large-scale CPU-DCU heterogeneous computing systems
533 -- 548 Zhewen Xu, Xiaohui Wei 0002, Jieyun Hao, Jiale Li, Hongliang Li 0003, Zhaohui Ding, Sicong Li. HiRM: Hierarchical resource management for earth system models on many-core clusters

Volume 6, Issue 4

365 -- 377 Zhengxiong Hou, Hong Shen 0001, Qiying Feng, Zhiqi Lv, Junwei Jin, Xingshe Zhou 0001, Jianhua Gu. Optimizing job scheduling by using broad learning to predict execution times on HPC clusters
378 -- 396 Moirangthem Goldie Meitei, Ningrinla Marchang. Altruistic user-oriented task allocation techniques for mobile crowdsensing
397 -- 407 Chenyang Jiao, Zhikai Qin, Li Shen 0007. ScalaQC: a scalability optimization framework for full-state quantum simulation on CPU+GPU heterogeneous clusters
408 -- 424 Huming Zhu, Chendi Liu, Qiuming Li, Lingyun Zhang, Libing Wang, Sifan Li, Licheng Jiao, Biao Hou. Deep convolutional encoder-decoder networks based on ensemble learning for semantic segmentation of high-resolution aerial imagery
425 -- 438 Riku Nunokawa, Yoichi Shimomura, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa. Conflict-aware workload co-execution on SX-aurora TSUBASA
439 -- 458 Maoxue Yu, Guanghao Ma, Zhuoya Wang, Shuai Tang, Yuhu Chen, Yucheng Wang, Yuanyuan Liu, Dongning Jia, Zhiqiang Wei. swCUDA: Auto parallel code translation framework from CUDA to ATHREAD for new generation sunway supercomputer

Volume 6, Issue 3

241 -- 242 Jianbin Fang, Jidong Zhai, Zheng Wang 0001. Editorial for the special issue on programming models and system software for High-Performance Computing (HPC) environments
243 -- 262 Junsheng Chang, Kai Lu, Yang Guo 0003, Yongwen Wang, Zhenyu Zhao, Libo Huang, Hongwei Zhou, Yao Wang 0002, Fei Lei, Biwei Zhang. A survey of compute nodes with 100 TFLOPS and beyond for supercomputers
263 -- 273 Jianfeng Liu, Wangrong Gao, Hanzheng Liang, Lin Peng, Ting Wang 0009. Towards a universal and portable assembly code size reduction: a case study of RISC-V ISA
274 -- 286 Haoran Lin, Lifeng Yan, Qixin Chang, Haitian Lu, Chenlin Li, Quanjie He, Zeyu Song, Xiaohui Duan, Zekun Yin, Yuxuan Li, Zhao Liu, Wei Xue, Haohuan Fu, Lin Gan, Guangwen Yang, Weiguo Liu. O2ath: an OpenMP offloading toolkit for the sunway heterogeneous manycore platform
287 -- 300 Yicheng Sui, Yufei Sun, Changqing Shi, Haotian Wang, Zhiqiang Zhang, Jiahao Wang, Yuzhi Zhang. Opencl-pytorch: an OpenCL-based extension of PyTorch
301 -- 318 Juncheng Hu, Xilong Che, Bowen Kan, Yuhan Shao. LS-HTC: an HTC system for large-scale jobs
319 -- 329 Changqing Shi, Yufei Sun, Yicheng Sui, Yuqiao Chen, Haotian Wang, Yuzhi Zhang. oclCUB: an OpenCL parallel computing library for deep learning operators
330 -- 342 Zongjing Chen, Kangjin Huang, Yonggang Che, Chuanfu Xu, Jian Zhang, Zhe Dai, Ming Li. Extending OP2 framework to support portable parallel programming of complex applications
343 -- 364 Shaojie Tan, Qingcai Jiang, Zhenwei Cao, Xiaoyu Hao, Junshi Chen, Hong An. Uncovering the performance bottleneck of modern HPC processor with static code analyzer: a case study on Kunpeng 920

Volume 6, Issue 2

113 -- 114 Shanjiang Tang, Yusen Li. Editorial for the special issue on heterogenous computing
115 -- 129 Yang Xiao, Zeke Wang. AIbench: a tool for benchmarking Huawei ascend AI processors
130 -- 149 Shanjiang Tang, Ziyi Wang, Ce Yu, Chao Sun, Yusen Li, Jian Xiao 0001. Fast and accurate novelty detection for large surveillance video
150 -- 163 Xinyang Shen, Yu Huang 0013, Long Zheng 0003, Xiaofei Liao, Hai Jin 0001. A heterogeneous 3-D stacked PIM accelerator for GCN-based recommender systems
164 -- 178 Gang Liu 0028, Zeting Wang, Amelie Chi Zhou, Rui Mao 0001. Adaptive key partitioning in distributed stream processing
179 -- 191 Shiyang Li, Jingyu Zhu, Jiaxun Han, Yuting Peng, Zhuoran Wang, Xiaoli Gong, Gang Wang 0001, Jin Zhang 0003, Xuqiang Wang. OneGraph: a cross-architecture framework for large-scale graph computing on GPUs based on oneAPI
192 -- 205 Yuanwei Sun, Haikun Liu, Xiaofei Liao, Hai Jin 0001, Yu Zhang 0027. FPGA-based acceleration architecture for Apache Spark operators
206 -- 220 Yani Liu, Feng Zhang 0007, Zaifeng Pan, Xiaoguang Guo, Yihua Hu, Xiao Zhang 0001, Xiaoyong Du 0001. Compressed data direct computing for Chinese dataset on DCU
221 -- 239 Yu Lu, Ce Yu, Jian Xiao 0001, Hao Wang, Hao Fu, Bo Kang, Gang Zheng. A large-scale heterogeneous computing framework for non-uniform sampling two-dimensional convolution applications

Volume 6, Issue 1

1 -- 2 Yunquan Zhang, Guangming Tan, Liang Yuan. Special issue of HPCChina 2023
3 -- 16 Yidong Chen, Jingshan Pan, Zidong Han, Yonghong Hu, Meng Guo, Zhonghua Lu. BSPADMM: block splitting proximal ADMM for sparse representation with strong scalability
17 -- 31 Yueyuan Zhou, ZiYi Ren, En Shao, Lixian Ma, Qiang Hu, Leping Wang, Guangming Tan. FILL: a heterogeneous resource scheduling system addressing the low throughput problem in GROMACS
32 -- 44 Lu Bai, Weixing Ji, Qinyuan Li, Xilai Yao, Wei Xin, Wanyi Zhu. ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks
45 -- 53 Hang Cao, Cheng Xu, Yunqi Han, Muhui Lin, Kai Shen, Geng Wang, Jinhu Li, Xiangzheng Sun, Ronghui He, Liang You, Hang Yang, Xiantao Zhang. An efficient cloud-based elastic RDMA protocol for HPC applications
54 -- 67 Haoyuan Zhang, Wenpeng Ma, Wu Yuan 0002, Jian Zhang, Zhonghua Lu. Mixed-precision block incomplete sparse approximate preconditioner on Tensor core
68 -- 77 Dazheng Liu, Wenjuan Liu, Liangrui Pan, Yutao Dou, Jianping Wu. Optimization of the parallel semi-Lagrangian scheme to overlap computation with communication based on grouping levels in YHGSM
78 -- 93 Yang Wang, Qinglin Wang, Xiangdong Pei, Songzhu Mei, Rongchun Li, Jie Liu 0002. High performance dilated convolutions on multi-core DSPs
94 -- 111 Zhengxian Lu, Chengkun Du, Yanfeng Jiang, Xueshuo Xie, Tao Li, Fei Yang. Quantitative evaluation of deep learning frameworks in heterogeneous computing environment