Optimizing Infrastructure Performance and Cost Efficiency for Big Data Processing on High Performance Compute Clusters

Nikolas Williams; Diana Chen

Vol. 15 No. 7 (2023): Emerging Trends in Machine Intelligence and Big Data - 157



Nikolas Williams
Eötvös Loránd University

Diana Chen
Department of Computer Science, University of Ruse

Published 2023-07-08

Keywords

HPC, Accelerators, Networking, Storage, Optimization, Benchmarking


How to Cite

Williams, N., & Chen, D. (2023). Optimizing Infrastructure Performance and Cost Efficiency for Big Data Processing on High Performance Compute Clusters. Emerging Trends in Machine Intelligence and Big Data, 15(7), 18–27. Retrieved from https://orientreview.com/index.php/etmibd-journal/article/view/28

Abstract

With the exponential growth of big data, organizations are adopting high-performance computing (HPC) clusters to process large volumes of data efficiently. However, optimizing these clusters for both cost and performance remains a key challenge. This paper analyzes the critical infrastructure considerations for big data HPC deployments, focusing on processing speed, storage capacity, network fabric, accelerators such as GPUs, and workload management. It evaluates cluster architectures from leading cloud providers and recommends best practices that help organizations maximize ROI. Key results show that choosing appropriate instance families, storage tiers, and interconnect fabrics such as InfiniBand, together with optimization features such as auto-scaling and spot instances, can significantly improve price-performance. For example, GPU- and FPGA-accelerated clusters lowered big data processing costs per terabyte by 42-55% relative to traditional CPU-based infrastructure while maintaining utilization rates above 80%. The analysis also shows that combining low-latency RDMA networks such as InfiniBand with scale-out architectures can double throughput for memory-intensive workloads. Together, these infrastructure optimizations enable organizations to accelerate big data time-to-insight while lowering total cost of ownership.
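
The cost-per-terabyte comparison cited in the abstract can be illustrated with a minimal sketch. The abstract does not state how the metric was computed, so the formula below (total cluster cost divided by data volume processed) is an assumption, and the function names, hourly rates, node counts, and runtimes are all hypothetical values chosen for illustration, not figures from the paper.

```python
def cost_per_terabyte(hourly_rate_usd, node_count, runtime_hours, terabytes_processed):
    """Assumed metric: total cluster cost (USD) divided by terabytes processed."""
    if terabytes_processed <= 0:
        raise ValueError("terabytes_processed must be positive")
    total_cost = hourly_rate_usd * node_count * runtime_hours
    return total_cost / terabytes_processed


def percent_reduction(baseline, candidate):
    """Relative saving of candidate over baseline, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


# Hypothetical inputs (not from the paper): both clusters process the same 50 TB job.
cpu_cost = cost_per_terabyte(hourly_rate_usd=2.0, node_count=10,
                             runtime_hours=5.0, terabytes_processed=50.0)
gpu_cost = cost_per_terabyte(hourly_rate_usd=6.0, node_count=4,
                             runtime_hours=2.0, terabytes_processed=50.0)
reduction = percent_reduction(cpu_cost, gpu_cost)

print(f"CPU: ${cpu_cost:.2f}/TB, GPU: ${gpu_cost:.2f}/TB, reduction: {reduction:.1f}%")
```

In this hypothetical case the accelerated nodes cost more per hour, but the shorter runtime and smaller node count still lower the cost per terabyte, which is the kind of trade-off the abstract describes.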
