ΣΦΗΜΜΥ 14





Christos Antonopoulos

Christos Antonopoulos is an Associate Professor at the Electrical & Computer Engineering (ECE) Department of the University of Thessaly in Volos, Greece, and currently serves as the director of the Computer Systems Lab (https://csl.e-ce.uth.gr). His research interests span system and application software, with an emphasis on run-time monitoring and adaptivity under performance and power optimization criteria, both at the granularity of a single node and at scale. He also works on improving the programmability of accelerator-based heterogeneous systems. His research has received three best-paper awards, at the IWOMP '05, PPoPP '07 and Computing Frontiers '21 conferences. Previously, he served as a postdoctoral research associate at the Computer Science Department of the College of William and Mary, VA, USA, and as a Visiting Assistant Professor at ECE-UTH. Prof. Antonopoulos teaches undergraduate and graduate courses on Programming, Operating Systems and Programming of High-Performance Systems. He has been actively involved in several research projects in both the EU and the USA. He earned his PhD, MSc and Diploma from the Computer Engineering and Informatics Department of the University of Patras, Greece.

ML-managed compute systems. From the node- to the datacenter-level and beyond

The high complexity, heterogeneity and, often, extreme scale of modern compute systems aggravate the already challenging task of system management, making the traditional human-in-the-loop approach unrealistic and rule-based approaches inefficient. AI-driven management is a promising alternative. For example, power and/or energy efficiency are first-class optimization targets when designing modern software and hardware. However, modern processors are inherently heterogeneous due to manufacturing variability: even within the same chip, different cores have different minimum error-free operating points in terms of voltage and frequency. Hardware manufacturers typically follow a conservative approach, specifying the same operating point for all components of the same family and including significant guardbands to guarantee error-free operation. We will discuss machine learning-based methodologies for “just-right” processor supply voltage configuration at runtime, according to the characteristics of software and hardware. We find that these methodologies can significantly improve power and energy efficiency, even for complex software stacks, without performance loss or unacceptable adverse effects on output quality. In datacenters, VM consolidation is yet another knob for energy efficiency. We extend our ML-based methodology from the node level to the datacenter scale and evaluate whether this is a viable approach, considering both the gains from reduced energy consumption and the cost of potential SLA (service level agreement) violations. Finally, we discuss our vision for ML-based system management in the edge-cloud continuum. The latter represents an extreme case of a “compute system” in terms of scale, heterogeneity and volatility of resources, necessitating a hierarchical and continuously trained ML architecture.
