The goal of this lecture is to present and compare some of the most popular Big Data programming frameworks for HPC systems, such as Hadoop, Spark, Storm, Airflow, MPI, Hive, and Pig, by highlighting their models, features, advantages, and disadvantages in developing different classes of programs, such as batch, streaming, graph-based, and query-based applications. The lecture will also discuss the main factors that can influence the choice of the most appropriate framework to process and analyze big data. These include the characteristics of the input data, the class of application, the infrastructure scale, and many other factors that can determine, to some extent, developers' decisions, including designer skills, tool community size, data privacy issues, security requirements, available budget, integrability, and interoperability.
Speakers
Domenico Talia – University of Calabria & CINI lab HPC-KTT
Domenico Talia is a professor of computer engineering at the University of Calabria and an honorary professor at Amity University. He is a Senior Associate Editor of ACM Computing Surveys, an Associate Editor of Computer, and a member of the editorial board of Future Generation Computer Systems, IEEE Transactions on Parallel and Distributed Systems, the International Journal of Web and Grid Services, the Journal of Cloud Computing, and the International Journal of Next-Generation Computing. His research interests include HPC, Big Data, machine learning, parallel and distributed data analysis, cloud computing, social media analysis, distributed knowledge discovery, peer-to-peer systems, and concurrent programming models. He has authored several books and more than 400 scientific papers.
Paolo Trunfio – University of Calabria & CINI lab HPC-KTT
Paolo Trunfio is a professor of computer engineering at the Department of Computer Engineering, Modeling, Electronics, and Systems (DIMES) at the University of Calabria. He serves as an Associate Editor of “ACM Computing Surveys” and “Journal of Big Data”, as well as a member of the Editorial Board of several scientific journals, including “Future Generation Computer Systems”, the “International Journal of Web Information Systems”, the “International Journal of Parallel, Emergent and Distributed Systems”, and “Big Data and Cognitive Computing”. His research activities focus primarily on data analysis systems, machine learning systems, parallel and distributed computing, grid, cloud, and mobile systems, and programming models for high-performance computing (HPC) systems.
Event Timeslots (1)
Wed 18 – Programming Models & Tools
D. Talia, P. Trunfio (University of Calabria & CINI lab HPC-KTT)