DATA MANAGEMENT DAY @ UNIVERSITY OF CYPRUS

10.00 - 10.05 Opening

Opening

10.05 - 10.40 Talk

Commemoration

Andreas Pitsilides (University of Cyprus)

10.40 - 11.15 Break

Coffee Break

University of Cyprus

11.15 - 12.15 Talk

Adaptive Stream Aggregation Processing: To Divide, To Drop, or To Distribute?

Panos K. Chrysanthis (University of Pittsburgh)

Online analytics and real-time data processing in most advanced IoT, scientific, business, and defense applications rely heavily on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). ACQs continuously aggregate streaming data and periodically produce results, such as the maximum or average, over a given window of the latest data. Low operational cost and timely processing are of paramount importance in these applications. To meet these requirements under fixed resources, our Advanced Database Management Technologies Lab has developed adaptive algorithms for stream partitioning (divide), load shedding (drop), and query migration (distribute) for window-based aggregations. In this talk, we will visit the key question "To Divide, To Drop, or To Distribute?" and introduce our recent contributions: (1) Aggregation-driven Partitioning, which adapts the partitioning policy based on aggregation cost; (2) Concept-driven Load Shedding, which adapts grouped aggregations to changing workloads; and (3) Uninterruptible Migration of Continuous Queries without Operator State Migration.
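To make the notion of an ACQ concrete, the following is a minimal sketch of a count-based sliding-window aggregation. It is an illustration only, not the lab's algorithms: the function name `windowed_aggregate`, the parameters `window_size` and `slide`, and the sample readings are all assumptions introduced here.

```python
from collections import deque


def windowed_aggregate(stream, window_size, slide, agg):
    """Emit agg(window) over the latest `window_size` values, every `slide` values.

    Hypothetical helper for illustration; windows are counted in tuples, not time.
    """
    window = deque(maxlen=window_size)  # old values fall out automatically
    results = []
    for i, value in enumerate(stream, start=1):
        window.append(value)
        # Report once the first window is full, then once per slide.
        if i >= window_size and (i - window_size) % slide == 0:
            results.append(agg(window))
    return results


# Made-up sensor readings, used only to exercise the sketch.
readings = [3, 7, 1, 9, 4, 6, 2, 8]
maxima = windowed_aggregate(readings, window_size=4, slide=2, agg=max)
averages = windowed_aggregate(
    readings, window_size=4, slide=2, agg=lambda w: sum(w) / len(w)
)
print(maxima)    # [9, 9, 8]
print(averages)  # [5.0, 5.0, 5.0]
```

Each reported result recomputes the aggregate over the whole window, which is the cost that partitioning, shedding, or migrating work would aim to keep under control when many such queries run at once.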

12.15 - 13.30 Break

Break

University of Cyprus

13.30 - 14.15 Talk

Reactors: A Case for Predictable, Virtualized Actor Database Systems

Marcos Antonio Vaz Salles (University of Copenhagen)

The requirements for OLTP database systems are becoming ever more demanding. Domains such as finance and computer games increasingly mandate that developers be able to encode complex application logic and control transaction latencies in in-memory databases. At the same time, infrastructure engineers in these domains need to experiment with and deploy OLTP database architectures that ensure application scalability and maximize resource utilization on modern machines. In this work, we propose a relational actor programming model for in-memory databases as a novel, holistic approach to fulfilling these challenging requirements. Conceptually, relational actors, or reactors for short, are application-defined, isolated logical actors that encapsulate relations and process function calls asynchronously. Reactors ease reasoning about correctness by guaranteeing serializability of application-level function calls. In contrast to classic transactional models, however, reactors allow developers to take advantage of intra-transaction parallelism and state encapsulation in their applications to reduce latency and improve locality. Moreover, reactors enable a new degree of flexibility in database deployment. We present ReactDB, a system design exposing reactors that allows for flexible virtualization of database architecture between the extremes of shared-nothing and shared-everything without changes to application code. Our experiments illustrate latency control, low overhead, and asynchronicity trade-offs with ReactDB in OLTP benchmarks. This is joint work with Vivek Shah at the University of Copenhagen and was presented at ACM SIGMOD 2018.

14.15 - 15.00 Talk

Modeling and Building IoT Data Platforms with Actor-Oriented Databases

Yongluan Zhou (University of Copenhagen)

Vast amounts of data are being generated daily with the adoption of Internet-of-Things (IoT) solutions in an ever-increasing number of application domains. There are problems associated with all stages of the lifecycle of these data (e.g., capture, curation, and preservation). Moreover, the volume, variety, dynamicity, and ubiquity of IoT data present additional challenges to their usability, prompting the need for scalable, data-intensive IoT data management and processing platforms. In this talk, I will present our recent efforts in modeling and building IoT data platforms based on an Actor-Oriented Database (AODB). We take advantage of two complementary case studies, in structural health monitoring and in beef cattle tracking and tracing, to describe novel software requirements introduced by IoT data processing. Our investigation illustrates the challenges and benefits of using an AODB to meet these requirements, in terms of both modeling and the implementation of IoT-based systems. The results reveal the advantages of using an AODB in IoT scenarios and lead to principles on how to effectively use an actor model to design and implement IoT data platforms.

15.00 - 15.30 Break

Coffee Break

University of Cyprus

15.30 - 16.00 Talk

A Comparison of Self-configuring Approaches for Big Data Analytics Systems

Herodotos Herodotou (Cyprus University of Technology)

Big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. A significant amount of research has addressed this problem by providing self-configuring features in big data analytics systems, following a cost-modeling, simulation-based, experiment-driven, machine-learning, or adaptive approach. In this talk, I will describe the foundations of the different self-configuring approaches and compare them, and I will discuss open research challenges towards the development of self-configuring big data platforms that are easier to use, simpler to maintain, and more robust in their operating characteristics.

16.00 - 16.30 Talk

Query-Driven Descriptive Analytics for IoT and Edge Computing

George Pallis (University of Cyprus)

With consumers embracing ubiquitously connected smart devices, edge computing is emerging as a principal computing paradigm for latency-sensitive and in-proximity services. However, as the volume of data generated across connected devices continues to increase rapidly, the need to query the “edge” and derive timely analytic insights is more evident than ever. This talk presents our vision for a rich and declarative query model abstraction tailored to the unique characteristics of edge computing, together with a prototype framework that realizes this vision. To this end, we present StreamSight, a framework for edge-enabled IoT services that provides a rich and declarative query model abstraction for expressing complex analytics on monitoring data streams, and then dynamically compiles these queries into stream processing jobs for continuous execution on distributed processing engines. To overcome the resource constraints of edge computing deployments, StreamSight outputs the query execution plan so that intermediate results are reused and not continuously recomputed. In turn, StreamSight enables users to express various optimization strategies (e.g., approximate answers, query prioritization) and constraints (e.g., sample size, error bounds) so that delay-sensitive requirements relevant to the deployment are not violated. We evaluate our framework on Apache Spark with real-world workloads and show that leveraging StreamSight can increase performance by at least 4× while still satisfying all accuracy guarantees. We conclude by presenting a number of potential use cases that stand to benefit from the realization of query-driven descriptive analytics for edge computing.

16.30 - 17.00 Talk

Decaying Telco Big Data with Data Postdiction

Demetris Zeinalipour (University of Cyprus)

A telecommunication company (Telco) is traditionally perceived only as the entity that provides telecommunication services, such as telephony and data communication access, to users. However, the radio and backbone infrastructure of such entities, which densely spans most urban spaces and widely spans most rural areas, nowadays provides a unique opportunity to collect immense amounts of data that capture a variety of phenomena on an ongoing basis, e.g., traffic, commerce, mobility patterns, and emergency response. In this talk, I will present a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction). Unlike data prediction, which aims to make a statement about the future value of some tuple, data postdiction, as we formulate it, aims to make a statement about the past value of some tuple that no longer exists because it had to be deleted to free up disk space. TBD-DP relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary. Our proposed TBD-DP operator has two conceptual phases: (i) in an offline phase, it utilizes an LSTM-based hierarchical ML algorithm to learn a tree of models (coined the TBD-DP tree) over time and space; (ii) in an online phase, it uses the TBD-DP tree to recover data within a certain accuracy. In our experimental setup, we measure the efficiency of the proposed operator using a ∼10 GB anonymized real Telco network trace. Our experimental results with TensorFlow over HDFS are extremely encouraging: they show that TBD-DP saves an order of magnitude of storage space while maintaining high accuracy on the recovered data.
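The idea of postdiction can be sketched in a few lines, under strong simplifying assumptions: here a least-squares line stands in for the LSTM-based model tree described above, and the names `fit_line` and `postdict`, as well as the sample measurements, are hypothetical and introduced only for illustration. The raw tuples are abstracted into a two-number model, deleted, and later approximated from the model.

```python
def fit_line(times, values):
    """Offline phase (stand-in): fit value = slope * time + intercept.

    An ordinary least-squares line, assumed here in place of the LSTM-based
    model described in the abstract.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def postdict(model, t):
    """Online phase (stand-in): estimate the past value at time t from the model."""
    slope, intercept = model
    return slope * t + intercept


# Made-up hourly measurements; the raw values are discarded after fitting.
times = [0, 1, 2, 3, 4]
values = [10.0, 12.1, 13.9, 16.0, 18.0]
model = fit_line(times, values)  # two floats kept instead of five tuples

# Largest gap between a deleted value and its recovered estimate.
max_error = max(abs(postdict(model, t) - v) for t, v in zip(times, values))
print(model, max_error)
```

The trade-off this illustrates is the one the talk targets: the model is much smaller than the data it replaces, at the price of recovering past values only approximately.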