Project description


PROTEUS will contribute to the maturity of this field by designing and developing a library of Scalable Online Machine Learning and Data Mining Algorithms (named SOLMA) adapted to the Apache Flink data analytics platform. The SOLMA library will consist of efficient distributed online algorithms for basic utilities and sketches, as well as advanced online predictive analytics for tasks such as classification, clustering, regression, ensemble methods, and novelty and change detection.

In particular, PROTEUS will address not only scalability but also complexity. For scalability, various computational concepts will be applied to develop highly scalable streaming algorithms. For complexity, we will consider distributed multivariate streams in which the data is potentially complex (text, images, video), reflecting the third element of big data, variety. All algorithms developed will be theoretically analysed to determine their bounds. SOLMA will also be enriched with a set of techniques that enhance the applicability of the algorithms in real-world situations: drift handling, novelty detection, active learning and semi-supervised learning.