PROTEUS will contribute to the maturity of this field by designing and developing a library of Scalable Online
Machine Learning and Data Mining Algorithms (named SOLMA) adapted to the data analytics platform Apache Flink. The SOLMA library will consist of efficient distributed online algorithms for basic
utilities and sketches, as well as advanced online predictive analytics for tasks such as classification, clustering, regression, ensemble methods, and novelty and change detection.
In particular, PROTEUS will address not only scalability but also complexity. For scalability, various
computational concepts will be applied to develop highly scalable streaming algorithms. For complexity, we will consider distributed multivariate streams in which the data is potentially complex (text,
pictures, video), reflecting the third element of big data: variety. All algorithms developed will be analysed theoretically to determine their bounds. SOLMA will be enriched with a set of techniques that will enhance the applicability of the algorithms in various real-world situations by considering drift
handling, novelty detection, active learning and semi-supervised learning.
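
As a purely illustrative example of what a distributed online "basic utility" of this kind might look like, the following minimal sketch maintains a running mean and variance in a single pass and merges partial results computed on separate stream partitions. It is written in plain Python rather than against Apache Flink, and the class and method names (`RunningMoments`, `update`, `merge`) are hypothetical; they are not taken from SOLMA.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """One-pass mean and variance (Welford's method); hypothetical name."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold a single observation into the summary in O(1) time and memory."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningMoments") -> "RunningMoments":
        """Combine two partial summaries, e.g. from two parallel workers."""
        total = self.count + other.count
        if total == 0:
            return RunningMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return RunningMoments(total, mean, m2)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far (0.0 if empty)."""
        return self.m2 / self.count if self.count else 0.0


# Two partitions of the same stream, summarised independently and then merged.
left, right = RunningMoments(), RunningMoments()
for x in [1.0, 2.0, 3.0]:
    left.update(x)
for x in [4.0, 5.0, 6.0]:
    right.update(x)
combined = left.merge(right)
```

Because `merge` combines the summaries without revisiting the raw data, each worker only needs constant memory per stream, which is the property that makes such utilities suitable for a distributed streaming setting.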