Event Prediction
The Objective
The goal is to decide automatically whether an event is occurring, based on evidence in your data. That evidence may be spread across multiple distributed sources of complementary data, each offering a partial view of the event. The event could be a signal, an anomaly, an impending failure, an emerging threat, or fraudulent activity: in general, any class attribute you want to detect. If you can use all of the information you have together intelligently, you should be able to detect the event more reliably and at a lower false positive rate. Note that many continuous monitoring applications require such decisions to be made in real time.
The Challenge
The conventional approach integrates the data from the disparate silos to perform a centralized machine learning analysis. But the data may be distributed geographically, heterogeneous in phenomenology and format (e.g., image, text, chemical), and evolving on different timescales, creating complexity that requires substantial investment and time to address. Some sources may be available or useful only intermittently, compounding issues with missing or low-quality data.
With the conventional method, model learning can struggle to keep pace with data that is evolving and statistically non-stationary. If a centralized model is learned on integrated data, it cannot be updated incrementally when new data arrives in only some silos; an entirely new global model must be learned from scratch to replace the existing one, which now rests on stale assumptions. Likewise, if entirely new sources become available to the system, their data must be combined into the existing training set and a new model learned all over again. For large numbers of sources (sensors), the dimensionality of the integrated feature data may itself present scaling challenges for the computational aspects of learning.
Online prediction can struggle to provide real-time response because new test inputs arriving at the silos must be communicated to a central location to feed the model. Depending on how the model was learned and constructed, if some data is late when a decision needs to be made, the missing values must be imputed, which can lead to a loss in performance.
The Collaborative Analytics Solution for Event Prediction
Collaborative Analytics (CA) automatically detects patterns in distributed, high-variety sources using natively distributed algorithms that do not require integrating the data. The resulting reductions in cost and complexity make it an effective way to leverage more sources of data for superior detection at lower false alarm rates, and the approach scales well to very large numbers of sources (thousands or more). The method also preserves privacy across silos because the raw data is never shared or combined. External sources whose data could never be accessed and integrated can still be connected into the global networked decision process.
Model learning is highly adaptive because the global data model is learned over a collection of local models that are associated with the silos (via a hierarchical learning structure). This means that as local models are continuously refined and improved, whether to extract more discrimination performance or to adapt to non-stationarities, they can be integrated incrementally to refresh the global model and improve the performance of the overall system while it operates, with no offline downtime. Decoupling the local learning problems means that the best technique may be applied to each silo. For example, deep neural nets might be used for image data, random forests for text, and SVMs for sensor data, and CA will compose those methods into an optimized global model that operates as a unified system.
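As a rough illustration of learning a global model over local models, the sketch below uses a simple stacking scheme: each silo turns its own raw feature into an event score, and a small logistic combiner is fit on those scores alone. This is an assumption made for illustration, not a description of the CA algorithm itself; the names local_score, fit_global, and the synthetic two-silo data are all hypothetical. Refining one silo's local model only requires refitting the lightweight combiner, which stands in for the incremental refresh described above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_score(weight, x):
    """Hypothetical local model: maps one silo's raw feature to an event score."""
    return sigmoid(weight * x)

def scores_for(raw_rows, local_weights):
    """Each silo scores its own feature; only the scores leave the silo."""
    return [[local_score(w, x) for w, x in zip(local_weights, row)]
            for row in raw_rows]

def fit_global(score_rows, labels, epochs=300, lr=0.5):
    """Fit a logistic combiner over local scores by batch gradient descent."""
    n = len(score_rows[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for scores, y in zip(score_rows, labels):
            err = sigmoid(sum(wi * si for wi, si in zip(w, scores)) + b) - y
            grad_w = [g + err * si for g, si in zip(grad_w, scores)]
            grad_b += err
        w = [wi - lr * g / len(labels) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(labels)
    return w, b

def accuracy(model, score_rows, labels):
    w, b = model
    hits = 0
    for scores, y in zip(score_rows, labels):
        z = sum(wi * si for wi, si in zip(w, scores)) + b
        hits += int((z > 0) == (y == 1))
    return hits / len(labels)

def make_data(n, rng):
    """Synthetic data: two silos, each seeing one noisy view of the event."""
    labels = [rng.randint(0, 1) for _ in range(n)]
    raw = [[rng.gauss(1.0 if y else -1.0, 1.0) for _ in range(2)] for y in labels]
    return raw, labels

rng = random.Random(0)
train_raw, train_y = make_data(500, rng)
test_raw, test_y = make_data(1000, rng)

stale = [2.0, 0.0]  # silo B's local model ignores its input (uninformative)
fresh = [2.0, 2.0]  # silo B's local model after a local refinement

# Global model built over the stale local models.
global_before = fit_global(scores_for(train_raw, stale), train_y)
acc_before = accuracy(global_before, scores_for(test_raw, stale), test_y)

# Refresh: silo A is untouched; only the small combiner is refit.
global_after = fit_global(scores_for(train_raw, fresh), train_y)
acc_after = accuracy(global_after, scores_for(test_raw, fresh), test_y)

print("accuracy before refresh:", acc_before)
print("accuracy after refresh: ", acc_after)
```

In this toy setup the combiner never sees raw silo data, and any local model that emits a score (a neural net, a random forest, an SVM) could be swapped in behind local_score without changing the global layer.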
Online prediction is accelerated because there is no need to bring the raw data together. The CA decision system optimizes on the fly, in real time, to provide the best performance possible with the information at hand when the system is queried, offering a graceful, high-performance solution to the missing data challenge.
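One simple way to picture deciding with the information at hand is to fuse whatever local evidence has arrived and skip silos that have not reported, rather than imputing their values. The sketch below assumes each silo reports a log-likelihood ratio (LLR) for the event and that silos are conditionally independent given the event; both are illustrative assumptions, and the function name fuse_available and the silo names are hypothetical, not part of CA.

```python
import math

def fuse_available(prior_event, silo_llrs):
    """Posterior event probability from the silos that have reported.

    silo_llrs maps a silo name to its log-likelihood ratio, or to None when
    that silo's evidence is late or missing. Missing silos are skipped, so
    nothing is imputed (assumes conditionally independent silos).
    """
    logit = math.log(prior_event / (1.0 - prior_event))
    used = []
    for name, llr in silo_llrs.items():
        if llr is None:
            continue  # late or missing: decide without it
        logit += llr
        used.append(name)
    posterior = 1.0 / (1.0 + math.exp(-logit))
    return posterior, used

# Hypothetical query: the image silo has not reported yet.
posterior, used = fuse_available(
    0.1, {"sensor": 1.5, "text": 0.7, "image": None}
)
print(used, round(posterior, 3))
```

When no silo has reported, the result falls back to the prior; each additional report only shifts the running log-odds, so a late silo can be folded in whenever it arrives.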
If some silos are costly to include, they may be deferred and invoked selectively, only when residual ambiguity in the event prediction is too high and more information is required.
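The deferral idea can be sketched as a sequential decision: query silos from cheapest to most expensive and stop as soon as the posterior leaves an ambiguity band. The band limits (low, high), the cost values, and the function name sequential_decision are hypothetical choices for illustration, and evidence is again assumed to arrive as independent log-likelihood ratios.

```python
import math

def sequential_decision(prior_event, silos, low=0.1, high=0.9):
    """Query silos in order of increasing cost until the posterior is decisive.

    silos is a list of (name, cost, query_fn) tuples, where query_fn()
    returns that silo's log-likelihood ratio for the event. Querying stops
    once the posterior falls outside the open ambiguity band (low, high).
    """
    logit = math.log(prior_event / (1.0 - prior_event))
    queried, spent = [], 0.0
    for name, cost, query_fn in sorted(silos, key=lambda s: s[1]):
        posterior = 1.0 / (1.0 + math.exp(-logit))
        if not (low < posterior < high):
            break  # already decisive: defer the remaining, costlier silos
        logit += query_fn()
        queried.append(name)
        spent += cost
    posterior = 1.0 / (1.0 + math.exp(-logit))
    return posterior, queried, spent

# Strong cheap evidence: the costly silo is never invoked.
clear = sequential_decision(
    0.5, [("costly", 50.0, lambda: 1.0), ("cheap", 1.0, lambda: 4.0)]
)

# Weak cheap evidence: ambiguity remains, so the costly silo is invoked.
ambiguous = sequential_decision(
    0.5, [("costly", 50.0, lambda: 3.0), ("cheap", 1.0, lambda: 0.5)]
)
print(clear[1], ambiguous[1])
```

Widening the band trades higher query cost for fewer ambiguous decisions; narrowing it does the reverse.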
The Collaborative Analytics approach provides a cost-effective and scalable way to leverage the most data possible for superior event prediction.