The notion of AI systems in which learning is distributed so that it resides with the local data sources, e.g., on hardware devices or at aggregation points near the edge of the network, is increasingly discussed, even in the popular press. For example, see:
I. Ng and H. Haddadi, WIRED UK, December 2018
The key drivers behind the emergence of these ideas are the scaling, privacy, and cost challenges associated with increasingly distributed data and sensors, as reflected in evolving architectural frameworks for the Internet of Things (IoT).
The referenced article groups the decentralized learning approaches into three categories:
local learning
distributed or federated learning
cooperative learning
We refer to our approach at Prism as "collaborative analytics", and while it embodies aspects of each of these approaches, it is distinct from all of them. In our system, models are learned locally, using local data, and that data is never shared beyond the local source. Instead, the collection of local models self-organizes into a global network model that is optimized given the quality and dependency of information across the local sources. The only information communicated is a set of compact messages (signaling statistics) that are privacy-preserving in the sense that the underlying data cannot be backed out of them, even if the messages are observed. While some of the data being modeled might reside in cloud storage, there is no such requirement; the only requirement is that the local sources being modeled can be networked together in order to organize and collaborate.
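As a rough, hypothetical illustration of this "share statistics, not data" pattern, the Python sketch below fits a trivial local model at each source and lets only a compact summary message leave the source, which a coordinator then combines into a global model. The class names, message fields, and weighting rule are assumptions made purely for illustration; they are not Prism's actual models or signaling protocol.

```python
# Illustrative sketch only: a minimal "share statistics, not data" pattern.
# The Signal fields and the weighting rule are assumptions for illustration,
# not Prism's actual signaling scheme.

from dataclasses import dataclass
from typing import List


@dataclass
class Signal:
    """Compact message a source emits; raw observations are never included."""
    source_id: str
    n: int          # number of local observations
    mean: float     # local model parameter (here, a simple mean)
    var: float      # local model parameter (here, a simple variance)


class LocalSource:
    """Holds its own data and fits a local model; only Signal objects leave."""
    def __init__(self, source_id: str, data: List[float]):
        self.source_id = source_id
        self._data = data  # stays on the device / at the edge

    def fit_and_signal(self) -> Signal:
        n = len(self._data)
        mean = sum(self._data) / n
        var = sum((x - mean) ** 2 for x in self._data) / n
        return Signal(self.source_id, n, mean, var)


def combine(signals: List[Signal]) -> Signal:
    """Assemble a global model from local signals, weighted by sample size."""
    n_total = sum(s.n for s in signals)
    global_mean = sum(s.n * s.mean for s in signals) / n_total
    # Law of total variance: within-source plus between-source contributions.
    global_var = sum(s.n * (s.var + (s.mean - global_mean) ** 2)
                     for s in signals) / n_total
    return Signal("global", n_total, global_mean, global_var)


if __name__ == "__main__":
    sources = [
        LocalSource("sensor-A", [1.0, 1.2, 0.9, 1.1]),
        LocalSource("sensor-B", [2.0, 2.2, 1.9]),
    ]
    print(combine([s.fit_and_signal() for s in sources]))
```

Even in this toy form, the coordinator sees only (count, mean, variance) triples, never the observations themselves, which is the essential privacy property described above.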
In such a scheme, cost savings accrue because the distributed data never need to be assembled and semantically unified, an effort that requires ongoing investment to sustain. Instead, local models are adapted to newly arriving data at the source, and the integration into an overall "ensembled" system occurs at the level of the models, not the data. Scalability in the number of sources is enabled by the loose coupling of the distributed system: sources may come and go, and the system automatically adapts to assimilate them into the global model, incrementally and on-the-fly (high agility). Compactness of the model representation ensures that very large numbers of sources, in the thousands, can be accommodated. Privacy is provided "by design", both through the messaging scheme and by eliminating the situation in which a privacy violation is created in the first place when the data are assembled together.
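To make the loose coupling concrete, the sketch below shows one way a newly arriving source's summary could be folded into, or later removed from, a running global model without touching raw data or re-contacting the other sources. The Summary type and the assimilate/retire functions are hypothetical names introduced here for illustration; they continue the toy statistics of the previous sketch and are not Prism's implementation.

```python
# Illustrative sketch only: incremental assimilation and retirement of
# sources, using the same weighted combination as the previous sketch.

from typing import NamedTuple


class Summary(NamedTuple):
    n: int
    mean: float
    var: float


def assimilate(global_s: Summary, new_s: Summary) -> Summary:
    """Fold one new source's compact summary into the running global model."""
    n = global_s.n + new_s.n
    mean = (global_s.n * global_s.mean + new_s.n * new_s.mean) / n
    var = (global_s.n * (global_s.var + (global_s.mean - mean) ** 2)
           + new_s.n * (new_s.var + (new_s.mean - mean) ** 2)) / n
    return Summary(n, mean, var)


def retire(global_s: Summary, old_s: Summary) -> Summary:
    """Remove a departing source by inverting the same weighted combination."""
    n = global_s.n - old_s.n
    mean = (global_s.n * global_s.mean - old_s.n * old_s.mean) / n
    # Recover the pooled second moment, then subtract the departing source's.
    m2 = (global_s.n * (global_s.var + global_s.mean ** 2)
          - old_s.n * (old_s.var + old_s.mean ** 2))
    var = m2 / n - mean ** 2
    return Summary(n, mean, var)


if __name__ == "__main__":
    g = Summary(n=7, mean=1.5, var=0.3)                   # current global model
    g = assimilate(g, Summary(n=4, mean=2.1, var=0.2))    # a source joins
    g = retire(g, Summary(n=3, mean=1.0, var=0.1))        # a source leaves
    print(g)
```

The point of the sketch is the shape of the operation: joining or leaving is a constant-cost update on the global summary, which is what makes on-the-fly assimilation of sources feasible at scale.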
Centralizing data for analysis when that data is natively generated at distributed collection points becomes "cumbersome", as the title of the article suggests, because the analytics are fundamentally mismatched to the way the data is produced in the first place. For example, some data may reside in slowly evolving databases, while other data arrive from real-time streaming sensors, and still other data from the scraping of social media posts. Not only is the "phenomenology" of these data types disparate, but their natural time courses unfold at different rates. When learning is a truly distributed process that stays with the distributed data, the "turbulence" intrinsic to mashing such data together into a common data lake is avoided: the analytics themselves are distributed to the sources and then assembled hierarchically into a global model.
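The hierarchical assembly mentioned above can be pictured with the small sketch below: each source summarizes on its own cadence, intermediate aggregation points combine the summaries beneath them, and the top level combines the aggregates. The two-level layout, the example sources, and the reuse of the toy Summary/combine helpers are assumptions for illustration only.

```python
# Illustrative sketch only: hierarchical assembly of compact summaries,
# where each tier combines the messages of the tier below it.

from typing import List, NamedTuple


class Summary(NamedTuple):
    n: int
    mean: float
    var: float


def combine(children: List[Summary]) -> Summary:
    """Merge any number of summaries, weighted by their sample counts."""
    n = sum(c.n for c in children)
    mean = sum(c.n * c.mean for c in children) / n
    var = sum(c.n * (c.var + (c.mean - mean) ** 2) for c in children) / n
    return Summary(n, mean, var)


if __name__ == "__main__":
    # A slowly evolving database, streaming sensors, and scraped posts each
    # summarize on their own cadence; only summaries flow upward.
    database = [Summary(100, 0.8, 0.10)]
    sensors = [Summary(30, 1.2, 0.05), Summary(45, 1.1, 0.07)]
    scraped = [Summary(500, 0.9, 0.30)]

    # Regional aggregation points first, then a single global model.
    regions = [combine(database + sensors), combine(scraped)]
    print("hierarchical:", combine(regions))
    # Matches a flat combination (up to floating-point round-off), so the
    # hierarchy is purely a routing choice, not a change in the model.
    print("flat:        ", combine(database + sensors + scraped))
```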
As the article points out, the economics of online information markets will be driven by new mechanisms that enable transactional business models that are both efficient and privacy-preserving for the sources. See the Use Cases -> Information Markets tab at www.prisminformatix.com for how our Collaborative Analytics provide one approach to addressing that need.