On April 6, 2017, Google Research published, under its “latest research news”, a blog post entitled “Federated Learning: Collaborative Machine Learning without Centralizing Training Data”. It can be found at:
https://research.googleblog.com/2017/04/federated-learning-collaborative.html?m=1
posted by Brendan McMahan and Daniel Ramage, Research Scientists.
We were excited to see their post, for two main reasons:
- Their vision for distributed analytics, and the attendant benefits of avoiding data integration that the post highlights, such as preserving privacy and operating asynchronously across distributed data on the Internet, are very similar to the vision we have been espousing.
- The problem they address is complementary to the problem we are solving, in a way that could be useful.
Their post describes an application setting in which mobile phones collaboratively learn a shared prediction model while keeping all the training data on the devices, sharing only compact, privacy-preserving messages. A focus of their message is contrasting the challenges of learning models in a distributed environment, with its asynchronous, intermittently available, low-bandwidth communications, against what is typically found in tightly controlled, high-bandwidth, synchronized cloud computing environments. Such settings are likely to become increasingly prevalent with the Internet of Things (IoT).
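To make that pattern concrete, here is a minimal sketch of the federated-training idea their post describes: each device computes a model update on its own data, and only the updated weights, never the raw data, are sent up and averaged into a shared model. The toy linear model, the weighted averaging rule, and the function names (`local_update`, `federated_round`) are our own illustrative assumptions, not Google's implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=1):
    """One device's local training pass on its own data (toy linear model, squared loss)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, device_data):
    """One communication round: devices train locally; only the updated weights
    (never the raw data) are shared and averaged, weighted by local data size."""
    local_ws = [local_update(global_weights, X, y) for X, y in device_data]
    sizes = [len(y) for _, y in device_data]
    return np.average(local_ws, axis=0, weights=sizes)

# Two hypothetical devices, each holding all features for its own examples.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
device_data = []
for _ in range(2):
    X = rng.normal(size=(20, 2))
    device_data.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(50):
    w = federated_round(w, device_data)
print(w)  # converges toward true_w without any device revealing its raw data
```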
While Google’s approach is similar in concept to ours, a clear distinction between Google’s technology and Prism’s is that Google addresses data that is partitioned by instance or example, also referred to as “horizontally distributed data”, while our technology deals with fusing data that is partitioned by feature, or so-called “vertically distributed data”. In fact, practical parallelization schemes for the stochastic gradient descent (SGD) and limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithms used in Google's TensorFlow, as well as the state-of-the-art (stochastic) distributed coordinate ascent (DCA) algorithms used to solve regularized regression problems (with convex cost), are all based on a horizontal partitioning of the data. Naturally, the business application they are addressing fits this horizontally distributed scheme: each device holds the data (all the features) for a given individual (example), and the goal is to learn the model over all individuals.
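The distinction is easiest to see on a single data matrix. The sketch below (again Python/NumPy; the array names and sizes are purely illustrative) splits the same matrix both ways:

```python
import numpy as np

# Illustrative data matrix: 6 examples (rows) x 4 features (columns).
X = np.arange(24).reshape(6, 4)

# Horizontal (by-example) partitioning, as in Google's federated setting:
# each party holds all the features for a subset of the examples.
horizontal_parts = [X[0:3, :], X[3:6, :]]   # party 1: rows 0-2, party 2: rows 3-5

# Vertical (by-feature) partitioning, the setting our technology addresses:
# each party holds a subset of the features for all of the examples.
vertical_parts = [X[:, 0:2], X[:, 2:4]]     # party 1: columns 0-1, party 2: columns 2-3

# Reassembling either partitioning recovers the full matrix.
assert np.array_equal(np.vstack(horizontal_parts), X)
assert np.array_equal(np.hstack(vertical_parts), X)
```

In the horizontal case each party can evaluate and update the full model on its own rows; in the vertical case no single party can even score one example without the features held by the others, which is why the two settings call for different algorithms.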
As stated in their post, their federated learning cannot solve all machine learning problems. Similarly, our technology cannot address applications with horizontally distributed data, but it can address valuable business applications with vertically distributed data, as described on this website and in our downloadable whitepaper.
Their post closes with a comment to the research community: “Applying Federated Learning requires machine learning practitioners to adopt new tools and a new way of thinking: model development, training, and evaluation with no direct access to or labeling of raw data, with communication cost as a limiting factor.”
We would add, to the business community, that new approaches and a new way of thinking are needed for predictive analysis of distributed big data, with dollar cost, time to results, and privacy taken as limiting factors.