Consistent Structured Prediction with Max-Min Margin Markov Networks

Part of Proceedings of the International Conference on Machine Learning 1 pre-proceedings (ICML 2020)

Bibtex »Metadata »Paper »Supplemental »

Bibtek download is not availble in the pre-proceeding


Alex Nowak, Francis Bach, Alessandro Rudi


Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs. These methods are able to model interactions between output parts and incorporate a cost between labels. Unfortunately, these methods are inconsistent when the relationship between inputs and labels is far from deterministic. To overcome such limitations, in this paper we go beyond max-margin, defining the learning problem in terms of a ``max-min'' margin formulation. The resulting method, which we name max-min margin Markov networks ($M^4N$), provides a correction of the $M^3N$ loss that is key to achieve consistency in the general case. In this paper, we prove consistency and finite sample generalization bounds for $M^4N$ and provide an explicit algorithm to compute the estimator. The algorithm has strong statistical and computational guarantees: in a worst case scenario it achieves a generalization error of $O(1/\sqrt{n})$ for a total cost of $O(n\sqrt{n})$ marginalization-oracle calls, which have essentially the same cost as the max-oracle from $M^3N$. Experiments on multi-class classification and handwritten character recognition demonstrate the effectiveness of the proposed method over $M^3N$ networks.