Machine learning systems regularly deal with structured data in real-world
applications. Unfortunately, such data has been difficult to represent
faithfully in the form that most machine learning techniques expect, i.e., as
a real-valued vector of a fixed, pre-specified size. In this work, we introduce
a novel approach that compiles structured data into a satisfiability problem
whose set of solutions contains at least (and often only) the input data. The
satisfiability problem is constructed from constraints that are generated
automatically a priori from a given signature, which makes it trivial to
construct a bag-of-words-style vector representation of the input. The
method is demonstrated in two areas, automated reasoning and natural language
processing, where it is shown to produce vector representations of
natural-language sentences and first-order logic clauses that can be
translated back to their original, structured forms with near-perfect accuracy.
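
The idea of a fixed-size vector indexed by constraints that are generated from
a signature alone can be illustrated with a small sketch. This is a toy under
stated assumptions, not the construction used in this work: the signature, the
depth bound, and the names build_constraints, encode, and decode are all
hypothetical, and the only constraints used are of the form "the node at tree
position p carries symbol s".

```python
from itertools import product

# Hypothetical signature: symbol -> arity (an assumption for illustration).
SIGNATURE = {"f": 2, "g": 1, "a": 0, "b": 0}
MAX_DEPTH = 2  # assumed bound on term depth, which fixes the vector size


def build_constraints(signature, max_depth):
    """Enumerate all (position, symbol) constraints before seeing any input."""
    max_arity = max(signature.values())
    # A position is the path of child indices from the root, e.g. (0, 1).
    paths = [p for d in range(max_depth + 1)
             for p in product(range(max_arity), repeat=d)]
    return [(path, sym) for path in paths for sym in sorted(signature)]


def encode(term, constraints):
    """Mark each constraint the term satisfies: a bag-of-constraints vector."""
    index = {c: i for i, c in enumerate(constraints)}
    vec = [0] * len(constraints)

    def visit(node, path):
        sym, *children = node
        vec[index[(path, sym)]] = 1
        for i, child in enumerate(children):
            visit(child, path + (i,))

    visit(term, ())
    return vec


def decode(vec, constraints, signature):
    """Rebuild the term from the active constraints, starting at the root."""
    at = {path: sym for (path, sym), bit in zip(constraints, vec) if bit}

    def build(path):
        sym = at[path]
        return (sym, *(build(path + (i,)) for i in range(signature[sym])))

    return build(())


constraints = build_constraints(SIGNATURE, MAX_DEPTH)
term = ("f", ("g", ("a",)), ("b",))  # the term f(g(a), b) as nested tuples
vec = encode(term, constraints)
recovered = decode(vec, constraints, SIGNATURE)
```

In this toy, the active constraints admit exactly one solution, so decoding is
a direct lookup. A richer constraint language would presumably need an actual
satisfiability solver to recover the input, and could admit additional
solutions besides it.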
