On the Machine Learning Models Inversion Attack Detector
Junzhe Song, Dmitry Namiot
This article is devoted to the detection of adversarial attacks on machine
learning models. In the most general case, adversarial attacks are deliberate data
modifications at some stage of the machine learning pipeline, designed either to
disrupt the operation of the machine learning system or, conversely, to achieve a
result desired by the attacker. There is also a class of attacks aimed at extracting
non-public information from machine learning models; model inversion attacks
belong to this class. Such attacks pose a threat to machine learning as a service
(MLaaS). Machine learning models accumulate a great deal of redundant information
about their training data, and the possibility of this data being exposed while the
model is in use can come as an unpleasant surprise.
Keywords: machine learning, adversarial attacks, model inversion
1. Introduction
Machine learning systems (for which, at least today, "artificial intelligence
systems" is a synonym) depend on data. This tautological statement has quite serious
consequences: changing the data, generally speaking, changes the behavior of the
model. Purposeful data changes are attacks on machine learning models [1]. But the
models themselves can also be affected directly during an attack: for example,
weights can be changed on the fly, malicious code can be embedded in weight files, etc.
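As a brief, purely illustrative sketch of the latter point (in Python, with a
hypothetical payload), pickle-based checkpoint formats execute code during
deserialization, so a "weights" file can carry an arbitrary payload:

    import os
    import pickle

    class MaliciousPayload:
        # pickle records this reconstruction recipe at serialization time;
        # the recorded call is executed when the file is later loaded
        def __reduce__(self):
            return (os.system, ("echo payload executed",))

    # an attacker ships this blob as an ordinary checkpoint file
    blob = pickle.dumps({"state_dict": MaliciousPayload()})

    # the victim merely loads the "weights" -- and the payload runs
    pickle.loads(blob)  # prints: payload executed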
Adversarial attacks, which are possible for any discriminative machine learning
model, pose a great threat to machine learning systems, since they undermine any
guarantees about the results and quality of the system. Such guarantees are, for
example, mandatory for the use of machine learning (artificial intelligence) systems
in critical areas such as avionics, automated driving, special applications, etc.
[2, 3]. Attacks directed at the model itself also carry the additional risk of
extracting private information stored in machine learning models [4]. Detecting
(establishing the fact of) attacks aimed at extracting such information is the
subject of this work.
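To make the threat concrete, the following minimal sketch illustrates a
gradient-based model inversion attack in the spirit of Fredrikson et al.: the
attacker optimizes an input until the classifier assigns high confidence to a
chosen class, recovering a class-representative input. For simplicity it assumes
white-box gradient access (black-box MLaaS variants estimate gradients from query
responses); the victim network and all parameters are hypothetical placeholders,
not the setup studied later in this paper.

    import torch
    import torch.nn as nn

    # hypothetical victim classifier (e.g., face images -> 10 identities)
    victim = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    )
    victim.eval()

    def invert(model: nn.Module, target_class: int,
               steps: int = 500, lr: float = 0.1) -> torch.Tensor:
        """Reconstruct an input the model confidently labels as target_class."""
        x = torch.zeros(1, 1, 28, 28, requires_grad=True)  # start from a blank image
        optimizer = torch.optim.Adam([x], lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        target = torch.tensor([target_class])
        for _ in range(steps):
            optimizer.zero_grad()
            # push the model's prediction toward the target class
            loss = loss_fn(model(x), target)
            loss.backward()
            optimizer.step()
            x.data.clamp_(0.0, 1.0)  # keep the reconstruction in a valid pixel range
        return x.detach()

    reconstruction = invert(victim, target_class=3)
    print(reconstruction.shape)  # torch.Size([1, 1, 28, 28])

Note that the attacker never reads the model's weights directly here; the private
information leaks through the model's outputs, which is precisely what makes such
attacks relevant in the MLaaS setting.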
The remainder of the article is structured as follows. Section 2 focuses on
attacks on intellectual property. Section 3 presents the developed algorithm for
detecting extraction attacks. Section 4 concludes the article.