Seven security concerns in Machine Learning (ML) –
- Data privacy and security: ML requires large amounts of data to be trained, and this data may contain sensitive or personal information. Appropriate measures need to be put in place to prevent data from being accessed by unauthorized parties.
- Notebooks security: ML typically requires Jupyter or similar notebooks to be served for data scientists to work on data, code, and models, both individually and collaboratively. These notebooks need to be access controlled and protected from unauthorized access. This includes the code and git repos that host the code, and the model artifacts that the notebook uses or creates.
- Model serving and inference security: ML models in production are commonly served and accessed over inference endpoints and such endpoints need authentication, authorization, encryption for protection against misuse. During model upgrades to an endpoint or changes to an endpoint and its configuration, a number of attacks are possible that are typical of a devops/devsecops pipeline. These need to be protected against.
- Model security: Models can be vulnerable to attacks such as adversarial inputs, such as when an attacker intentionally manipulates the input to the model in order to cause it to make incorrect predictions. Another example is when the model makes an egregiously bad decision on an input, for example a self-driving car hitting an obstacle instead of avoiding it. It is important to harden the model and bound the decisions that come from its use.
- Misuse: Even if a model works as designed, it can be misused, for example by generating fake or misleading content. It is important to consider the potential unintended consequences of using models and to put safeguards in place to prevent their misuse.
- Bias: ML models can sometimes exhibit biases due to the data they are trained on. There should be a plan to identify biases in a model and take steps to mitigate them.
- Intellectual property: ML models may be protected by intellectual property laws, and it is important to respect these laws and obtain the appropriate licenses when using language models developed by others.