Evaluating recurrent neural networks over encrypted data using NumPy and Concrete

October 13, 2021

Ayoub Benaissa


🚨 Attention: the code in this blog post is deprecated.
Check out the latest version of Concrete ML here.

Fully Homomorphic Encryption (FHE) is a cryptographic technique that allows you to compute on ciphertexts (encrypted messages) without needing to decrypt the messages inside them. FHE programming is notoriously hard, which is why we created an experimental compiler that converts a classical NumPy program into an FHE circuit, which can then be run using the Concrete FHE library.

Homomorphic NumPy

The Homomorphic NumPy (HNP) library allows you to convert functions operating on NumPy multidimensional arrays into their homomorphic equivalent. You write your computation in the usual way, using only NumPy, and HNP takes care of the conversion.

In this article and in the accompanying video, we will showcase two examples using HNP. We will start with logistic regression to get familiar with the process of converting NumPy functions, then go for a more involved use case, Recurrent Neural Networks.

Disclaimer: HNP is an experimental tool that will not be developed further, as we are working on a new compiler with better performance and reliability. We still wanted to show how far FHE has come and enable the community to experiment while we work on the stable release.

Installing HNP

Before we can get into the examples, we need to install HNP using the Zama docker image. The container comes with the necessary libraries preinstalled, including Jupyter so that you can directly start playing with it.


# Pull the docker image

docker pull docker.io/zamafhe/hnp

# Start a Jupyter notebook

docker run --rm -it -p 8888:8888 -v /src/path:/data zamafhe/hnp

You’re ready to go!

Homomorphic Logistic Regression

Let’s start with a simple example of how to use HNP: performing inference using a logistic regression model.

The first part is to import the libraries and define the inference function. This is pretty straightforward since we assume the model is already trained.
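As a rough illustration of this step, the inference function of a logistic regression fits in a few lines of NumPy. The sketch below is not the code from the original post: the weights and bias are random stand-ins for a trained model, and the 100-feature input size is an arbitrary assumption.

```python
import numpy as np

# Stand-ins for a trained model: random values, NOT real learned parameters.
rng = np.random.default_rng(0)
n_features = 100  # assumed input size, chosen only for illustration
weights = rng.uniform(-1, 1, n_features)
bias = rng.uniform(-1, 1)

def inference(x):
    """Logistic regression: sigmoid of the affine score w.x + b."""
    return 1 / (1 + np.exp(-(np.dot(x, weights) + bias)))

x = rng.uniform(-1, 1, n_features)
print(inference(x))  # a probability strictly between 0 and 1
```

The function only uses NumPy operations on its input, which is the kind of function the compilation step described next expects.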

To compile our NumPy function into its homomorphic equivalent, we need to provide some information about the inputs, namely the shape of the multidimensional array and its bounds. The bounds are the range in which the values of the input array fall. It’s important to note that these bounds should only take into account the input itself, not any computation that might occur later on (the compiler takes care of that).

FHE is currently limited in terms of precision, which means bounds have to be as tight as possible. Here, we will generate some random data between -1 and 1, and use the simulate method to check that the function will run correctly when encrypted. The result of the simulation can differ slightly from the original NumPy computation, but as long as the difference doesn’t exceed h.expected_precision(), it is considered valid.
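To see why tight bounds matter under a fixed precision budget, consider the plain NumPy sketch below. It is not HNP's simulate method: it simply rounds values onto a uniform grid, and the 6-bit budget and the two candidate bounds are assumptions picked for illustration.

```python
import numpy as np

def quantize(x, low, high, bits):
    """Round x onto a uniform grid of 2**bits levels spanning [low, high]."""
    step = (high - low) / (2 ** bits - 1)
    clipped = np.clip(x, low, high)
    return low + np.round((clipped - low) / step) * step

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)

# Same data and same bit budget, but two different declared bounds.
tight_error = np.max(np.abs(quantize(x, -1, 1, bits=6) - x))
loose_error = np.max(np.abs(quantize(x, -10, 10, bits=6) - x))

print(tight_error, loose_error)  # the looser bounds give the larger error
```

With the same number of levels spread over a range ten times wider, each level is ten times coarser, so declaring bounds wider than the data wastes precision.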

Next, we need to generate public and private keys for the user. In FHE, the server doing the computation doesn’t need the private key, since nothing is decrypted. Instead, the server receives a public key for each user of the service. Note that the compilation itself is user-independent, so you only need to compile once and the result will run for any public key and any user of your system. Key generation currently takes a while (tens of seconds, sometimes minutes), but it only needs to be done once.

Finally, we can run the computation. This consists of three steps: encryption of the input, evaluation of the program, and decryption of the output. In a real application, encryption and decryption are done on the user’s device, while the evaluation is done server side.

For convenience when debugging, HNP also provides a shortcut to do all the steps at once: h.encrypt_and_run(keys, x)

That’s it! You have now successfully created your first homomorphic NumPy program. A more complete Logistic Regression example can be found here.

Recurrent Neural Networks

In this example, we will use a simple LSTM (long short-term memory) network for sentiment analysis, classifying a sentence as either positive or negative.

Deep learning models, and RNNs in particular, are notoriously hard to implement using FHE: it used to be impossible to evaluate non-linear activation functions homomorphically, and noise accumulation in the ciphertext made it impossible to go beyond a few layers deep. Both of these issues are solved in Concrete by a novel operator called “programmable bootstrapping”, which HNP relies upon heavily.
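One way to build intuition for programmable bootstrapping is to think of it as a table lookup applied to an encrypted value. The sketch below shows only the plaintext side of that idea, replacing a sigmoid with a precomputed table over a small input grid. It involves no encryption and is not how Concrete is implemented; the 6-bit grid and the [-8, 8] range are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical settings: 6-bit inputs covering the range [-8, 8].
bits, low, high = 6, -8.0, 8.0
grid = np.linspace(low, high, 2 ** bits)
table = sigmoid(grid)  # one precomputed output per possible input level

def sigmoid_via_table(z):
    """Map z to its nearest grid index, then read the answer from the table."""
    step = (high - low) / (2 ** bits - 1)
    index = np.round((np.clip(z, low, high) - low) / step).astype(int)
    return table[index]

z = np.array([-3.0, 0.0, 0.5, 4.0])
max_error = np.max(np.abs(sigmoid_via_table(z) - sigmoid(z)))
print(max_error)  # small, and it shrinks as the number of bits grows
```

Any univariate function can be tabulated this way, which is why a lookup-based operator makes non-linear activations tractable at limited precision.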

Our model is an LSTM followed by a linear layer and a sigmoid activation function. We use a pre-trained word embedding and this dataset.

For this example, we will need some additional boilerplate code, as we will be using PyTorch, but the overall compilation process remains the same. The complete notebook for this example can be found here, so we will only focus on the important parts.

First, let’s define the model in PyTorch:

We will assume at this point that we have the trained model and want to compile it into its homomorphic equivalent. Our compile function can only take NumPy computation, so we will need to manually convert this PyTorch model to work with NumPy. Here is how to extract the learned parameters and implement the forward pass using NumPy:
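A minimal sketch of such a NumPy forward pass is shown below. It is not the notebook's code: the parameters are random stand-ins, and the sizes (5 words, 50-dimensional embeddings, 32 hidden units) are assumptions. The weight layout mirrors the one PyTorch uses for nn.LSTM, where weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0 stack the input, forget, cell and output gates along the first axis, so arrays extracted from a trained single-layer model could be passed in the same positions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_classifier(x, w_ih, w_hh, b_ih, b_hh, w_out, b_out):
    """Run one sentence through an LSTM, a linear layer, and a sigmoid.

    x has shape (seq_len, embedding_dim). The gate blocks are stacked in
    the order input, forget, cell, output (the PyTorch nn.LSTM layout).
    """
    hidden_size = w_hh.shape[1]
    h = np.zeros(hidden_size)  # hidden state
    c = np.zeros(hidden_size)  # cell state
    for x_t in x:
        gates = w_ih @ x_t + b_ih + w_hh @ h + b_hh
        i, f, g, o = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
    # Classify from the last hidden state.
    return sigmoid(w_out @ h + b_out)

# Random stand-ins for trained parameters (assumed sizes).
rng = np.random.default_rng(0)
seq_len, embedding_dim, hidden_size = 5, 50, 32
params = dict(
    w_ih=rng.normal(0, 0.1, (4 * hidden_size, embedding_dim)),
    w_hh=rng.normal(0, 0.1, (4 * hidden_size, hidden_size)),
    b_ih=np.zeros(4 * hidden_size),
    b_hh=np.zeros(4 * hidden_size),
    w_out=rng.normal(0, 0.1, (1, hidden_size)),
    b_out=np.zeros(1),
)
sentence = rng.uniform(-1, 1, (seq_len, embedding_dim))
print(lstm_classifier(sentence, **params))  # one score between 0 and 1
```

Writing the loop over time steps explicitly keeps the whole computation in NumPy operations, at the cost of fixing the sentence length when the function is compiled.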

Now that we have our forward pass in pure NumPy, we can compile it. We will use some advanced configuration options to tune the compilation:

The handselected parameter optimizer uses a pre-computed set of parameters known to work well in machine learning use cases, sacrificing some precision for faster execution.
The apply_topological_optimization parameter should be enabled by default to ensure the FHE circuit is correctly optimized.
The probabilistic_bounds parameter controls how big the margin of error can be around the bounds of the data; a larger value gives a larger margin of error, at the cost of precision.

You can play with these parameters and see how they affect the final result. Here, we limit the sentence length to five words in order to keep the running time reasonable, but feel free to use longer sentences.

Then we generate some user keys:

We are now ready to run the evaluation. To verify that the compilation went well, we will also output some debugging info using the simulate function. Note that sentences of fewer than five words need to be padded with zeros.
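The padding step can be sketched in NumPy as follows. The helper name pad_sentence is hypothetical, and two details are assumptions rather than something the post specifies: the zeros are appended at the end of the sentence, and longer sentences are truncated to five words.

```python
import numpy as np

def pad_sentence(embedded, max_len=5):
    """Pad (or truncate) a (n_words, embedding_dim) array to max_len rows."""
    embedded = embedded[:max_len]
    missing = max_len - embedded.shape[0]
    padding = np.zeros((missing, embedded.shape[1]))
    return np.concatenate([embedded, padding], axis=0)

three_words = np.ones((3, 50))  # stand-in for three embedded words
padded = pad_sentence(three_words)
print(padded.shape)  # (5, 50)
```

Padding to a fixed shape is needed because the compiled function was given a single input shape up front.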

And finally, we evaluate an example:

This can take anywhere from 30 seconds to 20 minutes or more depending on your hardware, so the more cores your CPU has, the better!

Conclusion

FHE is still in its infancy, and until recently it was not working at all. While precision and speed are still barriers to adoption, they are improving following a Moore-like law, with a 10x gain in speed every 18 months or so. This means that by 2025, FHE should be usable everywhere on the internet, from databases to machine learning and analytics!

Let us know what you build!



