Encrypted Key-value Database Using Homomorphic Encryption
During the first season of the Zama Bounty Program, we asked you to create an encrypted key-value database using FHE. A proper solution was required to ensure that both keys and values were encrypted while further adhering to a number of conditions, provided in the bounty’s description.
Github user Alpaylan successfully completed this bounty, which is outlined in detail below.
A key-value database is designed for the storage and retrieval of data via a unique key. While both more flexible and more efficient than standard relational databases, key-value databases are still prone to data theft. The solution? An encrypted key-value database that can store and retrieve data without ever decrypting it. When both keys and values are encrypted, storing data in a cloud-based setup or even in a large, open internal environment does not create a security concern.
The following tutorial of an Encrypted Key Value Database explains three operations: `insert`, `replace`, and `query`. All the operations are implemented as fully-homomorphic encrypted circuits.
In `examples/key-value-database/`, you can find the following files. Use them as a reference or to build your own database:
- `static-size.py`: This file contains a static size database implementation, meaning that the number of entries is given as a parameter at the beginning.
- `dynamic-size.py`: This file contains a dynamic size database implementation, meaning that the database starts as a zero entry database, and is grown as needed.
First will come an examination of the statically-sized database implementation. The dynamic database implementation is very similar, so you are encouraged to review the code to identify the differences.
Below are the import statements.
`concrete.numpy`: Used for implementing homomorphic circuits.
`numpy`: Used for mathematical operations.
The database configuration parameters are as follows:
Number of Entries: Defines the maximum number of insertable (key/value) pairs.
Chunk Size: Defines the size of each chunk, the smallest substructure of keys and values.
Key Size: Defines the size of each key.
Value Size: Defines the size of each value.
The definition of the state and the accessors/indexers to the state come next.
The shape of the state is defined with respect to the size of the key/value with the table given below.
Slices below are used to index certain parts of the state.
Encode/Decode functions are used to convert between integers and NumPy arrays. The interface exposes integers, but the state is stored and processed as a NumPy array.
Encodes a number into a NumPy array.
- The number is encoded in binary and then split into chunks;
- Each chunk is then converted to an integer;
- The integers are then stored in a NumPy array.
Decodes a NumPy array into a number.
Row selection with table lookups
The `keep_selected` function is used to select the correct row of the database for each operation.
Below is the Python definition of the function.
This function takes any value, and a boolean flag indicates if value is selected or not. Within homomorphic encryption circuits, we cannot compile this function as encrypted values cannot affect control flow. Instead, we turn this function into a lookup table.
`Selected` is prepended to the value, and the function is modified to act as below.
keep_selected(i=0..15, 1) -> 0 keep_selected(i=16..31, 0) -> i-16
Next comes the Python code for the lookup table.
The most significant bit of the input to the lookup table represents the `select` bit. If select=0 <=> i=0..15 , then the output is 0. If select=1 <=> i=16..31, then the output is i-16, the value itself.
To summarize, you can implement the `keep_selected` function as below.
Circuit implementation functions
The following functions are used to implement the key-value database circuits. Three circuits are implemented as:
- `insert`: Inserts a key value pair into the database
- `replace`: Replaces the value of a key in the database
- `query`: Queries the database for a key and returns the value
The algorithm of the `insert` function is as follows:
- Create a selection array to select a certain row of the database;
- Fill this array by setting the first non-empty row of the database to 1;
- Create a state update array, where the first non-empty row of the database is set to the new key and value;
- Add the state update array to the state.
The full implementation is provided here:
The algorithm of the `replace` function is as follows:
- Create an equal-rows array to select rows that match the given key in the database;
- Create a selection array to select the row that is currently used in the database;
- Set the selection array to 1 for the row that contains the key, and 0 for all other rows;
- Create an inverse selection array by inverting the selection array;
- Update rows set to 1 in the selection array – all other values will stay the same. To do this, we multiply the selection array with the new key and value, and the inverse selection array with the old key and value;
- then, add the two arrays to get the new state.
The algorithm of the `query` function is as follows:
- Create a selection array to select a certain row of the database;
- Set the selection array to 1 for the row that contains the key;
- Multiply the selection array with the state to zero all rows that do not contain the key;
- Sum the rows of the state to get the remaining non-zero row, basically doing a dimension reduction;
- Prepend the found flag to the value, and return the resulting array;
- Destructure the resulting array in the non-encrypted query function.
Key-value database interface
The `KeyValueDatabase` class is the interface that exposes the functionality of the key-value database.
Whereas static implementation works with circuits over the entire database, dynamic implementation works with circuits over a single row.
In the dynamic implementation, you can iterate over the rows of the database in a simple Python loop and run the circuits over each row. This difference in implementation is reflected in the `insert`, `replace`, and `query` functions.
The static implementation is more efficient with dense databases as it works with parallelized tensors, resulting in a fixed amount of time for both simple and complex queries. The dynamic implementation is more efficient with sparse databases as it grows with the number of entries, but it doesn't use circuit-level parallelization. Also, insertion is free in the dynamic implementation as it only appends a new item to a Python list.
The initialization of the database is given below. Parameters are provided globally, so you can simply initialize the database with the following command.
Use the interface functions as provided below.
Now, we’ll test the limits. Using the hyper-parameters KEY_SIZE and VALUE_SIZE, we ensure that the examples work robustly against changes to the parameters.
Below are the usage examples with these bounds.
FHE offers an innovative approach to the problem of storing and querying data within an open environment, and this bounty is further proof of concept. Modern shopping carts and classic session stores, for example, could rely on a database of this kind to operate more securely, so your online orders and session history are sure to be kept confidential. There are a number of other use cases that can benefit from encrypted key-value databases and this solution provides us with a sense of what is possible.
- ⭐️ Like our work? Support us and star the Github repo.
- 👋 Questions? Join the Zama community.
- 💸 Join the Zama Bounty Program. Solve problems, write tutorials and earn rewards, more than €500,000 in prizes.