Encrypted Key-value Database Using Homomorphic Encryption

March 16, 2023
  -  
Umut Sahin

During the first season of the Zama Bounty Program, we asked you to create an encrypted key-value database using FHE. A proper solution was required to ensure that both keys and values were encrypted while further adhering to a number of conditions, provided in the bounty’s description.

Github user Alpaylan successfully completed this bounty, which is outlined in detail below.

A key-value database is designed for the storage and retrieval of data via a unique key. While both more flexible and more efficient than standard relational databases, key-value databases are still prone to data theft. The solution? An encrypted key-value database that can store and retrieve data without ever decrypting it. When both keys and values are encrypted, storing data in a cloud-based setup or even in a large, open internal environment does not create a security concern. 

The following tutorial of an Encrypted Key Value Database explains three operations: `insert`, `replace`, and `query`. All the operations are implemented as fully-homomorphic encrypted circuits.

In `examples/key-value-database/`, you can find the following files. Use them as a reference or to build your own database:

  • `static-size.py`: This file contains a static size database implementation, meaning that the number of entries is given as a parameter at the beginning.
  • `dynamic-size.py`: This file contains a dynamic size database implementation, meaning that the database starts as a zero entry database, and is grown as needed.

First will come an examination of the statically-sized database implementation. The dynamic database implementation is very similar, so you are encouraged to review the code to identify the differences.

Below are the import statements.

`concrete.numpy`: Used for implementing homomorphic circuits.

`numpy`: Used for mathematical operations. 

import concrete.numpy as cnp
import numpy as np

The database configuration parameters are as follows:

Number of Entries: Defines the maximum number of insertable (key/value) pairs.

Chunk Size: Defines the size of each chunk, the smallest substructure of keys and values.

Key Size: Defines the size of each key.

Value Size: Defines the size of each value.

# The number of entries in the database
NUMBER_OF_ENTRIES = 5
# The number of bits in each chunk
CHUNK_SIZE = 4

# The number of bits in the key and value
KEY_SIZE = 32
VALUE_SIZE = 32

The definition of the state and the accessors/indexers to the state come next.

The shape of the state is defined with respect to the size of the key/value with the table given below.

# Key and Value size must be a multiple of chunk size
assert KEY_SIZE % CHUNK_SIZE == 0
assert VALUE_SIZE % CHUNK_SIZE == 0

# Required number of chunks to store keys and values
NUMBER_OF_KEY_CHUNKS = KEY_SIZE // CHUNK_SIZE
NUMBER_OF_VALUE_CHUNKS = VALUE_SIZE // CHUNK_SIZE

# The shape of the state as a tensor
# Shape:
# | Flag Size | Key Size | Value Size |
# | 1         | 32/4 = 8 | 32/4 = 8   |
STATE_SHAPE = (NUMBER_OF_ENTRIES, 1 + NUMBER_OF_KEY_CHUNKS + NUMBER_OF_VALUE_CHUNKS)

Slices below are used to index certain parts of the state.

# Indexers for each part of the state
FLAG = 0
KEY = slice(1, 1 + NUMBER_OF_KEY_CHUNKS)
VALUE = slice(1 + NUMBER_OF_KEY_CHUNKS, None)

Encode/Decode functions

Encode/Decode functions are used to convert between integers and NumPy arrays. The interface exposes integers, but the state is stored and processed as a NumPy array.

Encode.

Encodes a number into a NumPy array.

  • The number is encoded in binary and then split into chunks;
  • Each chunk is then converted to an integer;
  • The integers are then stored in a NumPy array.
def encode(number: int, width: int) -> np.array:
    binary_repr = np.binary_repr(number, width=width)
    blocks = [binary_repr[i:i+CHUNK_SIZE] for i in range(0, len(binary_repr), CHUNK_SIZE)]
    return np.array([int(block, 2) for block in blocks])

# Encode a number with the key size
def encode_key(number: int) -> np.array:
    return encode(number, width=KEY_SIZE)

# Encode a number with the value size
def encode_value(number: int) -> np.array:
    return encode(number, width=VALUE_SIZE)

Decode.

Decodes a NumPy array into a number.

def decode(encoded_number: np.array) -> int:
    result = 0
    for i in range(len(encoded_number)):
        result += 2**(CHUNK_SIZE*i) * encoded_number[(len(encoded_number) - i) - 1]
    return result

Row selection with table lookups

The `keep_selected` function is used to select the correct row of the database for each operation.

Below is the Python definition of the function.

def keep_selected(value, selected):
  if selected:
      return value
  else:
      return 0

This function takes any value, and a boolean flag indicates if value is selected or not. Within homomorphic encryption circuits, we cannot compile this function as encrypted values cannot affect control flow. Instead, we turn this function into a lookup table.

`Selected` is prepended to the value, and the function is modified to act as below.

keep_selected(i=0..15, 1) -> 0 keep_selected(i=16..31, 0) -> i-16

Next comes the Python code for the lookup table.

keep_selected_lut = cnp.LookupTable([0 for _ in range(16)] + [i for i in range(16)])

The most significant bit of the input to the lookup table represents the `select` bit. If select=0 <=> i=0..15 , then the output is 0. If select=1 <=> i=16..31, then the output is i-16, the value itself.

To summarize, you can implement the `keep_selected` function as below.

def keep_selected_using_lut(value, selected):
  packed = (2 ** CHUNK_SIZE) * selected + value
  return keep_selected_lut[packed]

Circuit implementation functions

The following functions are used to implement the key-value database circuits. Three circuits are implemented as:

  • `insert`: Inserts a key value pair into the database
  • `replace`: Replaces the value of a key in the database
  • `query`: Queries the database for a key and returns the value

Insert.

The algorithm of the `insert` function is as follows:

  • Create a selection array to select a certain row of the database;
  • Fill this array by setting the first non-empty row of the database to 1;
  • Create a state update array, where the first non-empty row of the database is set to the new key and value;
  • Add the state update array to the state.

The full implementation is provided here:

# Insert a key value pair into the database
# - state: The state of the database
# - key: The key to insert
# - value: The value to insert
# Returns the updated state
def _insert_impl(state, key, value):
    # Get the used bit from the state
    # This bit is used to determine if an entry is used or not
    flags = state[:, FLAG]

    # Create a selection array
    # This array is used to select the first unused entry
    selection = cnp.zeros(NUMBER_OF_ENTRIES)

    # The found bit is used to determine if an unused entry has been found
    found = cnp.zero()
    for i in range(NUMBER_OF_ENTRIES):
        # The packed flag and found bit are used to determine if the entry is unused
        # | Flag | Found |
        # | 0    | 0     | -> Unused, select
        # | 0    | 1     | -> Unused, skip
        # | 1    | 0     | -> Used, skip
        # | 1    | 1     | -> Used, skip
        packed_flag_and_found = (found * 2) + flags[i]
        # Use the packed flag and found bit to determine if the entry is unused
        is_selected = (packed_flag_and_found == 0)

        # Update the selection array
        selection[i] = is_selected
        # Update the found bit, so all entries will be 
        # skipped after the first unused entry is found
        found += is_selected

    # Create a state update array
    state_update = cnp.zeros(STATE_SHAPE)
    # Update the state update array with the selection array
    state_update[:, FLAG] = selection

    # Reshape the selection array to be able to use it as an index
    selection = selection.reshape((-1, 1))

    # Create a packed selection and key array
    # This array is used to update the key of the selected entry
    packed_selection_and_key = (selection * (2 ** CHUNK_SIZE)) + key
    key_update = keep_selected_lut[packed_selection_and_key]

    # Create a packed selection and value array
    # This array is used to update the value of the selected entry
    packed_selection_and_value = selection * (2 ** CHUNK_SIZE) + value
    value_update = keep_selected_lut[packed_selection_and_value]

    # Update the state update array with the key and value update arrays
    state_update[:, KEY] = key_update
    state_update[:, VALUE] = value_update

    # Update the state with the state update array
    new_state = state + state_update
    return new_state

Replace.

The algorithm of the `replace` function is as follows:

  • Create an equal-rows array to select rows that match the given key in the database;
  • Create a selection array to select the row that is currently used in the database;
  • Set the selection array to 1 for the row that contains the key, and 0 for all other rows;
  • Create an inverse selection array by inverting the selection array;
  • Update rows set to 1 in the selection array – all other values will stay the same. To do this, we multiply the selection array with the new key and value, and the inverse selection array with the old key and value;
  • then, add the two arrays to get the new state.
# Replace the value of a key in the database
#   If the key is not in the database, nothing happens
#   If the key is in the database, the value is replaced
# - state: The state of the database
# - key: The key to replace
# - value: The value to replace
# Returns the updated state
def _replace_impl(state, key, value):
    # Get the flags, keys and values from the state
    flags = state[:, FLAG]
    keys = state[:, KEY]
    values = state[:, VALUE]

    # Create an equal_rows array
    # This array is used to select all entries with the given key
    # The equal_rows array is created by comparing the keys in the state
    # with the given key, and only setting the entry to 1 if the keys are equal
    # Example:
    #   keys = [[1, 0, 1, 0], [0, 1, 0, 1, 1]]
    #   key = [1, 0, 1, 0]
    #   equal_rows = [1, 0]
    equal_rows = (np.sum((keys - key) == 0, axis=1) == NUMBER_OF_KEY_CHUNKS)

    # Create a selection array
    # This array is used to select the entry to change the value of
    # The selection array is created by combining the equal_rows array
    # with the flags array, which is used to determine if an entry is used or not
    # The reason for combining the equal_rows array with the flags array
    # is to make sure that only used entries are selected
    selection = (flags * 2 + equal_rows == 3).reshape((-1, 1))
    
    # Create a packed selection and value array
    # This array is used to update the value of the selected entry
    packed_selection_and_value = selection * (2 ** CHUNK_SIZE) + value
    set_value = keep_selected_lut[packed_selection_and_value]

    # Create an inverse selection array
    # This array is used to pick entries that are not selected
    # Example:
    #   selection = [1, 0, 0]
    #   inverse_selection = [0, 1, 1]
    inverse_selection = 1 - selection

    # Create a packed inverse selection and value array
    # This array is used to keep the value of the entries that are not selected
    packed_inverse_selection_and_values = inverse_selection * (2 ** CHUNK_SIZE) + values
    kept_values = keep_selected_lut[packed_inverse_selection_and_values]

    # Update the values of the state with the new values
    new_values = kept_values + set_value
    state[:, VALUE] = new_values

    return state

Query.

The algorithm of the `query` function is as follows:

  • Create a selection array to select a certain row of the database;
  • Set the selection array to 1 for the row that contains the key;
  • Multiply the selection array with the state to zero all rows that do not contain the key;
  • Sum the rows of the state to get the remaining non-zero row, basically doing a dimension reduction;
  • Prepend the found flag to the value, and return the resulting array;
  • Destructure the resulting array in the non-encrypted query function.
# Query the database for a key and return the value
# - state: The state of the database
# - key: The key to query
# Returns an array with the following format:
#   [found, value]
#   found: 1 if the key was found, 0 otherwise
#   value: The value of the key if the key was found, 0 otherwise
def _query_impl(state, key):
    # Get the keys and values from the state
    keys = state[:, KEY]
    values = state[:, VALUE]

    # Create a selection array
    # This array is used to select the entry with the given key
    # The selection array is created by comparing the keys in the state
    # with the given key, and only setting the entry to 1 if the keys are equal
    # Example:
    #   keys = [[1, 0, 1, 0], [0, 1, 0, 1, 1]]
    #   key = [1, 0, 1, 0]
    #   selection = [1, 0]
    selection = (np.sum((keys - key) == 0, axis=1) == NUMBER_OF_KEY_CHUNKS).reshape((-1, 1))

    # Create a found bit
    # This bit is used to determine if the key was found
    # The found bit is set to 1 if the key was found, and 0 otherwise
    found = np.sum(selection)

    # Create a packed selection and value array
    # This array is used to get the value of the selected entry
    packed_selection_and_values = selection * (2 ** CHUNK_SIZE) + values
    value_selection = keep_selected_lut[packed_selection_and_values]

    # Sum the value selection array to get the value
    value = np.sum(value_selection, axis=0)

    # Return the found bit and the value
    return cnp.array([found, *value])

Key-value database interface

The `KeyValueDatabase` class is the interface that exposes the functionality of the key-value database.

class KeyValueDatabase:
    """
    A key-value database that uses fully homomorphic encryption circuits to store the data.
    """

    # The state of the database, it holds all the 
    # keys and values as a table of entries
    _state: np.ndarray

    # The circuits used to implement the database
    _insert_circuit: cnp.Circuit
    _replace_circuit: cnp.Circuit
    _query_circuit: cnp.Circuit

    # Below is the initialization of the database.

    # First, we initialize the state, and provide the necessary input sets. 
    # In versions later than concrete-numpy.0.9.0, we can use the `direct circuit` 
    # functionality to define the bit-widths of encrypted values rather than using 
    # `input sets`. Input sets are used to determine the required bit-width of the 
    # encrypted values. Hence, we add the largest possible value in the database 
    # to the input sets.

    # Within the initialization phase, we create the required configuration, 
    # compilers, circuits, and keys. Circuit and key generation phase is 
    # timed and printed in the output.

    def __init__(self):
        # Initialize the state to all zeros
        self._state = np.zeros(STATE_SHAPE, dtype=np.int64)

        ## Input sets for initialization of the circuits
        # The input sets are used to initialize the circuits with the correct parameters

        # The input set for the query circuit
        inputset_binary = [
            (
                np.zeros(STATE_SHAPE, dtype=np.int64), # state
                np.ones(NUMBER_OF_KEY_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # key
            )
        ]
        # The input set for the insert and replace circuits
        inputset_ternary = [
            (
                np.zeros(STATE_SHAPE, dtype=np.int64), # state
                np.ones(NUMBER_OF_KEY_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # key
                np.ones(NUMBER_OF_VALUE_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # value
            )
        ]

        ## Circuit compilation

        # Create a configuration for the compiler
        configuration = cnp.Configuration(
            enable_unsafe_features=True,
            use_insecure_key_cache=True,
            insecure_key_cache_location=".keys",
        )

        # Create the compilers for the circuits
        # Each compiler is provided with
        # - The implementation of the circuit
        # - The inputs and their corresponding types of the circuit
        #  - "encrypted": The input is encrypted
        #  - "plain": The input is not encrypted
        insert_compiler = cnp.Compiler(
            _insert_impl,
            {"state": "encrypted", "key": "encrypted", "value": "encrypted"}
        )
        replace_compiler = cnp.Compiler(
            _replace_impl,
            {"state": "encrypted", "key": "encrypted", "value": "encrypted"}
        )
        query_compiler = cnp.Compiler(
            _query_impl,
            {"state": "encrypted", "key": "encrypted"}
        )


        ## Compile the circuits
        # The circuits are compiled with the input set and the configuration

        self._insert_circuit = insert_compiler.compile(inputset_ternary, configuration)
        self._replace_circuit = replace_compiler.compile(inputset_ternary, configuration)
        self._query_circuit = query_compiler.compile(inputset_binary, configuration)

        ## Generate the keys for the circuits
        # The keys are seaparately generated for each circuit

        self._insert_circuit.keygen()
        self._replace_circuit.keygen()
        self._query_circuit.keygen()

    ### The Interface Functions
        
    # The following methods are used to interact with the database. 
    # They are used to insert, replace and query the database. 
    # The methods are implemented by encrypting the inputs, 
    # running the circuit and decrypting the output.

    # Insert a key-value pair into the database
    # - key: The key to insert
    # - value: The value to insert
    # The key and value are encoded before they are inserted
    # The state of the database is updated with the new key-value pair
    def insert(self, key, value):
        self._state = self._insert_circuit.encrypt_run_decrypt(
            self._state, encode_key(key), encode_value(value)
        )

    # Replace a key-value pair in the database
    # - key: The key to replace
    # - value: The new value to insert with the key
    # The key and value are encoded before they are inserted
    # The state of the database is updated with the new key-value pair
    def replace(self, key, value):
        self._state = self._replace_circuit.encrypt_run_decrypt(
            self._state, encode_key(key), encode_value(value)
        )

    # Query the database for a key
    # - key: The key to query
    # The key is encoded before it is queried
    # Returns the value associated with the key or None if the key is not found
    def query(self, key):
        result = self._query_circuit.encrypt_run_decrypt(
            self._state, encode_key(key)
        )

        if result[0] == 0:
            return None

        return decode(result[1:])

Whereas static implementation works with circuits over the entire database, dynamic implementation works with circuits over a single row.

In the dynamic implementation, you can iterate over the rows of the database in a simple Python loop and run the circuits over each row. This difference in implementation is reflected in the `insert`, `replace`, and `query` functions.

The static implementation is more efficient with dense databases as it works with parallelized tensors, resulting in a fixed amount of time for both simple and complex queries. The dynamic implementation is more efficient with sparse databases as it grows with the number of entries, but it doesn't use circuit-level parallelization. Also, insertion is free in the dynamic implementation as it only appends a new item to a Python list.

Usage

The initialization of the database is given below. Parameters are provided globally, so you can simply initialize the database with the following command.

## Test: Initialization
# Initialize the database
db = KeyValueDatabase()

Use the interface functions as provided below.

# Test: Insert/Query
# Insert (key: 3, value: 4) into the database
db.insert(3, 4)
# Query the database for the key 3
# The value 4 should be returned
assert db.query(3) == 4
# Test: Replace/Query
# Replace the value of the key 3 with 1
db.replace(3, 1)
# Query the database for the key 3
# The value 1 should be returned
assert db.query(3) == 1
# Test: Insert/Query
# Insert (key: 25, value: 40) into the database
db.insert(25, 40)
# Query the database for the key 25
# The value 40 should be returned
assert db.query(25) == 40
# Test: Query Not Found
# Query the database for the key 4
# None should be returned
assert db.query(4) == None
# Test: Replace/Query
# Replace the value of the key 3 with 5
db.replace(3, 5)
# Query the database for the key 3
# The value 5 should be returned
assert db.query(3) == 5

Now, we’ll test the limits. Using the hyper-parameters KEY_SIZE and VALUE_SIZE, we ensure that the examples work robustly against changes to the parameters.

# Define lower/upper bounds for the key
minimum_key = 1
maximum_key = 2 ** KEY_SIZE - 1
# Define lower/upper bounds for the value
minimum_value = 1
maximum_value = 2 ** VALUE_SIZE - 1

Below are the usage examples with these bounds.

# Test: Insert/Replace/Query Bounds
# Insert (key: minimum_key , value: minimum_value) into the database
db.insert(minimum_key, minimum_value)

# Query the database for the key=minimum_key
# The value minimum_value should be returned
assert db.query(minimum_key) == minimum_value

# Insert (key: maximum_key , value: maximum_value) into the database
db.insert(maximum_key, maximum_value)

# Query the database for the key=maximum_key
# The value maximum_value should be returned
assert db.query(maximum_key) == maximum_value

# Replace the value of key=minimum_key with maximum_value
db.replace(minimum_key, maximum_value)

# Query the database for the key=minimum_key
# The value maximum_value should be returned
assert db.query(minimum_key) == maximum_value

# Replace the value of key=maximum_key with minimum_value
db.replace(maximum_key, minimum_value)

# Query the database for the key=maximum_key
# The value minimum_value should be returned
assert db.query(maximum_key) == minimum_value

Conclusion

FHE offers an innovative approach to the problem of storing and querying data within an open environment, and this bounty is further proof of concept. Modern shopping carts and classic session stores, for example, could rely on a database of this kind to operate more securely, so your online orders and session history are sure to be kept confidential. There are a number of other use cases that can benefit from encrypted key-value databases and this solution provides us with a sense of what is possible.

Additional links

Read more related posts

Encrypted Image Filtering Using Homomorphic Encryption

This tutorial introduces a way to apply image filters, such as the use of blurring and sharpening tools, on encrypted images.

Read Article

Titanic Competition with Privacy Preserving Machine Learning

A Privacy-Preserving Machine Learning (PPML) solution to the famous ML Titanic challenge using concrete-ml

Read Article