TFHE-rs v0.6: Zero-Knowledge Support and Signed Integer Operations on GPU

April 8, 2024
Jean-Baptiste Orfila, Arthur Meyre, Agnes Leroy

TFHE-rs v0.6 introduces Zero-Knowledge Proofs, a cryptographic technique that complements FHE. This version also extends GPU support to signed integer operations and adds further cryptographic features, such as the generation of encrypted randomness.

Zero-Knowledge Proof for Compact Public Key encryption

In addition to the standard private key setting, TFHE-rs now supports the compact public key scheme described in Marc Joye's work. Because this scheme allows anyone to encrypt, it is sometimes essential to prove that a ciphertext was produced correctly. The latest version of TFHE-rs can generate a Zero-Knowledge Proof that a public key encryption was performed correctly. The proof reveals nothing about the encrypted message except its already known range. This technique is derived from Benoit Libert's work.

Deploying this feature is straightforward: the client generates the proof at the time of encryption, while the server verifies it before proceeding with homomorphic computations. Below is an example demonstrating how a client can encrypt and prove a ciphertext, and how a server can verify the ciphertext and carry out computations on it:

use rand::prelude::*;
use tfhe::prelude::FheDecrypt;
use tfhe::set_server_key;
use tfhe::zk::{CompactPkeCrs, ZkComputeLoad};

pub fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rng = thread_rng();

    let max_num_message = 1;

    let params = tfhe::shortint::parameters::PARAM_MESSAGE_2_CARRY_2_COMPACT_PK_KS_PBS_TUNIFORM_2M40;

    let client_key = tfhe::ClientKey::generate(tfhe::ConfigBuilder::with_custom_parameters(params, None));
    // This is done in an offline phase and the CRS is shared to all clients and the server
    let crs = CompactPkeCrs::from_shortint_params(params, max_num_message).unwrap();
    let public_zk_params = crs.public_params();
    let server_key = tfhe::ServerKey::new(&client_key);
    let public_key = tfhe::CompactPublicKey::try_new(&client_key).unwrap();

    let clear_a = rng.gen::<u64>();
    let clear_b = rng.gen::<u64>();

    let a = tfhe::ProvenCompactFheUint64::try_encrypt(
        clear_a,
        public_zk_params,
        &public_key,
        ZkComputeLoad::Proof,
    )?;
    let b = tfhe::ProvenCompactFheUint64::try_encrypt(
        clear_b,
        public_zk_params,
        &public_key,
        ZkComputeLoad::Proof,
    )?;

    // Server side
    let result = {
        set_server_key(server_key);

        // Verify the ciphertexts
        let a = a.verify_and_expand(&public_zk_params, &public_key)?;
        let b = b.verify_and_expand(&public_zk_params, &public_key)?;

        a + b
    };

    // Back on the client side
    let a_plus_b: u64 = result.decrypt(&client_key);
    assert_eq!(a_plus_b, clear_a.wrapping_add(clear_b));

    Ok(())
}

Encrypting and proving an FheUint64 takes 6.9 seconds on a Dell XPS 15 9500, simulating a client machine. On the other hand, verification on an hpc7a.96xlarge, available on AWS, is completed in just 123 milliseconds using a mode where the verification is cheaper.

There is another mode with a more expensive verification: in this setting, proof generation takes only 2.5 seconds on the same laptop and verification takes 467 milliseconds on the same AWS instance.
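To make the trade-off between the two modes concrete, the sketch below plugs the timings quoted above into two small helpers. It is only an illustration: the names `ModeTimings`, `CHEAP_VERIFY`, `CHEAP_PROOF`, `round_trip_ms` and `server_verify_ms` are hypothetical and not part of TFHE-rs, and the model assumes sequential verification and ignores network transfer.

```rust
/// Client and server timings for one proof mode, in milliseconds.
/// Hypothetical helper type, not part of the TFHE-rs API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModeTimings {
    pub prove_ms: u64,
    pub verify_ms: u64,
}

/// Figures quoted above for the mode with the cheaper verification.
pub const CHEAP_VERIFY: ModeTimings = ModeTimings { prove_ms: 6_900, verify_ms: 123 };

/// Figures quoted above for the mode with the cheaper proof generation.
pub const CHEAP_PROOF: ModeTimings = ModeTimings { prove_ms: 2_500, verify_ms: 467 };

/// Time for one encrypt-and-prove step followed by one verification.
/// Assumption: network transfer is ignored.
pub fn round_trip_ms(mode: ModeTimings) -> u64 {
    mode.prove_ms + mode.verify_ms
}

/// Server time spent verifying `n` ciphertexts.
/// Assumption: verifications run one after another, with no batching.
pub fn server_verify_ms(mode: ModeTimings, n: u64) -> u64 {
    mode.verify_ms * n
}
```

Under these assumptions, a single ciphertext completes sooner in the cheaper-proof mode (about 3.0 s against 7.0 s), while a server verifying 1,000 ciphertexts spends about 2 minutes in the cheaper-verification mode against almost 8 minutes in the other. Which mode fits best depends on whether the client or the server is the bottleneck.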

Enhanced GPU support

This release introduces support for signed integer operations on GPU, as well as:

  • unsigned and signed scalar multiplication,
  • unsigned and signed encrypted shift and rotate,
  • unsigned overflowing subtraction. 
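When testing these operations, it can help to have a cleartext reference to compare decrypted results against. The sketch below uses only Rust's standard integer methods on 8-bit values. It assumes that the homomorphic operations follow the wrapping semantics of Rust's primitive integers and that shift amounts are reduced modulo the bit width; the function names are hypothetical and not part of TFHE-rs.

```rust
/// Signed scalar multiplication, wrapping on overflow.
pub fn scalar_mul_i8(a: i8, scalar: i8) -> i8 {
    a.wrapping_mul(scalar)
}

/// Left shift of an unsigned value.
/// Assumption: the shift amount is taken modulo the bit width (8 here).
pub fn shl_u8(a: u8, shift: u8) -> u8 {
    a.wrapping_shl(u32::from(shift))
}

/// Right shift of a signed value: arithmetic shift, the sign bit is replicated.
pub fn shr_i8(a: i8, shift: u8) -> i8 {
    a.wrapping_shr(u32::from(shift))
}

/// Left rotation of an unsigned value: bits shifted out re-enter on the right.
pub fn rotl_u8(a: u8, amount: u8) -> u8 {
    a.rotate_left(u32::from(amount))
}

/// Unsigned overflowing subtraction: wrapped result plus an overflow flag.
pub fn overflowing_sub_u8(a: u8, b: u8) -> (u8, bool) {
    a.overflowing_sub(b)
}
```

For example, `overflowing_sub_u8(3, 5)` wraps around to 254 and sets the flag, which is the cleartext counterpart of an encrypted result paired with an encrypted overflow indicator.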

Cross-language support is now possible thanks to the new C API that wraps integer arithmetic on GPU.

This release also brings performance improvements: multi-bit PBS (a.k.a. multithreaded PBS) support has been stabilized and is now recommended for GPU users, as it is significantly faster than the classical PBS. This PBS algorithm exposes more parallelism, which is why it performs better on GPU than on CPU. Here is an example of how to use it:

use tfhe::{ConfigBuilder, set_server_key, FheUint8, ClientKey, CompressedServerKey};
use tfhe::prelude::*;
use tfhe::shortint::parameters::PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS;

fn main() {
    let config = ConfigBuilder::with_custom_parameters(PARAM_GPU_MULTI_BIT_MESSAGE_2_CARRY_2_GROUP_3_KS_PBS, None).build();

    let client_key = ClientKey::generate(config);
    let compressed_server_key = CompressedServerKey::new(&client_key);

    // Decompress the server key directly onto the GPU
    let gpu_key = compressed_server_key.decompress_to_gpu();

    let clear_a = 27u8;
    let clear_b = 128u8;

    let a = FheUint8::encrypt(clear_a, &client_key);
    let b = FheUint8::encrypt(clear_b, &client_key);

    // Server side

    set_server_key(gpu_key);
    let result = a + b;

    // Client side
    let decrypted_result: u8 = result.decrypt(&client_key);

    let clear_result = clear_a + clear_b;

    assert_eq!(decrypted_result, clear_result);
}

Additionally, H100 GPUs have become increasingly easy and cheap to access with the rise of LLM training and inference, and offer much more compute throughput than the V100 GPUs targeted previously. H100 support has been enhanced in TFHE-rs v0.6, and these GPUs are now targeted in the reference benchmark results, summarized in Table 1.

On a single H100, the GPU performance is now very close to the performance of the high-end CPU used as a reference.

Miscellaneous

The latest version of TFHE-rs also includes new operations, new noise distributions and some other enhancements:

  • Support of leading/trailing zeros/ones and log2;
  • Implementation of checked division, returning an encrypted flag indicating whether the divisor is equal to 0 or not;
  • Improvement of multiplication speed by 8%, now running in 366 ms for 64-bit integers;
  • Introduction of a counter to track the number of PBS executions;
  • Support for the TUniform noise distribution.
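As a cleartext illustration of what the new bit-counting, log2 and checked division operations compute, here is a minimal sketch using Rust's standard library on plain `u64` values. The function names are hypothetical and not part of TFHE-rs, and the quotient returned for a zero divisor (`u64::MAX`) is an assumption made for this model only; in the encrypted setting the flag would itself be encrypted.

```rust
/// Returns (leading zeros, leading ones, trailing zeros, trailing ones) of `x`.
pub fn bit_counts(x: u64) -> (u32, u32, u32, u32) {
    (
        x.leading_zeros(),
        x.leading_ones(),
        x.trailing_zeros(),
        x.trailing_ones(),
    )
}

/// Integer base-2 logarithm, rounded down. `None` when `x` is zero.
pub fn log2_floor(x: u64) -> Option<u32> {
    x.checked_ilog2()
}

/// Checked division model: returns the quotient and a flag that is `true`
/// when the divisor is zero.
/// Assumption: the quotient for a zero divisor is set to `u64::MAX` here.
pub fn checked_div_model(a: u64, b: u64) -> (u64, bool) {
    match a.checked_div(b) {
        Some(q) => (q, false),
        None => (u64::MAX, true),
    }
}
```

Returning a flag rather than failing matters in the homomorphic setting: the server cannot branch on an encrypted divisor, so the computation always produces a result and leaves it to the key holder to check the flag after decryption.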

For the forthcoming release, the focus will shift to reducing the size of ciphertexts and introducing support for multi-GPU computations to further enhance performance.
