Bootstrapping TFHE ciphertexts in less than one millisecond

September 17, 2025

—

Agnes Leroy

Buckle up. The team at Zama is thrilled to announce that we have broken the 1 millisecond barrier for TFHE bootstrapping; latency is now measured in microseconds on GPU, while maintaining the same security level and probability of failure, for 4-bit messages. In this blog post, we look back at how far we've come to reach this milestone.

Fully Homomorphic Encryption (FHE) makes it possible to apply an arbitrary number of operations on encrypted data. This is possible thanks to a special operation named bootstrapping. It is the main performance bottleneck: for FHE to reach widespread use, its latency and throughput have to be pushed to their limits. Only then will computation on encrypted data reach latencies and throughput similar to cleartext computation.

At Zama, we have focused on accelerating the TFHE programmable bootstrap. It is at the heart of all the operations in TFHE-rs: not only does it reset the noise in a ciphertext, but it can also apply a function to it. This makes it a powerful building block for general-purpose arithmetic on encrypted data.
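To make "apply a function" concrete, here is a minimal cleartext sketch, assuming the function is tabulated as a lookup table over all 4-bit messages. It involves no encryption and no noise, and the names `make_lookup_table` and `programmable_bootstrap_model` are hypothetical, not part of TFHE-rs.

```python
# Cleartext model only: a real programmable bootstrap performs this lookup
# on a ciphertext and resets its noise at the same time.
MESSAGE_BITS = 4                # message width discussed in this post
MODULUS = 1 << MESSAGE_BITS     # 16 possible message values


def make_lookup_table(func):
    """Tabulate func over every 4-bit input, reducing each output to 4 bits."""
    return [func(m) % MODULUS for m in range(MODULUS)]


def programmable_bootstrap_model(message, table):
    """Return the table entry selected by the message (hypothetical helper)."""
    return table[message]


# Example: squaring modulo 16.
square_table = make_lookup_table(lambda m: m * m)
result = programmable_bootstrap_model(5, square_table)
print(result)  # 5 * 5 = 25, reduced modulo 16
```

Any function of a 4-bit input fits in a 16-entry table, which is why one primitive can serve many different operations.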

The first bootstrap implementation at Zama took 53 ms on CPU, with 128 bits of security and a probability of failure of 2^-128 for 4-bit messages. Today, it is our pleasure to announce that the 1 millisecond frontier has been crossed, and the TFHE bootstrap latency is now measured in microseconds on GPU, while maintaining the same security level and probability of failure, for 4-bit messages. Let’s go back in time to see how this came about.

Accelerating the bootstrap

Bootstrapping was invented by Craig Gentry in 2009, relying on lattice-based cryptography: the main idea is to evaluate the decryption circuit homomorphically. At the time, the estimate was that it would take up to 30 minutes to compute.

In 2018, TFHE bootstrapping was introduced [Chillotti2020], following up on a previous scheme named FHEW. With most other FHE schemes, one bootstrap processes thousands of messages at once, so when all you need is to bootstrap one or a few, you still pay the price for the whole batch. The TFHE bootstrap, by contrast, has very low latency: as mentioned in the introduction, the first Zama implementation of this bootstrap took 53 ms.

For FHE computations to become seamless, both the latency and the throughput of the bootstrap have to come as close as possible to those of cleartext calculations. TFHE opened the door to low latency for a single bootstrap or a few, which was not possible before.

Zama has been working on GPU acceleration of the TFHE bootstrap almost since its beginning, betting that there was a way to make it faster on GPU. The bootstrap algorithm is highly sequential, which makes it poorly suited to GPUs. Still, little by little, the bootstrap time has been pushed down. In 2024, one TFHE bootstrap took only 2 ms on one H100 GPU, 26 times faster than the original measurement on CPU.

This was achieved by using an alternative algorithm for the TFHE bootstrap: the multi-bit algorithm [Zhou2018][Joye2022], which offers more parallelism. That algorithm can also be implemented on CPU to achieve better latencies, but then the throughput is degraded significantly. On GPU, it is a good fit: it reduces the latency significantly while maintaining throughput. After the first implementation, many low-level optimizations were added to make the best use of GPU resources and maximize parallelism. Little by little, performance improved.

Between 2021 and 2024, the security level changed: TFHE-rs is now IND-CPA^D secure, but in 2021 Concrete was only IND-CPA secure, as the related attack was not known yet. Covering IND-CPA^D attacks with 128 bits of security required changing the cryptographic parameters and introducing new techniques to mitigate the attack [Bernard2025][Ruijter2025]. This had a strong negative effect on performance, which was offset by optimizations and by new cryptographic techniques to reduce the noise level after a bootstrap.

Still, 2 ms was too slow. For the past few months, the GPU team at Zama has been focusing on improving the bootstrap performance further. In particular, an implementation specialized at compile time for blockchain cryptographic parameters was introduced. Having more variables known at compile time reduces register pressure on the GPU, and combining this with fine-tuned optimizations made a significant performance improvement possible.

The bootstrap now takes around 800 µs on one GPU, with 128 bits of IND-CPA^D security.

This figure is for a bootstrap on ciphertexts that encrypt two bits of message, to handle booleans, with Gaussian noise: this setting is considered the reference in the literature. In practice, TFHE-rs uses a bootstrap on ciphertexts that encrypt 4 bits of message with a TUniform noise distribution for blockchain: with these parameters the bootstrap takes 945 µs.

Benchmarks

Below is a comparison of the current GPU implementation of the bootstrap with the original CPU one from 2021. Parameters are for IND-CPA^D security, i.e. 128 bits of security and a failure probability of 2^-128 or less, and a TUniform noise distribution.


Latency	Booleans	4-bit Integers (what we use today)
2021	19 ms	53 ms
2025	796 µs	945 µs
Speedup	24×	56×

Latency of the bootstrap on CPU and GPU: the CPU latency is measured with Concrete-core 0.1.10 from 2021, which puts the current GPU latency in perspective. Ciphertexts are encrypted using a Gaussian noise distribution, for 128 bits of security and a probability of failure of 2^-128. GPU results were measured on the Nebius platform with 1×H100; CPU results were measured on AWS on an hpc7a.96xlarge instance.

For the original TFHE boolean bootstrap, we achieve a 24× improvement. For 4-bit integers, which is what we use today in all our products, we have a 56× improvement.
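The speedups quoted above follow directly from the latency table. The short check below recomputes them; the dictionary layout and variable names are only illustrative, while the figures are those reported in this post.

```python
# Latencies from the table above, converted to microseconds.
latency_us = {
    "Booleans": {"2021": 19_000.0, "2025": 796.0},
    "4-bit integers": {"2021": 53_000.0, "2025": 945.0},
}

# Speedup = 2021 CPU latency divided by 2025 GPU latency, rounded to an integer.
speedups = {
    name: round(values["2021"] / values["2025"])
    for name, values in latency_us.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup}x")
```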

Another attractive property of TFHE is that computing large batches of bootstraps on multiple GPUs is straightforward: it’s as simple as copying chunks of inputs to different GPUs and bootstrapping them independently. No synchronization or cooperation between GPUs is required to perform one bootstrap. The throughput can thus reach 189,000 bootstraps per second on a single node with 8×H100 GPUs for 4-bit integers, as shown in the table below.


Throughput	Booleans	4-bit Integers (what we use today)
2021	135 PBS/s	74 PBS/s
2025	223,440 PBS/s	189,000 PBS/s
Improvement	1,655×	2,554×

Throughput of the bootstrap on CPU and GPU: the CPU throughput is measured with Concrete-core 0.1.10 from 2021. Ciphertexts are encrypted using a Gaussian noise distribution, for 128 bits of security and a probability of failure of 2^-128. GPU results were measured on the Nebius platform with 8×H100; CPU results were measured on AWS on an hpc7a.96xlarge instance.
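The multi-GPU strategy described above, splitting a batch into chunks and bootstrapping each chunk independently, can be sketched as follows. This is a scheduling illustration only, under stated assumptions: `bootstrap_chunk_on_gpu` is a hypothetical placeholder that does no cryptography, threads stand in for GPUs, and none of these names come from TFHE-rs.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_GPUS = 8  # one node with 8 GPUs, as in the benchmark above


def split_into_chunks(inputs, num_chunks):
    """Split inputs into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(inputs), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(inputs[start:start + size])
        start += size
    return chunks


def bootstrap_chunk_on_gpu(gpu_index, chunk):
    """Hypothetical placeholder: tags each input with the GPU that handled it."""
    return [(gpu_index, item) for item in chunk]


def bootstrap_batch(inputs, num_gpus=NUM_GPUS):
    """Process each chunk independently; workers never communicate."""
    chunks = split_into_chunks(inputs, num_gpus)
    with ThreadPoolExecutor(max_workers=num_gpus) as pool:
        return list(pool.map(bootstrap_chunk_on_gpu, range(num_gpus), chunks))


results = bootstrap_batch(list(range(20)))
chunk_sizes = [len(chunk_result) for chunk_result in results]
print(chunk_sizes)
```

Because no chunk depends on another, adding GPUs only changes how the batch is split, not the work done per bootstrap.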

The effect on large integer (FheUint) operations

The latency of one bootstrap is a good indicator of FHE performance, but real use cases rarely involve the computation of a single bootstrap. This is why the current GPU implementation in TFHE-rs is neither latency-oriented nor throughput-oriented, but provides a good tradeoff between the two. This is important to accelerate higher-level operations, like an addition or a multiplication of ciphertexts encrypting 32- or 64-bit messages. Further performance improvements could be achieved by having specialized implementations for latency and throughput. The advantage of the current approach is that it provides a strong basis to start this new journey.

With the current implementation, very good latencies can be achieved for the addition and multiplication of ciphertexts encrypting 64-bit integers. Currently, on a single node with 8×H100, the addition of two 64-bit encrypted messages takes 8.7 ms, and their multiplication takes 32 ms, as shown in the table below:
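A back-of-envelope calculation shows why latency and throughput are separate targets. It uses only the 4-bit figures reported in this post (945 µs on one H100, 189,000 PBS/s on eight) and assumes, purely for illustration, that the node throughput divides evenly across GPUs; the variable names are ours.

```python
latency_s = 945e-6          # single-bootstrap latency on 1 GPU, 4-bit integers
node_throughput = 189_000   # bootstraps per second on the 8-GPU node
num_gpus = 8

# Rate if one GPU issued bootstraps strictly one after another.
sequential_rate = 1.0 / latency_s

# Assumed even split of the measured node throughput across the GPUs.
per_gpu_throughput = node_throughput / num_gpus

# How much batching gains over the one-at-a-time rate on a single GPU.
batching_gain = per_gpu_throughput / sequential_rate

print(f"one at a time: {sequential_rate:.0f} PBS/s per GPU")
print(f"batched:       {per_gpu_throughput:.0f} PBS/s per GPU")
print(f"gain:          {batching_gain:.1f}x")
```

Under these assumptions, batching delivers roughly 22 times the rate of issuing bootstraps one at a time, which is why an implementation tuned only for single-bootstrap latency would leave throughput on the table.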


Latency	64-bit encrypted addition	64-bit encrypted multiplication
2022	2 s	13 s
2025	8.7 ms	32 ms
Improvement	230×	406×

Latency of the 64-bit encrypted addition and multiplication on CPU and GPU: the CPU latency is measured with a version of Concrete from December 2022. Ciphertexts are encrypted using a TUniform noise distribution, for 128 bits of security and a probability of failure of 2^-128. GPU results were measured on the Nebius platform with 8×H100; CPU results were measured on AWS on an hpc7a.96xlarge instance.

The full table of benchmarks will be made public when the next TFHE-rs version is released. Stay tuned for updates!

We expect this latest achievement to have a tremendous impact on the adoption of FHE in the industry, particularly in blockchain applications. Bear in mind that in such applications, FHE computation is not the only bottleneck: network communication, MPC protocols, data exchanges, and zero-knowledge proofs also come into play. Still, FHE performance has never been closer to cleartext computation. And this is only the beginning, as dedicated accelerators are expected to go beyond GPU performance.

Bibliography

Chillotti, I., Gama, N., Georgieva, M. et al. (2020) TFHE: Fast Fully Homomorphic Encryption Over the Torus. J Cryptol 33, 34–91. https://doi.org/10.1007/s00145-019-09319-x

Zhou, T., Yang, X., Liu, L., Zhang, W. and Li, N., (2018) Faster bootstrapping with multiple addends, IEEE Access, volume 6, pages 49868-49876. https://eprint.iacr.org/2017/735.pdf

Joye, M., Paillier, P. (2022). Blind Rotation in Fully Homomorphic Encryption with Extended Keys. In: Dolev, S., Katz, J., Meisels, A. (eds) Cyber Security, Cryptology, and Machine Learning. CSCML 2022. Lecture Notes in Computer Science, vol 13301. Springer, Cham. https://doi.org/10.1007/978-3-031-07689-3_1

Bernard, O., Joye, M., Smart, N. P. and Walter, M., (2025) Drifting towards better error probabilities in fully homomorphic encryption schemes, In S. Fehr and P.-A. Fouque, Eds., Advances in Cryptology – EUROCRYPT 2025, Part VIII, vol. 15608 of Lecture Notes in Computer Science, pp. 181-211, Springer, https://doi.org/10.1007/978-3-031-91101-9_7

De Ruijter, T., D'Anvers, J.-P. and Verbauwhede, I. (2025) Don’t be mean: Reducing Approximation Noise in TFHE through Mean Compensation, https://eprint.iacr.org/2025/809
