CryptDB: Protecting Confidentiality with Encrypted Query Processing (SOSP '11)
# 1. Problem
Provide confidentiality for applications using DBMSes to solve two threats:
- Passive DB server attacks. The curious database administrator (DBA) learns private data.
- Any attacks on all servers. The adversary gains complete control over application and DBMS servers.
# 2. Challenges
- Tension between confidentiality and efficiency. (Threat 1 + 2)
- too slow or not enough confidentiality
- incapable of executing SQL queries without giving keys to DBMS servers
- Support user-shared data while minimizing data leakage when both application servers and DBMS servers are compromised.
- “Different keys for different users” alone is not suitable for user-shared data.
# 3. Solutions
How to solve Challenge 1 (Threat 1+2) while still retaining data sharing (Challenge 2)?
- No plaintext in database at all. (eliminate Threat 1)
- Fine-grind keys chained to user passwords. (eliminate Threat 2 and Challenge 2)
# 3.1 Solution to Threat 1
How to deal with Passive DB server attacks? No plaintext in database at all.
# 3.1.1 Execute SQL queries over encrypted data
Six types of SQL-aware encryption
SQL-aware encryption strategy with symmetric-key encryption (for efficiency).
- Random (RND)
- support indistinguishability under an adaptive chosen-plaintext attack (IND-CPA), same plain texts -> different cipher texts
- AES / Blowfish in CBC mode, a random initialization vector (IV)
- Deterministic (DET)
- support equality checks, same plain texts -> same cipher texts
- different keys for different columns to prevent cross-column correlations
- Pseudo-random permutation (PRP), Blowfish for 64-bit block, AES for 128-bit block
- AES in CMC mode (= one round of CBC + another round of CBC in reverse order), zero IV. (to prevent leakage of prefix equality in CBC mode)
- Order-preserving (OPE)
- support order relations. If x < y, then OPEk(x) < OPEk(y) for any secret key K.
- Weaker. Thus, OPE-encrypted columns are revealed to the server only when the order query is needed.
- Implementation and optimization of Order-preserving symmetric encryption. Use AVL BST for batch encryption. 25 ms/encryption -> 7 ms/encryption
- Homomorphic encryption(HOM)
- retains IND-CPA while allows computations to be conducted on ciphertext and obtain an encrypted result which is the ciphertext of the result of operations performed on the plaintext.
- UDF calling Paillier crytosystem
- e.g. SUM: HOMk(x)*HOMk(y)=HOMk(x+y)
- Join (JOIN and OPE-JOIN)
- support joins between two columns (because of DET)
- Word search (SEARCH)
- support LIKE operation
- A new implementation of the protocol from Practical Techniques for Searches on Encrypted Data
- split texts to keywords
- remove repetitions
- randomly permute the positions of the words
- encrypt each of the words
- paddle each word to the same size
- send the encrypted words to the server as a token
- Server can only know whether the token matches.
# 3.1.2 Adjustable query-based encryption
However, the query set is not always known in advance. So we need to dynamically adjust a layer of SQL-aware encryption scheme for queries at runtime. Onions of encryption: different keys for different layers of onions. The proxy will not give all keys to the server at any time.
- Encrypt data in one or more onions
- Multiple onions are needed in practice
- Onion decryption (stripe-off) happens only when operations on a column are required
- Once a layer of decryption happens, the layer remains its new state
keys for each layer are calculated by
MK is the proxy’s Master Key and only intended for a single principal (user).
# a) Read query execution
- Application server issues a query
- SELECT ID FROM Employees WHERE Name = ‘Alice’
- Proxy server anonymizes the query, and adjust encryption layers if stripe-off is needed
- UPDATE Table1 SET C2-Eq = DECRYPT RND(KT1,C2,Eq,RND, C2-Eq, C2-IV)
- DBMS server execute the adjusted query and return the results
- SELECT C1-Eq, C1-IV FROM Table1 WHERE C2-Eq = x7…d
- Proxy server decrypts the results and sends them to the user
- No need to decrypt data in DBMS when a new similar type of query is issued
# b) Write query execution
- INSERT and UPDATE queries make the proxy encrypts data to the layer that has not yet been stripped off.
- Addition and direct comparison should not be allowed simultaneously.
- If a column is incremented and then only projected, DBMS operates on add onion.
- If a column is used in comparisons after increasing, proxy calculates the final value and updates it once and for all.
At this time, we can defend against Threat 1 (passive DB server attacks) by not exposing any plaintext to the DBMS server while most SQL queries can still be supported.
# 3.2 Solution to Threat 2 and Challenge 2
How to deal with any attacks on all servers while still retaining data-sharing among users? The proxy’s Master Key (MK) is not adequate for multi-users. The solution is to chain encryption keys to user passwords with the specification of key access policy among users.
# 3.2.1 Chain encryption keys to user passwords
- What does the key derives from?
- One Single principal uses Global proxy Master Key (MK).
- Multiple principals chain encryption keys to user passwords.
# Define principal
A principal is an entity such as a user or a group, over which it is natural to specify an access policy.
- external principals: real users authenticated with their passwords
- internal principals: entities whose privileges are acquired through delegations
# Annotate the schema with access control policy
We need to support data-sharing, so the access control policy should be specified to the schema.
- ENC_FOR(Sensitive-Column1, PrincipalA), A has access to column1.
- (principalA typeX) SPEAKS_FOR(principalB typeY), A has access to all keys that b has access to.
# Chain encryption keys
- Each principal is associated with a secret and random-generated key
- symmetric key (online) + public/private key pair (off-line)
- External principal’s random-generated symmetric key and private key is encrypted with user’s password and stored in external_keys
- If B speaks for A, A’s key is encrypted with B’s key and stored in access_keys table
- When a user logs in, the proxy loads all the key the user has access to.
- When a user logs out, the proxy delete all the user’s keys in the memory.
At this time, threat 2 is successfully conquered while the data-sharing feature is retained.
# 4. Implementation & Evaluation
# 4.1 Implementation
- No change to the DBMS and applications
- Multi-user keys: annotations and login/logout
# 4.2 Evaluation
- Support most queries
- Exceptional confidentiality
- Most columns at RND layer
- Most columns at OPE analyzed were less sensitive
- Throughput loss 26%
# 5. Conclusion
- First practical DBMS to process most SQL queries on encrypted data
- Hide DB from DBA, outsource DB
- Protects data of users logged out during attack, even when all servers are compromised
- Limit leakage from compromised applications
- Modest overhead: 26% throughput loss for TPC-C
- No changes to DBMS (e.g., Postgres, MySQL)