CryptDB: Protecting Confidentiality with Encrypted Query Processing (SOSP '11)

2012-11-26 19:45

MIT CSAIL, papers, slides

# 1. Problem

Provide confidentiality for applications using DBMSes to solve two threats:

  1. Passive DB server attacks. The curious database administrator (DBA) learns private data.
  2. Any attacks on all servers. The adversary gains complete control over application and DBMS servers.

# 2. Challenges

  1. Tension between confidentiality and efficiency. (Threat 1 + 2)
    • too slow or not enough confidentiality
    • incapable of executing SQL queries without giving keys to DBMS servers
  2. Support user-shared data while minimizing data leakage when both application servers and DBMS servers are compromised.
    • “Different keys for different users” alone is not suitable for user-shared data.

# 3. Solutions

How to solve Challenge 1 (Threat 1+2) while still retaining data sharing (Challenge 2)?

Architecture Overview

architecture overview

  1. No plaintext in database at all. (eliminate Threat 1)
  2. Fine-grind keys chained to user passwords. (eliminate Threat 2 and Challenge 2)

# 3.1 Solution to Threat 1

How to deal with Passive DB server attacks? No plaintext in database at all.

# 3.1.1 Execute SQL queries over encrypted data

Six types of SQL-aware encryption

SQL-aware encryption strategy with symmetric-key encryption (for efficiency).

  1. Random (RND)
    • support indistinguishability under an adaptive chosen-plaintext attack (IND-CPA), same plain texts -> different cipher texts
    • AES / Blowfish in CBC mode, a random initialization vector (IV)
  2. Deterministic (DET)
    • support equality checks, same plain texts -> same cipher texts
    • different keys for different columns to prevent cross-column correlations
    • Pseudo-random permutation (PRP), Blowfish for 64-bit block, AES for 128-bit block
    • AES in CMC mode (= one round of CBC + another round of CBC in reverse order), zero IV. (to prevent leakage of prefix equality in CBC mode)
  3. Order-preserving (OPE)
    • support order relations. If x < y, then OPEk(x) < OPEk(y) for any secret key K.
    • Weaker. Thus, OPE-encrypted columns are revealed to the server only when the order query is needed.
    • Implementation and optimization of Order-preserving symmetric encryption. Use AVL BST for batch encryption. 25 ms/encryption -> 7 ms/encryption
  4. Homomorphic encryption(HOM)
    • retains IND-CPA while allows computations to be conducted on ciphertext and obtain an encrypted result which is the ciphertext of the result of operations performed on the plaintext.
    • UDF calling Paillier crytosystem
    • e.g. SUM: HOMk(x)*HOMk(y)=HOMk(x+y)
  5. Join (JOIN and OPE-JOIN)
    • support joins between two columns (because of DET)
    • TODO
  6. Word search (SEARCH)
    • support LIKE operation
    • A new implementation of the protocol from Practical Techniques for Searches on Encrypted Data
    • Proxy
      1. split texts to keywords
      2. remove repetitions
      3. randomly permute the positions of the words
      4. encrypt each of the words
      5. paddle each word to the same size
      6. send the encrypted words to the server as a token
    • Server can only know whether the token matches.
# 3.1.2 Adjustable query-based encryption

However, the query set is not always known in advance. So we need to dynamically adjust a layer of SQL-aware encryption scheme for queries at runtime. Onions of encryption: different keys for different layers of onions. The proxy will not give all keys to the server at any time.

onion encryption

table

  • Encrypt data in one or more onions
  • Multiple onions are needed in practice
  • Onion decryption (stripe-off) happens only when operations on a column are required
  • Once a layer of decryption happens, the layer remains its new state

keys for each layer are calculated by

keys

MK is the proxy’s Master Key and only intended for a single principal (user).

# a) Read query execution
  1. Application server issues a query
    • SELECT ID FROM Employees WHERE Name = ‘Alice’
  2. Proxy server anonymizes the query, and adjust encryption layers if stripe-off is needed
    • UPDATE Table1 SET C2-Eq = DECRYPT RND(KT1,C2,Eq,RND, C2-Eq, C2-IV)
  3. DBMS server execute the adjusted query and return the results
    • SELECT C1-Eq, C1-IV FROM Table1 WHERE C2-Eq = x7…d
  4. Proxy server decrypts the results and sends them to the user
  5. No need to decrypt data in DBMS when a new similar type of query is issued
# b) Write query execution
  • INSERT and UPDATE queries make the proxy encrypts data to the layer that has not yet been stripped off.
  • Addition and direct comparison should not be allowed simultaneously.
    • If a column is incremented and then only projected, DBMS operates on add onion.
    • If a column is used in comparisons after increasing, proxy calculates the final value and updates it once and for all.

At this time, we can defend against Threat 1 (passive DB server attacks) by not exposing any plaintext to the DBMS server while most SQL queries can still be supported.

# 3.2 Solution to Threat 2 and Challenge 2

How to deal with any attacks on all servers while still retaining data-sharing among users? The proxy’s Master Key (MK) is not adequate for multi-users. The solution is to chain encryption keys to user passwords with the specification of key access policy among users.

# 3.2.1 Chain encryption keys to user passwords
  • What does the key derives from?
    1. One Single principal uses Global proxy Master Key (MK).
    2. Multiple principals chain encryption keys to user passwords.
# Define principal

A principal is an entity such as a user or a group, over which it is natural to specify an access policy.

  • external principals: real users authenticated with their passwords
  • internal principals: entities whose privileges are acquired through delegations
# Annotate the schema with access control policy

We need to support data-sharing, so the access control policy should be specified to the schema.

  • ENC_FOR(Sensitive-Column1, PrincipalA), A has access to column1.
  • (principalA typeX) SPEAKS_FOR(principalB typeY), A has access to all keys that b has access to.

Access Control Policy Annotation

# Chain encryption keys
  • Each principal is associated with a secret and random-generated key
    • symmetric key (online) + public/private key pair (off-line)
    • External principal’s random-generated symmetric key and private key is encrypted with user’s password and stored in external_keys
  • If B speaks for A, A’s key is encrypted with B’s key and stored in access_keys table
  • When a user logs in, the proxy loads all the key the user has access to.
  • When a user logs out, the proxy delete all the user’s keys in the memory.

At this time, threat 2 is successfully conquered while the data-sharing feature is retained.

# 4. Implementation & Evaluation

# 4.1 Implementation

  • No change to the DBMS and applications
  • Portable
  • Multi-user keys: annotations and login/logout

# 4.2 Evaluation

  • Support most queries
  • Exceptional confidentiality
    • Most columns at RND layer
    • Most columns at OPE analyzed were less sensitive
  • Throughput loss 26%

# 5. Conclusion

  1. First practical DBMS to process most SQL queries on encrypted data
    • Hide DB from DBA, outsource DB
  2. Protects data of users logged out during attack, even when all servers are compromised
    • Limit leakage from compromised applications
  3. Modest overhead: 26% throughput loss for TPC-C
  4. No changes to DBMS (e.g., Postgres, MySQL)
© 2010-2018 Tian
Built with ❤️ in San Francisco