• realtime / low-latency typeahead and autocomplete service for social networks, like Linkedin or Facebook
  • search social profiles with prefixes
  • newly added account appear instantly in the scope of the search
  • not for “query autocomplete” (like the Google search-box dropdown), but for displaying actual search results, including
    • generic typeahead: network-agnostic results from a global ranking scheme like popularity.
    • network typeahead: results from user’s 1st and 2nd-degree network connections, and People You May Know scores.

Linkedin Search


Multi-layer architecture

  • browser cache
  • web tier
  • result aggregator
  • various typeahead backend

Cleo Architecture

Result Aggregator

The abstraction of this problem is to find documents by prefixes and terms in a very large number of elements. The solution leverages these four major data structures:

  1. InvertedIndex<prefixes or terms, documents>: given any prefix, find all the document ids that contain the prefix.
  2. for each document, prepare a BloomFilter<prefixes or terms>: with user typing more, we can quickly filter out documents that do not contain the latest prefixes or terms, by check with their bloom filters.
  3. ForwardIndex<documents, prefixes or terms>: previous bloom filter may return false positives, and now we query the actual documents to reject them.
  4. scorer(document):relevance: Each partition return all of its true hits and scores. And then we aggregate and rank.


  • generic typeahead: latency <= 1 ms within a cluster
  • network typeahead (very-large dataset over 1st and 2nd degree network): latency <= 15 ms
  • aggregator: latency <= 25 ms

Acquisition Efficiency Problem:How to achieve a better ROI in advertising?

In details, Lyft’s advertisements should meet requirements as below:

  1. being able to manage region-specific ad campaigns
  2. guided by data-driven growth: The growth must be scalable, measurable, and predictable
  3. supporting Lyft’s unique growth model as shown below

lyft growth model

However, the biggest challenge is to manage all the processes of cross-region marketing at scale, which include choosing bids, budgets, creatives, incentives, and audiences, running A/B tests, and so on. You can see what occupies a day in the life of a digital marketer:


We can find out that execution occupies most of the time while analysis, thought as more important, takes much less time. A scaling strategy will enable marketers to concentrate on analysis and decision-making process instead of operational activities.

Solution: Automation

To reduce costs and improve experimental efficiency, we need to

  1. predict the likelihood of a new user to be interested in our product
  2. evaluate effectively and allocate marketing budgets across channels
  3. manage thousands of ad campaigns handily

The marketing performance data flows into the reinforcement-learning system of Lyft: Amundsen

The problems that need to be automated include:

  1. updating bids across search keywords
  2. turning off poor-performing creatives
  3. changing referrals values by market
  4. identifying high-value user segments
  5. sharing strategies across campaigns


Lyft Symphony Architecture

The tech stack includes - Apache Hive, Presto, ML platform, Airflow, 3rd-party APIs, UI.

Main components

Lifetime Value(LTV) forecaster

The lifetime value of a user is an important criterion to measure the efficiency of acquisition channels. The budget is determined together by LTV and the price we are willing to pay in that region.

Our knowledge of a new user is limited. The historical data can help us to predict more accurately as the user interacts with our services.

Initial eigenvalue:


The forecast improves as the historical data of interactivity accumulates:

根据历史记录判断 LTV

Budget allocator

After LTV is predicted, the next is to estimate budgets based on the price. A curve of the form LTV = a * (spend)^b is fit to the data. A degree of randomness will be injected into the cost-curve creation process in order to converge a global optimum.



Bidders are made up of two parts - the tuners and actors. The tuners decide exact channel-specific parameters based on the price. The actors communicate the actual bid to different channels.

Some popular bidding strategies, applied in different channels, are listed as below:



We have to value human experiences in the automation process; otherwise, the quality of the models may be “garbage in, garbage out”. Once saved from laboring tasks, marketers can focus more on understanding users, channels, and the messages they want to convey to audiences, and thus obtain better ad impacts. That’s how Lyft can achieve a higher ROI with less time and efforts.


  • for guests
    • search rooms by locations, dates, number of rooms, and number of guests
    • get room details (like picture, name, review, address, etc.) and prices
    • pay and book room from inventory by date and room id
      • checkout as a guest
      • user is logged in already
    • notification via Email and mobile push notification
  • for hotel or rental administrators (suppliers/hosts)
    • administrators (receptionist/manager/rental owner): manage room inventory and help the guest to check-in and check out
    • housekeeper: clean up rooms routinely



Inventory <> Bookings <> Users (guests and hosts)

Suppliers provide their room details in the inventory. And users can search, get, and reserve rooms accordingly. After reserving the room, the user’s payment will change the status of the reserved_room as well. You could check the data model in this post.

How to find available rooms?

  • by location: geo-search with spatial indexing, e.g. geo-hash or quad-tree.
  • by room metadata: apply filters or search conditions when querying the database.
  • by date-in and date-out and availability. Two options:
    • option 1: for a given room_id, check all occupied_room today or later, transform the data structure to an array of occupation by days, and finally find available slots in the array. This process might be time-consuming, so we can build the availability index.
    • option 2: for a given room_id, always create an entry for an occupied day. Then it will be easier to query unavailable slots by dates.

For hotels, syncing data

If it is a hotel booking system, then it will probably publish to Booking Channels like GDS, Aggregators, and Wholesalers.

Hotel Booking Ecosystem

To sync data across those places. We can

  1. retry with idempotency to improve the success rate of the external calls and ensure no duplicate orders.
  2. provide webhook callback APIs to external vendors to update status in the internal system.

Payment & Bookkeeping

Data model: double-entry bookkeeping

To execute the payment, since we are calling the external payment gateway, like bank or Stripe, Braintree, etc. It is crucial to keep data in-sync across different places. We need to sync data across the transaction table and external banks and vendors.

Notifier for reminders / alerts

The notification system is essentially a delayer scheduler (priority queue + subscriber) plus API integrations.

For example, a daily cronjob will query the database for notifications to be sent out today and put them into the priority queue by date. The subscriber will get the earliest ones from the priority queue and send out if reaching the expected timestamp. Otherwise, put the task back to the queue and sleep to make the CPU idle for other work, which can be interrupted if there are new alerts added for today.


  1. High-performance, distributed key-value store
  • Why distributed?
    • Answer: to hold a larger size of data
  1. For in-memory storage of small data objects
  2. Simple server (pushing complexity to the client) and hence reliable and easy to deploy


Big Picture: Client-server

  • client
  • given a list of Memcached servers
  • chooses a server based on the key
  • server
  • store KVs into the internal hash table
  • LRU eviction

The Key-value server consists of a fixed-size hash table + single-threaded handler + coarse locking

hash table

How to handle collisions? Mostly three ways to resolve:

  1. Separate chaining: the collided bucket chains a list of entries with the same index, and you can always append the newly collided key-value pair to the list.
  2. open addressing: if there is a collision, go to the next index until finding an available bucket.
  3. dynamic resizing: resize the hash table and allocate more spaces; hence, collisions will happen less frequently.

How does the client determine which server to query?

See Data Partition and Routing

How to use cache?

See Key value cache

How to further optimize?

See How Facebook Scale its Social Graph Store? TAO

Why bookkeeping?

Everyone has advice about how to manage money. Search Google for “manage money,” and you’ll get back over 1,690,000,000 links. You’ll find tons of life-hacking or self-help articles and books. You’ll find professional coaches or courses witch will coach you for a fee. You’ll find financial and investment services. Feel free to try what appeals to you and grow your assets through trials and errors.

I think the most important thing to remember is that asking the question of money comes from fear and self-doubt. We all fear change. We all doubt our ability to make more money.

Instead of spending time worrying and doubting, focusing the opposite — your confidence. If you are playing poker and with few chips, you can only make small bets and only win a small amount of money. When you have a lot of chips, you can make big bets and win big. You have more room for taking risks. You can try things which you cannot try when you have fewer chips.

Here is the magic - by understanding more of your financial status, you gain confidence! With more confidence, we can make better judgment and would like to bet the best amount for more significant success, and then win more.


Know your expenses and plan for next spending

Where is the end of the wining-more concern? People often talk about the buzzword of financial freedom. However, talk is cheap, but bookkeeping precisely answers the question.

Four main financial statements for the overview of financial status

Unfortunately, bookkeeping is not easy in our modern life. We are in a new age of abundance. We have a lot of accounts - cash, bank accounts, payment apps, credit cards, stock or crypto broker accounts, discount cards, … We have assets like houses, cars, gold, jewelry, … To make things even worse, some of us may live across countries and have to deal with different currencies. How could we draw an accurate map of our financial life and navigate through the future uncertainties?

By “accurate map of our financial life,” I mean these four primary financial statements:

  1. Income Statement: It shows how much revenue we earned over a specific period. This statement is usually considered the most important of the financial statements because it reflects the operating results.
  2. Balance Sheets: It answers how much assets, liabilities, and equity of the entity we have. This statement is the second most important because it reports the liquidity and capitalization of our assets.
  3. Cash Flow Statements: It reports our inflows and outflows of cash and answers whether we generated cash. We need enough cash on hand to pay expenses and purchase assets.
  4. Equity Statement: This is not helpful for your personal accounting. However, for a company, this statement reports how it distributes equity among stakeholders.

Income Statement

Balance Sheet

With, you can quickly generate statements like the above. But wait… How to prepare data for these statements?

Double-entry Bookkeeping for Correctness

To ensure the accuracy and internalize the error detection into the system, double-entry bookkeeping requires every entry to an account has at-least a corresponding entry to a different account. One transaction involves at least two accounts with two operations - debit (+) and credit (-).

1970-01-01 open Income:BeancountCorp
1970-01-01 open Assets:Cash
1970-01-01 open Expenses:Food
1970-01-01 open Assets:Receivables:Alice
1970-01-01 open Assets:Receivables:Bob
1970-01-01 open Assets:Receivables:Charlie
1970-01-01 open Liabilities:CreditCard

2019-05-31 * "BeancountCorp" "Salary of May 15th to May 31st"
  Income:BeancountCorp               -888 USD
  Assets:Cash                         888 USD

2019-07-12 * "Popeyes chicken sandwiches" "dinner with Alice, Bob, and Charlie"
  Expenses:Food 20 USD
  Assets:Receivables:Alice 20 USD
  Assets:Receivables:Bob 20 USD
  Assets:Receivables:Charlie 20 USD
  Liabilities:CreditCard -80 USD

As you can see in the two examples above, every transaction must fulfill the accounting equation.

Assets = Liabilities + Equity(aka Net Assets)

We used the Beancount syntax by Martin Blais and the web project Fava by Jakob Schnitzer to build this website. And it will alert you if any transaction has any legs not summing to zero.

Error Alert

Now you understand how we enforce the correctness of the ledger. But you may ask what are those “accounts”?

Accounts for money as buckets for water

Thinking your assets as water running in and out of different buckets and “accounts” are those buckets holding your money. With double-entry bookkeeping, it becomes obvious how money is flowing across different accounts, just like how water is flowing across different buckets. introduces five kinds of accounts.

  1. Income — Its amount is always negative or in debit. This is because you are making money, and then the money is debiting from “Income” account and crediting to your “Assets.”
  2. Expenses — Its amount is always positive or in credit. This is because you are spending money, and the money is flowing from the “Assets” or “Liabilities” to the “Expenses.”
  3. Liabilities — Its amount is positive or zero. Your credit card liabilities are a good example, which rises and falls in cycles.
  4. Assets — Its amount is positive or zero. Your cash or houses are always worthing some prices.
  5. Equity — Your net assets. The system will calculate automatically for you. Equity = Assets - Liabilities and it reflects how wealthy you are.

Now you can open your customized accounts with those keywords above:

1970-01-01 open Assets:Cash
1970-01-01 open Assets:Stock:Robinhood
1970-01-01 open Assets:Crypto:Coinbase
1970-01-01 open Expenses:Transportation:Taxi
1970-01-01 open Equity:OpeningBalance

Commodities: Tracking your investment

Yes, you can track your investment with For example, we buy 10 Bitcoins at the price of $100 in 2014:

2014-08-08 * "Buy 10 Bitcoin"
  Assets:Trade:Cash -1000.00 USD
  Assets:Trade:Positions  10 BTC {100.00 USD}

And then three years later, you sell them (originally with costs of $100 per unit annotated with {100.00 USD}) at the price of $10,000 per unit annotated with @ 10,000.00 USD.

2017-12-12 * "Sell 2 Bitcoin"
  Assets:Trade:Positions                   -2 BTC {100.00 USD} @ 10,000.00 USD
  Assets:Trade:Cash                          20,000.00 USD
  Income:Trade:PnL                          -19,800.00 USD

Or the same transaction with @@ 20,000.00 USD means that at the price of $20,000 in total.

2017-12-12 * "Sell 2 Bitcoin"
  Assets:Trade:Positions                   -2 BTC {100.00 USD} @@ 20,000.00 USD
  Assets:Trade:Cash                          20,000.00 USD
  Income:Trade:PnL                          -19,800.00 USD

The sum of all legs of the transaction, including -2 BTC {100.00 USD}, are still, as always, zero.

The costs {100.00 USD} tag is important because you might have bought the same commodity at different costs.

100 BTC {10.00 USD, 2012-08-08}
10 BTC {100.00 USD, 2014-08-08}

If you want to simplify the process, you can set up the account at the beginning with FIFO or LIFO. FIFO stands for first in, first out, while LIFO stands for last in, first out. In the US, IRS uses FIFO to calculate your PnL and tax accordingly.

1970-01-01 open Assets:Trade:Positions "FIFO"

And then when you sell it in shorthand like -2 BTC {}, beancount will apply FIFO strategy automatically and sell the oldest commodity. is such a cloud service for recording your financial transactions in text files, visualize them into financial statements (income statement, balance sheet, trial balance, etc.), and helps you live a better financial life. Sign up now - It’s in Promotional Period and Free!

Startup Engineering
© 2010-2018 Tian
Built with in San Francisco