Fast and Parallel Webpage Layout (WWW '10)
The browsing of webpages is slow on smartphones for their limited CPU computational resources. The power wall forces hardware architects to apply increases in transistor counts towards improving parallel performance, not sequential performance. So the authors introduce the parallel mobile browser.
In the analysis, three core limitations of the rendering speed are:
CSS selector matching
Box and text layout
Overall Input and Output
- Input: an HTML tree of content, CSS style rules, font files.
- Output: absolute element positions.
# 3.1 Algo1: CSS Selector Matching
Rule matcher associates CSS rule set with HTML node tree.
- In general, selector language is an exact subset of regular expression.
- Disjunctions are split into separate selectors
- sequentially read rules and correspondingly build hash maps
- parallelly map nodes to different kinds of rules
- parallelly reduce several rules to each node
Optimizations from WebKit:
- Hashtables. [×] check CSS for every node [√] read once, build hashmap, and check hash
- Right-to-left matching.
- Redundant selector elimination.
- Hash Tiling. partition the hashtable. reduce cache misses.
- Tokenization. store attributes as int instead of string to save cache.
- Parallel document traversal.
- Random load balancing. If in sequence, neighboring nodes will cause load imbalance.
- Result pre-allocation.
- Delayed set insertion.
- Non-STL sets.preallocate a vector with a size of potential matches.
Overall Speedup = 60x: 204ms->3.5ms, 3s->50ms
# 3.2 Algo2:
- Input: HTML tree nodes with symbolic constraint attributes
- Output: layout actual details (size, shape, position)
Because CSS is confusing and informally-writtened, we create a new simple, concise, uniform, and intermediate language, Berkeley Style Sheets (BBS), which is transformed from CSS and will be specified with an attribute grammar (which shows potential for parallelization).
- Increase performance. decompse the tasks.
- Uniform a correct, concise specification.
- Prove it is at most linear in the size of HTML tree.
Two steps recursively for every node in the DOM tree
- calculate inherited attributes (top-down). Every level of childs in the tree enjoyes the parallelization.
- calculate synthesized attributes (node’s own attributes) (bottom-up). Every level of parents in the tree enjoys the parallelization.
2 is dependent on 1.
Speedup of box + text layout = 2-3x
Advanced layouts: floats
# 3.3 Algo3: Font Handling
- create a pool of necessary font library request
- group the requests
- make parallel calls to process each group
# 4. Conclusions
Address three bottlenecks of loading a page
- CSS selector matching
- Pre-built hash tables, map-reduce
- Box and text layout solving
- Specify layout as attribute grammars
- Glyph rendering
- Combine requests to groups and render in parallel
Milestone in building a parallel and mobile browser