Dremel: Interactive Analysis of Web-Scale Datasets

A figure from the Dremel paper illustrating the difference between record-oriented and column-oriented storage.

Algorithms

The striping and assembly algorithms from the Dremel paper

  • Picture serializing a record as a depth-first traversal of a tree.
    • When a leaf is reached, write out the value for the corresponding column, with the maximum definition level for this column (meaning it is defined) and the current repetition level.
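    • When a field is not defined, a NULL is written with a definition level smaller than the maximum definition level of that column; the level records how many of the non-required fields on the path are actually defined.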

Only repeated fields increment the repetition level; only non-required fields (optional or repeated) increment the definition level.
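
As a small worked example of this rule, the maximum levels of a column can be computed from the labels of the fields on its path. The helper name `max_levels` and the list-of-labels input are assumptions made for this sketch; the paths correspond to columns of the Document schema in the Dremel paper.

```python
def max_levels(labels):
    """Return (max repetition level, max definition level) for a column.

    `labels` lists the label of each field on the path, from root to leaf.
    """
    max_rep = sum(1 for label in labels if label == "repeated")
    max_def = sum(1 for label in labels if label != "required")
    return max_rep, max_def


# DocId: a required top-level field.
doc_id = max_levels(["required"])
# Name.Language.Code: repeated group, repeated group, required leaf.
code = max_levels(["repeated", "repeated", "required"])
# Name.Language.Country: repeated group, repeated group, optional leaf.
country = max_levels(["repeated", "repeated", "optional"])

print(doc_id, code, country)
```

A required leaf such as `Code` adds nothing to either level, which is why its maximum definition level equals that of its enclosing `Language` group.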