🔭👽📡 Volker

Dataframe Storage Mini-Benchmark

27 Dec 2019

A quick benchmark of how to read, write, and store a Pandas dataframe if you don’t want to read it from CSV every time. Conclusions:

  1. If file size matters, use Parquet
  2. If read speed matters, use Feather or HDF5/Static
  3. If both matter, use Parquet
  4. Write speeds don’t differ much, except for HDF5/PyTables and CSV
  5. Avoid CSV

If you only want to load some of the columns, use Parquet or Feather. The way Pandas uses HDF5 cannot deal with this.
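As a minimal sketch of the column-subset case: both `pd.read_parquet` and `pd.read_feather` accept a `columns` argument. The frame, the column names, and the file names below are made up for illustration and are not the data from the benchmark. Both formats need an optional engine such as `pyarrow`, so the sketch falls back to `None` if none is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame; the column names are placeholders.
df = pd.DataFrame(
    {
        "a": range(5),
        "b": list("vwxyz"),
        "c": [0.5 * i for i in range(5)],
    }
)

wanted = ["a", "c"]  # the subset of columns we actually need

with tempfile.TemporaryDirectory() as tmp:
    parquet_path = os.path.join(tmp, "frame.parquet")
    feather_path = os.path.join(tmp, "frame.feather")
    try:
        # Write the full frame once per format.
        df.to_parquet(parquet_path)
        df.to_feather(feather_path)
        # Read back only the requested columns.
        from_parquet = pd.read_parquet(parquet_path, columns=wanted)
        from_feather = pd.read_feather(feather_path, columns=wanted)
    except ImportError:
        # Pandas raises ImportError when no Parquet/Feather engine is installed.
        from_parquet = from_feather = None

if from_parquet is not None:
    print(from_parquet.columns.tolist())
    print(from_feather.columns.tolist())
```

Column `b` is never turned into a dataframe column here, which is where the savings come from on wide frames.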

Figures:

File Size Comparison

Read Speed Comparison

Write Speed Comparison

Code: