🔭👽📡 Volker

Dataframe Storage Mini-Benchmark

27 Dec 2019

A quick benchmark of ways to write, store, and read a Pandas dataframe when you don't want to parse a CSV every time. Conclusions:

  1. If file size matters, use Parquet
  2. If read speed matters, use Feather or HDF5/Static
  3. If both matter, use Parquet
  4. Write speeds don't differ much, except for HDF5/PyTables and CSV
  5. Avoid CSV
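To reproduce this kind of comparison on your own data, a minimal sketch could look like the following. It is not the script behind the numbers above. The helper names (`make_frame`, `benchmark`, `FORMATS`) and the toy numeric dataframe are made up for illustration, and mapping "HDF5/Static" and "HDF5/PyTables" to pandas' `format="fixed"` and `format="table"` is an assumption. Formats whose optional dependency (`pyarrow`, `tables`) is not installed are skipped.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=10_000, seed=0):
    """Build a small all-numeric toy dataframe (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "id": np.arange(n_rows),
        "value": rng.normal(size=n_rows),
        "group": rng.integers(0, 10, size=n_rows),
    })


# name -> (writer, reader); each callable takes a file path.
FORMATS = {
    "csv": (
        lambda df, p: df.to_csv(p, index=False),
        lambda p: pd.read_csv(p),
    ),
    "parquet": (
        lambda df, p: df.to_parquet(p),
        lambda p: pd.read_parquet(p),
    ),
    "feather": (
        lambda df, p: df.to_feather(p),
        lambda p: pd.read_feather(p),
    ),
    # Assumption: "HDF5/Static" corresponds to pandas' fixed format.
    "hdf5_fixed": (
        lambda df, p: df.to_hdf(p, key="df", mode="w", format="fixed"),
        lambda p: pd.read_hdf(p, key="df"),
    ),
    # Assumption: "HDF5/PyTables" corresponds to pandas' table format.
    "hdf5_table": (
        lambda df, p: df.to_hdf(p, key="df", mode="w", format="table"),
        lambda p: pd.read_hdf(p, key="df"),
    ),
}


def benchmark(df, formats=FORMATS):
    """Return file size, write time and read time per format."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, (write, read) in formats.items():
            path = os.path.join(tmp, f"data.{name}")
            try:
                t0 = time.perf_counter()
                write(df, path)
                write_s = time.perf_counter() - t0

                t0 = time.perf_counter()
                read(path)
                read_s = time.perf_counter() - t0
            except ImportError:
                # Optional dependency (pyarrow or tables) is missing.
                continue
            results[name] = {
                "size_bytes": os.path.getsize(path),
                "write_s": write_s,
                "read_s": read_s,
            }
    return results


if __name__ == "__main__":
    for name, stats in benchmark(make_frame()).items():
        print(name, stats)
```

A single timed run on a small frame is noisy. For numbers you can trust, use a dataframe that resembles your real data and repeat each measurement several times.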

If you only need to load some of the columns, use Parquet or Feather. The way Pandas uses HDF5 by default (the fixed format) does not support selecting columns on read.
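As a small illustration of loading a subset of columns, the sketch below writes a toy dataframe and reads back only two of its three columns through the `columns` argument of `pd.read_parquet` and `pd.read_feather`. The helper `load_columns` and the column names are hypothetical, and both formats need `pyarrow` installed, so the sketch returns whatever it managed to load if that import fails.

```python
import os
import tempfile

import pandas as pd


def load_columns(df, wanted):
    """Write df as Parquet and Feather, then read back only `wanted` columns."""
    loaded = {}
    with tempfile.TemporaryDirectory() as tmp:
        parquet_path = os.path.join(tmp, "data.parquet")
        feather_path = os.path.join(tmp, "data.feather")
        try:
            df.to_parquet(parquet_path)
            loaded["parquet"] = pd.read_parquet(parquet_path, columns=wanted)

            df.to_feather(feather_path)
            loaded["feather"] = pd.read_feather(feather_path, columns=wanted)
        except ImportError:
            # pyarrow is an optional pandas dependency; skip if it is absent.
            pass
    return loaded


# Toy data: three columns, of which only "a" and "c" are loaded again.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3], "c": [10, 20, 30]})
subsets = load_columns(df, ["a", "c"])

for name, frame in subsets.items():
    print(name, list(frame.columns))
```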


File Size Comparison

Read Speed Comparison

Write Speed Comparison