A quick benchmark of ways to write, store, and read back a Pandas DataFrame if you don’t want to read from CSV all the time. Conclusions:
- If file size matters, use Parquet
- If read speed matters, use Feather or HDF5/Static
- If both matter, use Parquet
- Write speeds don’t differ much, except for HDF5/PyTables and CSV
- Avoid CSV
If you only need some of the columns, use Parquet or Feather. The way Pandas uses HDF5 cannot deal with this.