It is a simple idea: decoding raw data and log files from instruments and providers can be a pain. Why not consolidate efforts among data users, and write each decoder only once?
Created to enable consistent decoding of data log files from meteorological instruments, the Pyodec package reduces duplication of effort when decoding obscure, proprietary, or otherwise unstructured data files. Using Pyodec, a researcher can check whether a reliable tool already exists for decoding a certain obscure data file and, if not, provide such a tool to the wider science community.
Pyodec can be, and is, used for decoding structured and binary files as well. A decoder can make certain structured files even easier to read, for example by producing a sorted recarray from a CSV. Similarly, binary formats such as netCDF, which are very easy to read on their own, can be dumped completely to recarray objects if that is deemed useful. These decoders should use standard Python libraries for structured files, such as NumPy, SciPy, PyTables, etc.
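As a rough illustration of the CSV case, here is a minimal sketch of a function that reads a CSV and returns a sorted recarray using only NumPy. It does not use Pyodec's own classes or API: the function name `decode_csv`, the `sort_field` parameter, and the sample columns (`time`, `temp`, `rh`) are all made up for this example.

```python
import io

import numpy as np


def decode_csv(fileobj, sort_field="time"):
    """Hypothetical decoder (not Pyodec's API): CSV in, sorted recarray out."""
    # names=True takes field names from the header row;
    # dtype=None lets NumPy infer a type for each column.
    data = np.genfromtxt(
        fileobj, delimiter=",", names=True, dtype=None, encoding="utf-8"
    )
    # A file with a single data row comes back 0-dimensional; make it 1-D.
    data = np.atleast_1d(data)
    # Sort the records by the chosen field (assumed here to be a time column).
    data = np.sort(data, order=sort_field)
    # View the structured array as a recarray for attribute-style access.
    return data.view(np.recarray)


# Made-up sample data standing in for an instrument log file.
sample = io.StringIO("time,temp,rh\n3,21.5,40\n1,20.0,45\n2,20.7,43\n")
obs = decode_csv(sample)
```

With the result in hand, each column is available as an attribute (`obs.time`, `obs.temp`, `obs.rh`), already ordered by the `time` field.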
We want Pyodec to be accurate, diverse, and fast, in that order.
This library is useless if it produces inaccurate or incomplete decodings of input files. Nobody can use it if it does not have a broad ability to decode complex file types. Finally, speed is important to users, but not nearly as important as accuracy and diversity.
We say this a few times here, but it bears repeating: when you do not have to write a decoder for every obscure data format you work with, you can proceed to analysis much faster.
This is more than simply using decoders someone else made for you; it also means you can write decoders faster, and only write them once. Share them and improve them, and help your research community work faster.
When you decode file after file, you will probably skip some data. When there are several of you writing a single decoder, there is time to get it right the first time.
It is incredible what a difference it makes to have every morsel of data contained within a data or log file decoded. It can be the difference between changing an array index in your code, and taking an entire day to rewrite your decoder to fetch an overlooked variable. It is not because we are lazy, but because we are the opposite: urgently working towards a goal, trying to do as little extra work as possible.
Just to reiterate: worry less about decoding your data, and more about using it.
This library truly needs to be written by a community of contributors. No single researcher or developer can create decoders for file formats they have never had to deal with.
Researchers should use Pyodec to develop and share their decoders. This package should make decoding files easier in two ways: providing full black-box style file decoding, and providing you with some additional tools for writing your own file decoders.
Though the decoder library of Pyodec relies on community development, the core toolkits and classes are also in need of continuous improvement. The core library holds the resources researchers can use to simplify their decoders, and could always use more.
Documentation is never given enough time, so, naturally, it could benefit from community attention.
Thank you very much for taking this time to check out Pyodec.