It is a simple idea: decoding raw data and log files from instruments and data providers can be a pain, so why not pool efforts among data users and write each decoder only once?

Created to enable consistent decoding of log files from meteorological instruments, the Pyodec package reduces duplicated effort when decoding obscure, proprietary, or otherwise unstructured data files. Using Pyodec, a researcher can check whether a reliable tool already exists for decoding a given file format and, if not, contribute one to the wider science community.

Pyodec converts any supported input data format into a NumPy recarray, representing either the entire file or, when run as a generator, one slice of the file at a time.
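As an illustration of these two output modes, here is a minimal sketch of a decoder for a made-up whitespace-delimited log (epoch seconds, temperature, relative humidity). The names `LOG_DTYPE`, `decode`, `decode_chunks`, and `chunk_size` are hypothetical and are not Pyodec's actual API; the sketch only assumes NumPy.

```python
import io

import numpy as np

# Hypothetical record layout for a simple instrument log (not part of Pyodec):
# epoch seconds, temperature (degC), relative humidity (%).
LOG_DTYPE = np.dtype([("time", "f8"), ("temp", "f4"), ("rh", "f4")])


def parse_line(line):
    """Split one whitespace-delimited log line into a tuple of floats."""
    t, temp, rh = line.split()
    return float(t), float(temp), float(rh)


def to_recarray(rows):
    """Pack a list of tuples into a recarray with field-name attribute access."""
    return np.array(rows, dtype=LOG_DTYPE).view(np.recarray)


def decode(fileobj):
    """Decode the entire file into a single recarray."""
    return to_recarray([parse_line(line) for line in fileobj if line.strip()])


def decode_chunks(fileobj, chunk_size=1000):
    """Yield recarray slices of at most chunk_size records each."""
    rows = []
    for line in fileobj:
        if not line.strip():
            continue  # skip blank lines
        rows.append(parse_line(line))
        if len(rows) == chunk_size:
            yield to_recarray(rows)
            rows = []
    if rows:  # flush the final, possibly shorter, slice
        yield to_recarray(rows)


# Usage with an in-memory stand-in for a log file.
sample = "0 10.5 80\n60 10.7 79\n120 10.9 78\n"
whole = decode(io.StringIO(sample))
slices = list(decode_chunks(io.StringIO(sample), chunk_size=2))
```

Yielding slices lets a large file be processed in bounded memory, at the cost of handing the caller several smaller arrays instead of one.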

Pyodec can be, and is, used for decoding structured and binary files as well. A decoder can make certain structured files even easier to read, for example by producing a sorted recarray from a CSV file. Similarly, binary formats such as netCDF, which are already easy to read on their own, can be dumped in full to recarray objects where that is useful. Such decoders should rely on standard Python libraries for structured files, such as NumPy, SciPy, and PyTables.
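A minimal sketch of such a CSV decoder, assuming a comma-delimited file with a header row. `decode_csv` and `sort_field` are hypothetical names for illustration, not part of Pyodec; the sketch uses only `numpy.genfromtxt` and `numpy.sort` with its `order` argument.

```python
import io

import numpy as np


def decode_csv(fileobj, sort_field):
    """Read a headed CSV into a recarray sorted by one column.

    Hypothetical helper for illustration; not part of Pyodec's API.
    """
    # names=True takes field names from the header row; dtype=None lets
    # NumPy infer a type for each column.
    data = np.genfromtxt(fileobj, delimiter=",", names=True, dtype=None,
                         encoding="utf-8")
    # A one-row file comes back as a 0-d array; make it 1-d before sorting.
    data = np.atleast_1d(data)
    # Sort records by the chosen field, then expose fields as attributes.
    return np.sort(data, order=sort_field).view(np.recarray)


# Usage: rows arrive out of time order and come back sorted.
sample = io.StringIO("time,temp,rh\n120,10.9,78\n0,10.5,80\n60,10.7,79\n")
rec = decode_csv(sample, sort_field="time")
```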

Goals of Pyodec

We want Pyodec to be accurate, diverse, and fast, in that order.

This library is useless if it produces inaccurate or incomplete decodings of input files. Nobody can use it if it cannot decode a broad range of complex file types. Finally, speed matters to users, but not nearly as much as accuracy and diversity.

Save time

We may say this a few times here, but by not having to write decoders for every obscure data format you work with, you can get to analysis much faster.

This is more than simply using decoders someone else made for you; it also means you can write decoders faster and only write each one once. Share your decoders, improve them, and help your research community work faster.

Decode better

When you decode file after file on your own, you will probably skip some data. When several people write a single decoder together, there is time to get it right the first time.

It is incredible what a difference it makes to have every morsel of data in a data or log file decoded. It can be the difference between changing an array index in your code and spending an entire day rewriting your decoder to fetch an overlooked variable. Data gets skipped not because we are lazy, but because we are the opposite: urgently working towards a goal and trying to do as little extra work as possible.

Simplify science

To reiterate: don't worry as much about decoding your data; worry about using it.

Community involvement & contribution

This library truly needs to be written by a community of contributors. No single researcher or developer can create decoders for file formats they have never had to deal with.

Researchers should use Pyodec to develop and share their decoders. This package should make decoding files easier in two ways: by providing full black-box file decoding, and by providing additional tools for writing your own file decoders.

Developing the core library

Though the decoder library of Pyodec relies on community development, the core toolkits and classes are also in need of continuous improvement. The core library holds the resources researchers can use to simplify their decoders, and could always use more.

This website & documentation

Documentation is never given enough time, so, naturally, it could benefit from community attention.

Thank you very much for taking the time to check out Pyodec.