Pyodec Documentation

Welcome to Pyodec. This is the reference to the library of core resources and techniques within the package.

Getting Started

How to hit the ground running with pyodec.

Installation

Download a distribution, or clone the repo from GitHub, and run the setup script in the usual fashion.

$ python setup.py install

Append --user to install the package only for the local user.

This is for Linux/Unix installations, and will make pyodec available for import in locally run Python programs. To update, download a newer version of the code (i.e. a new release) and run the setup script again with the same command.

Pyodec may be added to PyPI some day, but at this time it must be installed manually.

Dependencies

The basic modules of pyodec currently require only Python 2.7 (the library is not currently written for Python 3) and numpy, along with standard-library modules that come with Python 2.7, such as gzip. However, certain decoders may leverage other libraries, particularly certain types of binary file decoders. Those dependencies will be exposed when importing or executing the specific decoder module.

Future dependencies

Current dependencies may not reflect installation requirements in the future.

Decoding with Pyodec

Pyodec offers two ways to access the decode method for any specific file decoder. One is based on the core library, where you pass the name of the decoder to use, and the decode method is called for you. Alternatively, you can use the structure of the Pyodec package and directly import the decoder module to access its decode method.

Accessing the decoder

Pyodec is structured so that decoders are stored in a subpackage called files. To access a decoder, say vaisala_cl31_uu, you could directly access it with

import pyodec.files.vaisala_cl31_uu as decoder
data = decoder.decode(src)

so the decode method is available directly within the named decoder module. This decoder, in turn, is an instantiated member of the decoder class contained within the module, which inherits the core FileDecoder methods, along with some decoder metadata.

Every properly constructed decoder will have a decoder attribute, which is executable. There are several other variables within a decoder module, one of which being the decoder's class, but you will have to look at the docs for that specific decoder to work with those, as their names are not fixed.

Using simple decoder access

Another way to access a decoder is to import and call pyodec.decode, a compact convenience function which loads and executes the decoder you name as a string argument. Passing a decoder name is required for this method. Here is an example of using it to decode src as above.

import pyodec
data = pyodec.decode(src, decoder='vaisala_cl31_uu')

This eliminates the importing process, but it also makes some things harder, like getting metadata from the decoder. To do that, we recommend you directly import the decoder as above.

Running a decoder

There are two ways to decode. First, you can decode as a procedural process, where an entire file or dataset is decoded and the output is returned. Alternatively, you can run the decoder as a generator (assuming the returned data can be grouped in some fashion), where you can control how many values are returned per iteration.

Iterative decoding

Warning: Pyodec currently only returns an iterator object.

Assuming your data file will produce groups of data (such as a collection of observations), a decoder can run in a for loop, as a generator, returning a list of sets every N sets (N defaults to 1000). This is Pyodec's default behavior. For data files which are not broken into discrete observations, or in some way do not fit this data model, all the data will be returned at once, via the generator.

for data in pyodec.decode(src, 'decoderName', limit=1000):
    # do something with decoded collection of data points
    pass

# or

import pyodec.files.decoderName
for data in pyodec.files.decoderName.decode(src, limit=1000):
    # do an equivalent something with the returned data
    pass

The limit argument is optional. When a data file is finished and fewer than N sets remain in the dataset, they are yielded, so no data are missed.
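The batching behavior described above can be sketched in a few lines. This is a standalone illustration, not Pyodec's actual implementation; the function name batch_records is hypothetical and the "records" are just integers.

```python
def batch_records(records, limit=1000):
    """Yield lists of at most `limit` records; the last list may be shorter."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= limit:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left so no observations are dropped.
        yield batch


# 2500 stand-in observations, grouped with limit=1000
sizes = [len(b) for b in batch_records(range(2500), limit=1000)]
```

Here sizes holds the length of each yielded group, with the final, shorter group carrying the leftover observations.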

Procedural decoding

As a wrapper around the generator function, you can pass the argument generator=False, which will then execute the generator internally, and return a single set of values.

data = pyodec.decode(src, 'decoderName', generator=False)
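One way such a wrapper might work is sketched below. It is a standalone illustration under assumptions: decode_iter and decode are hypothetical stand-ins rather than Pyodec's own functions, the input is any iterable of already-decoded records, and generator defaults to True here only for the sake of the example.

```python
def decode_iter(records, limit=1000):
    """Stand-in generator: yield lists of up to `limit` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= limit:
            yield batch
            batch = []
    if batch:
        yield batch


def decode(records, generator=True, limit=1000):
    """Return the generator itself, or run it and return one flat list."""
    batches = decode_iter(records, limit)
    if generator:
        return batches
    data = []
    for batch in batches:
        # Same record structure as the iterative case, just accumulated.
        data.extend(batch)
    return data


everything = decode(range(5), generator=False, limit=2)
in_groups = list(decode(range(5), limit=2))
```

Both results contain the same records in the same form; the only difference is whether they arrive in groups or all at once.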

And that is how you use pyodec, should you have an existing decoder available for your data file (and how you use it once you have built a decoder for your file).

Building a decoder

Well, this isn't really a "getting started" scenario, unfortunately. See the "Writing a decoder" section further down in the docs for the guide on developing a decoder.

Protocols and rules

Under Development

Though Pyodec is really a loosely connected set of independently written codes, it is important for the data model API to be compatible between different decoders. The goal of these rules will be to allow a single piece of code to take a diverse set of files (knowing their decoders), and both decode them and look at the data with the same code.

Basic python rules for internal consistency, variable and method naming, etc. should be followed.

This is python, so there are no private methods. Rewrite/reallocate any internal method as you please, but when doing so please try to follow these usage guidelines. If you find a better way to do the central decode method, then share it!

Input & client interfacing

Decoders should accept a file path string, a file object, and (where possible) a string of data as the first argument to the decode method.
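A decoder could normalize these three kinds of input with a small helper along the following lines. This is only a sketch: open_source is a hypothetical name, the rule used to tell a path from raw data (no newline, and the file exists) is an assumption, and the code is written for a current Python rather than taken from the library.

```python
import io
import os


def open_source(src):
    """Return a readable text handle for a handle, a path, or a data string."""
    if hasattr(src, "read"):
        # Already an open file object: use it as-is.
        return src
    if "\n" not in src and os.path.isfile(src):
        # Looks like a path to an existing file (assumed heuristic).
        return open(src, "r")
    # Otherwise treat the string itself as the data to decode.
    return io.StringIO(src)


handle = open_source("12,4,4,4\n12,5,5,5\n")
lines = handle.read().splitlines()
```

Whatever the caller passes in, the rest of the decoder can then work with a single kind of object.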

At a minimum, a decoder class should contain a decode method which can be run conforming to these input and interface protocols.

Decoders should produce a generator object (usually using yield), and should allow the keyword argument generator to control whether a generator or a list is returned. generator should default to True.

Available `.decode()` keyword arguments

Argument	Type	Default	Purpose
`limit`	int	`1000`	Number of discrete data results to accumulate before yielding
`generator`	bool	`False`	Produce a generator based on `limit`, or a single set of values representing the entire dataset. Warning Not all decoders at this time implement this protocol, and simply return generators at all times.

These input requirements will evolve and expand over time (such as a requirement to handle certain keyword arguments properly).

Data output and returning

The possible range of outputs is far more diverse than the possible inputs, however there are a few goals that can be shared.

Discrete observations are collected in a list

Not the other way around. This is slightly less efficient with memory, but it is conceptually much simpler, and allows the second rule to actually work.

yield [(12,[4,4,4]), (12,[5,5,5]), ... ]

not

yield [[12,12,...],[[4,4,4],[5,5,5],...]]

Note: This allows the returned data to be considered "records" as far as numpy is concerned, and can be easily converted into a very powerful recarray type simply with

data = np.rec.fromrecords(data, dtype=decoder.get_dtype())
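As a concrete illustration of that conversion, here is a small standalone example. The field names and dtype are made up for the example; in practice the dtype would come from the decoder's get_dtype() method.

```python
import numpy as np

# Two observations, each a (time, profile) record as in the rule above.
records = [(12, [4, 4, 4]), (13, [5, 5, 5])]

# Hypothetical dtype: one integer time, one 3-element float profile.
dtype = [("time", "i8"), ("profile", "f4", (3,))]

table = np.rec.fromrecords(records, dtype=dtype)

# Columns are now available by name across all observations.
times = table.time
profiles = table.profile
```

times is a one-dimensional array with one entry per observation, and profiles is a two-dimensional array of shape (observations, 3).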

Iterative and procedural output are the same structure

Simply, you interact with a returned list of values in the same manner, whether you receive them from a yield or a return.

Variable and data descriptions

This is the least-defined rule of the library. Many data files are self-describing in some manner, and it is essential to extract this metadata from files.

The current procedure for a decoder class object to reveal the descriptions of the variables is through three methods with the following functionality.

Method	Function
`.getvars()`	Return a list of dictionaries containing name, dtype and shape info, whose indices correspond to the index of the returned dataset (e.g. column names and descriptions).
`.get_fixed_vars()`	Return similar info to `.getvars()`, but includes the actual data elements as well, since fixed vars are not yielded with the rest of the data by default.
`.get_dtype()`	Return a valid Numpy-recarray dtype description, which can be used as shown below.

import pyodec.files.myDecoder as dec
import numpy as np
for data in dec.decode(src):
    # convert data into a super awesome numpy recarray
    data = np.rec.fromrecords(data, dtype=dec.get_dtype())

By default, the return values of these methods are defined by the VariableList and FixedVariableList class objects, but as always, the functionality can be overwritten in a decoder class when necessary.

As noted, these variable/metadata requirements are not set in stone, and will likely change through development with other users. Obviously backwards compatibility will become an issue pretty quickly, however.

Naming conventions

In accordance with python convention, classes will be in CamelCase, and everything else will be in lowercase with underscores. However, there is some inconsistency regarding naming of decoders, and decoder modules.

The current naming convention for the files that contain the decoders is to use lowercase and underscores where necessary.

This is up for debate.

Core objects and methods

When using the Pyodec decoders, you mostly only need to know the getting started guides, and the protocols of the library. However, if you want to write a decoder, it is important that you understand the tools that are made available to the developer, and how they are used.

The following documentation outlines the classes and methods that are shared among all of the original decoders. Many of the features presented here exist to enable compliance with the protocols outlined above. Ideally they can help make the development of a decoder simpler as well.

These classes are located in the pyodec.core directory, and should be imported from there.

The Decoder Class

A file decoder should inherit the Decoder class. The main function of the Decoder class is to facilitate variable (metadata) import, inheritance, and management for a decoder instance.

Attributes

The Decoder class object contains a set of attributes which can be set at init, or altered during use to share additional data from a decoding process.

vars

Where the VariableList object for this decoder's data set is held. This can be set at runtime, or defined when writing a decoder.

fixed_vars

Like Decoder.vars, this attribute holds the local FixedVariableList object for the dataset.

inherit

Holds the inherited Decoder descendant which can be used to populate vars and fixed_vars on __init__(). This attribute exists primarily to hard-code the inheritance class.

state

A persistent dictionary which can be modified by the decoder in order to inform the processing code about the state of the decoder. This is most valuable for decoders which discover information that affects how the data should be processed once it is produced.

The default state value is

state = {'identifier':False,
         'index':False}

An example would be a decoder which produces data which are from different stations. You can use the state['identifier'] variable to indicate the station this data is from (even though station ID should be among the variables produced). This can be used as a trigger for decoders which make data which do not logically fit into a single table.

Because the tabular-typing and 'stations' example is nonstandard, it will be important to define a single protocol for the usage of the Decoder.state attribute, so end users can have predictable results.

Additionally a decoder can define its own state dictionary keys, again for usage by a process running a decoder.

An example of this usage

station = "KSLC"
for data in decoder.decode(source, generator=True):
    if decoder.state['identifier'] is not False:
        # then there are multiple identifiers
        station = decoder.state['identifier']
    write_data_function(station, data)

Initialization

A Decoder object is initialized with up to three keyword arguments

Decoder(vars=False, inherit=False, fixed_vars={})

vars: A description of the contained variables (Ideally a VariableList object)
inherit: A different Decoder descendant (another FileDecoder or a MessageDecoder) from which variable information can be imported
fixed_vars: The description and values of fixed variables within the dataset. Like vars this behaves best with a FixedVariableList object.

A Decoder object must be initialized with either vars or inherit defined, or else it will fail.
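The relationship between vars, inherit and fixed_vars can be pictured with a tiny stand-in class. MiniDecoder below is hypothetical and is not the real pyodec.core.Decoder; in particular, raising ValueError when neither argument is given is an assumption, since the documentation only says initialization will fail.

```python
class MiniDecoder(object):
    """Hypothetical stand-in for pyodec.core.Decoder (not the real class)."""

    def __init__(self, vars=False, inherit=False, fixed_vars=None):
        if vars is False and inherit is False:
            # Assumed failure mode; the real class may fail differently.
            raise ValueError("either vars or inherit must be given")
        if inherit is not False:
            # Borrow variable descriptions from the other decoder
            # unless they were supplied explicitly.
            if vars is False:
                vars = inherit.vars
            if not fixed_vars:
                fixed_vars = inherit.fixed_vars
        self.vars = vars
        self.fixed_vars = fixed_vars if fixed_vars else {}
        self.inherit = inherit


message_decoder = MiniDecoder(vars=["TIME", "BACKSCATTER"],
                              fixed_vars={"HEIGHT": [10, 20]})
file_decoder = MiniDecoder(inherit=message_decoder)
```

After this, file_decoder carries the same variable descriptions as message_decoder without repeating them.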

Caution: there is no built-in method of checking whether the number of variables indicated in the vars object actually matches the data returned. If it does not, the most obvious place where it will break is in the end user's code when they attempt to use the data and metadata.

Methods

getvars(): Wrapper for the self.vars.getvars() method of the contained VariableList object. Returns a list of dictionaries corresponding to the variables produced by the Decoder descendant.
get_fixed_vars(): Like getvars(), this returns the result of the self.fixed_vars.getvars() method. Returns a list of dictionaries corresponding to the decoder's fixed variables.
get_dtype(): Produces a numpy-recarray-valid dtype string for the data variables (not fixed)

The FileDecoder Class

This is the primary descendant of the Decoder class, and is for the development of decoders which read in individual plain-text or gzipped files. An individual decoder, if designed to handle files, should inherit the FileDecoder class.

Warning: due to a current compatibility issue, the file reading methods in the FileDecoder class will not read lines beginning with :: (two colons).

The initialization method is the same as for the Decoder class.

Class Methods

The FileDecoder class exposes a simple file opening utility that can be used both within an instance, and outside of it.

open_ascii( filepath ): Open a text file (not actually ASCII-restricted) and return the file handle. Also opens gzipped text files (with the extension .gz) using the gzip library.
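A utility with this behavior can be approximated as follows. This is a sketch, not the library's open_ascii: the name open_text is hypothetical, and it uses the text-mode gzip.open available in Python 3, whereas Pyodec itself targets Python 2.7.

```python
import gzip
import io


def open_text(filepath):
    """Open a plain or gzipped (.gz) text file and return the handle."""
    if filepath.endswith(".gz"):
        # gzip.open in "rt" mode decompresses and decodes to text.
        return gzip.open(filepath, "rt")
    return io.open(filepath, "r")
```

The caller can then iterate over lines without caring whether the file on disk was compressed.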

"Public" Methods

These are the methods that can be used in creating a decoder. Following these methods are methods that must be defined in the process of creating a decoder.

Each of these methods takes a file handle, and runs through it, calling a defined string reading function on each segment of the data file, which is what produces the actual data returned.

decode( filepath, generator=False, limit=1000 , **kwargs)

The method the client/user uses to decode a file. This is a wrapper around decode_proc, which controls whether the output is a generator or not (and it standardizes keyword arguments).

filepath: A string or open file handle to be decoded (if already open, the file should be readable with the pointer located where you should start decoding)
generator: The boolean key instructing the code to use iterative yields, or to accumulate the returned data internally and return once. Currently this defaults to False
limit: Equivalent to yieldcount elsewhere. How many data observations to accumulate before yielding.
**kwargs: Has the same meaning as elsewhere in Python. Additional keyword arguments are passed directly to the decode_proc method.

read_lines( yieldcount, filehandle )

Read a file line by line, calling the self.on_line() method for every line encountered. The function accumulates the data returned from self.on_line(). For every yieldcount observations returned (lines for which self.on_line() does not return False), the accumulated list of data points is yielded and the list resets; any remaining observations are yielded at the end.

read_chunks( yieldcount, filehandle, [begin, [end] ] )

Read a file as chunks, identified by either a begin character (such as a BOM) or an end character/string. You can also specify both. The self.on_chunk() method is called on each chunk read, and the data returned from self.on_chunk() is appended to a list.

Either begin or end must be specified.

As with read_lines, yieldcount determines how many successfully decoded observations to store before yielding the data to an iterator.
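To make the begin/end delimiter idea concrete, here is one possible way to split text into chunks. It is an assumption-laden sketch: iter_chunks is a hypothetical function that works on an in-memory string rather than a file handle, and the exact rules (for example, discarding text before the first begin marker) may differ from what read_chunks does.

```python
def iter_chunks(text, begin=None, end=None):
    """Yield chunks of `text` delimited by begin and/or end markers."""
    if begin is None and end is None:
        raise ValueError("either begin or end must be specified")
    if begin is not None and end is not None:
        # Keep what lies between each begin marker and the next end marker.
        for piece in text.split(begin)[1:]:
            if end in piece:
                yield piece.split(end, 1)[0]
    elif begin is not None:
        # Each chunk runs from one begin marker up to the next one.
        for piece in text.split(begin)[1:]:
            yield piece
    else:
        # Each chunk runs up to an end marker; a trailing piece with no
        # end marker is treated as incomplete and skipped.
        for piece in text.split(end)[:-1]:
            yield piece


stream = "junk\x02A\x03x\x02B\x03"
both = list(iter_chunks(stream, begin="\x02", end="\x03"))
begin_only = list(iter_chunks(stream, begin="\x02"))
end_only = list(iter_chunks("a;b;c", end=";"))
```

Each yielded chunk is what would then be handed to on_chunk() for decoding.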

"Open" Methods

To write a decoder, there are three methods of which two must be defined for the decoder. These methods have been mentioned previously, and should be written to accept a certain input and provide a certain output. This structure allows a developer to completely skip any constructs of the library which do not fit the requirements of their decoding process.

Warning: You must specify the same function arguments as the null versions specified here, as these methods are all called indirectly. This is particularly important for the decode_proc method. Don't forget the **kwargs.

on_line(self, string )

Accepts a string and produces a list of data values from that string. The list of values should be of the same format as the decoder.get_dtype() output. So if a decoder produces four columns of data, with two single values and two arrays, the output should look like:

[14,23,[1,3,4,8,4,3,3],[44,3,3,22,11,22,3,4215]]

If you read a file using the read_lines() method, then you MUST define the on_line method, as that is what converts the string into data.
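A minimal on_line might look like the following. The line format (two integers followed by two space-separated integer profiles, all comma-delimited) is invented for the example, and ExampleDecoder is a plain class rather than a real FileDecoder subclass so that the snippet stands alone.

```python
class ExampleDecoder(object):
    """Stand-in class; a real decoder would inherit FileDecoder."""

    def on_line(self, line):
        parts = line.strip().split(",")
        if len(parts) != 4:
            # Not a data line (header, blank, garbage): skip it.
            return False
        try:
            first = int(parts[0])
            second = int(parts[1])
            profile_a = [int(v) for v in parts[2].split()]
            profile_b = [int(v) for v in parts[3].split()]
        except ValueError:
            return False
        # Two single values followed by two arrays, in dtype order.
        return [first, second, profile_a, profile_b]


row = ExampleDecoder().on_line("14,23,1 3 4,44 3 3\n")
skipped = ExampleDecoder().on_line("# header line")
```

Returning False for lines that cannot be parsed keeps them out of the accumulated observations.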

on_chunk(self, string )

Following the above, this is a function you define which accepts a string containing the chunk of data between the delimiters provided, and produces a set of data from that chunk. The relationship between the output and the result of decoder.get_dtype() is the same as for on_line() above.

If you read a file using the read_chunks() method, then you MUST define the on_chunk method, as that is what converts the string into data.

decode_proc(self, filepath, limit, **kwargs )

This is the only mandatory function for you to define, and it is the function called when a user or client calls the .decode() method. This function must accept a file path as its first parameter, and developers are cautioned to limit other inputs required, as they are non-standard.

This is a generator, always.

The decoder should open the file, using either the contained open_ascii method or something else, and then it should call either self.read_lines or self.read_chunks with the appropriate arguments. Noting that these methods are generators, and decode_proc is a generator, the basic coding for this should be

def decode_proc(self, filepath, limit, **kwargs):
    filehandle = self.open_ascii(filepath)
    if filehandle is False:
        return  # a generator cannot return a value in Python 2
    for data in self.read_chunks(limit, filehandle, begin=chr(2)):
        yield data

The reason the decode_proc method is as complex as this is because there are a number of different procedures that might have to be done when a file is opened.

If a decoder will only return one set of data for an entire file decoding procedure, then simply

yield [data]

at the end of your process, as opposed to returning anything.

The FileDecoder is demonstrated in the examples of creating a decoder at the end of this documentation.

MessageDecoder Class

An implementation of the Decoder class which is used simply for decoding strings of data, commonly used as individual messages. The primary purpose of the MessageDecoder class is to encapsulate the process of decoding an encoded message, so it may be used by several decoders. The class offers the ability (requires it, actually) to share decoders as well as variables.

decode(self, string )

A developer-defined method which receives a string, and returns a list of values, using whatever means necessary.

Note This .decode() method is not a generator.

An example of this class is when you have an instrument that produces several iterations of a similar message format, where the core of the message is decoded the same. While you have to write 3 or 4 different decoders for the actual files, the bulk of the work can be encapsulated as a MessageDecoder. Likewise, each file produces the same variables and structure, so the MessageDecoder can have the variables defined, and they can simply be inherited by the FileDecoder.

Message decoders cannot be run in the same way as FileDecoder objects, because they do not return generators. The decode method only accepts the string for decoding. They can absolutely be used externally in the application though.

Variable Lists and Fixed Variable Lists

In the observation-based return schema described above, metadata would not necessarily be included as discrete sets of data. The current best-practice for handling metadata -- and a method built into Pyodec -- is to add variable names, and descriptions to the decoder itself.

Pyodec surfaces two class objects which can, and should, be used for logging and sharing the variables within the decoded file. These are

Class	Purpose
`VariableList`	The description of the values that are stored in each discrete observation.
`FixedVariableList`	Each variable which is a constant throughout the dataset, such as `height` in a profiler dataset.

These classes/objects contain the column/variable name, the ideal numpy dtype for that column, a shape vector, a long name, and a unit. The long name can be omitted, but really shouldn't be. As it is currently designed, variables must be appended to the list in the order in which they will be found in the dataset. FixedVariableList objects contain the data within them, such as the height values or elevation angles, so order is unimportant.

Usually you will specify these values manually in the decoder code. It is certainly possible to do it programmatically, but the information can be hard to acquire. When you instantiate your decoder class, you can also inherit the VariableList and/or FixedVariableList from another class.

Using these two objects, and attaching them to the decoder class on instantiation, will provide the client user with several ways to access metadata.

Each returned decoder instance will have methods .getvars(), .get_fixed_vars(), and .get_dtype() in addition to the instance attributes .vars and .fixed_vars. They function as follows

Method	Function
`.getvars()`	Return a list of dictionaries containing name, dtype and shape info.
`.get_fixed_vars()`	Return similar info to `.getvars()`, but includes the actual data values as well.
`.get_dtype()`	Return a valid Numpy-recarray dtype description, which can be used as shown below the table.
`.vars`	An attribute that contains the original `VariableList` object, and all its information/methods
`.fixed_vars`	An attribute that contains the original `FixedVariableList` object, and all its information/methods

import pyodec.files.myDecoder as dec
import numpy as np
for data in dec.decode(src):
    data0_in_numpy = np.array(data[0], dtype=dec.get_dtype())

VariableList Class

A container for the metadata describing the data columns returned by a decoder.

VariableList objects can be added with the + operator. This will result in the two sets of variables being stacked. Calling len() on a VariableList object will return the number of variables contained within.

Attributes

Though accessible, these lists are managed by the methods for both reading and writing. It is not required to be familiar with the internal organization of the contained data.

varnames: List of contained variable names (e.g. column names)
longnames: List of string long names for the variables contained
dtypes: List of strings or dtype objects corresponding to the data type of the returned column
shapes: List of sizes of the returned dataset, either in tuple form, or a single integer.
offsets: List of data compression offset, rarely used
scales: List of data compression scaling values
units: List of strings representing the physical unit of the variable.
mins: List of QC minimum reasonable value for a variable
maxs: List of QC maximum reasonable values for a variable

Methods

addvar(name, longname, dtype, shape, unit, [index=None, [scale=1, [offset=0, [mn=0, [mx=1] ] ] ] ])

Append a variable to the list of variables.

name: Variable Name (e.g. column name)
longname: More complete/clear string name of the variable.
dtype: The datatype the values should be stored as, at a minimum (e.g. float32)
shape: Shape of the referenced dataset, either an integer or a tuple in the same manner as a numpy array's .shape
unit: The scientific unit of the returned variable
index: Optional - the index position of the variable. If missing, then the position will simply be following the previous (appended to the list)
scale: Data value scale (used for compression)
offset: Data value offset (also used for compression)
mn: Minimum acceptable value for this variable (a QC value)
mx: Maximum acceptable value for this variable (QC)

getvar( varname )

Grab the dictionary of stored values corresponding to a variable.

getvar_by_id(id)

Get the dictionary stored values corresponding to the variable at index id. The unnamed attribute id corresponds to the position in the internal list of variables.

getvars()

Return a list of dictionaries, one per stored variable, each holding that variable's stored values.

dtype()

Return a numpy recarray-compliant dtype statement. The current name reflects that this was designed for integration of the output datatypes into PyTables.
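To give a feel for what such a dtype statement contains, the sketch below assembles one from parallel lists like the varnames, dtypes and shapes attributes. build_dtype is a hypothetical helper, not the library's method, and the choice to treat a shape of 1 as a scalar field is an assumption.

```python
def build_dtype(varnames, dtypes, shapes):
    """Assemble a numpy-style list of (name, dtype[, shape]) field tuples."""
    fields = []
    for name, dtype, shape in zip(varnames, dtypes, shapes):
        if shape in (1, (1,)):
            # Single value per observation: plain scalar field (assumed).
            fields.append((name, dtype))
        else:
            # Array per observation: include the sub-array shape.
            fields.append((name, dtype, shape))
    return fields


fields = build_dtype(["DATTIM", "WSPD"], ["i8", "f4"], [1, (25,)])
```

A list in this form is what numpy accepts as the dtype argument when building a structured or record array.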

tables_desc()

Deprecated alias to VariableList().dtype()

FixedVariableList Class

A class similar to VariableList, except instead of shape the actual value of the fixed variable is held within, and thus passed to the decoder client. The index is not used here, as the values in a FixedVariableList only directly relate to themselves.

Methods

addvar(name, unit, dtype, data)

Append a variable to the list of variables.

name: Variable Name (e.g. column name)
unit: The stored unit of the set of values
dtype: The datatype the values should be stored as, at a minimum (e.g. float32)
data: The value of the fixed variable. For a height index, it would be the array of heights at which the non-fixed variables report.

getvars()

Return a list of dictionaries, one per stored variable, each holding that variable's stored values.

Writing a decoder

The best we can do here is to give some nice examples of writing a decoder. Remember that pyodec is not really for decoding files that can just as easily be decoded with np.genfromtxt or numpy's other quick, easy methods for reading structured data. This is for the ugly files, files that are delightfully readable by humans but a true pain for computers.

A complete decoder example


from pyodec.core import FileDecoder, VariableList, FixedVariableList
import numpy as np
import os
import time
import gzip
from datetime import datetime as dt
from calendar import timegm

class ASCSdrD(FileDecoder):
    i=0
    select_keys=[]
    def on_line(self, line):
        self.i+=1
        l = line.split(',')
        if self.i == 1:
            # header line, learn great things!
            # we are going to learn exactly what place in each row, the variables are
            # this is assuming only 250 values per variable - this is in the order of the HDF
            l = np.array(l[2:], dtype='|S10')
            rng = np.arange(l.shape[0])
            self.select_keys = [
                rng[l=='sitename'],
                rng[(l == 'ws10') | (l == 'ws250')],
                rng[(l == 'wd10') | (l == 'wd250')],
                rng[(l == 'w10') | (l == 'w250')],
                rng[(l == 'iu10') | (l == 'iu250')],
                rng[(l == 'iv10') | (l == 'iv250')],
                rng[(l == 'iw10') | (l == 'iw250')],
                rng[(l == 'snru10') | (l == 'snru250')],
                rng[(l == 'snrv10') | (l == 'snrv250')],
                rng[(l == 'snrw10') | (l == 'snrw250')],
                rng[(l == 'sds10') | (l == 'sds250')],
                rng[(l == 'sdw10') | (l == 'sdw250')],
                rng[(l == 'gspd10') | (l == 'gspd250')],
                rng[l == 'tempc'],
                rng[l == 'batv'],
                rng[l == 'antstatus'],  # str
                rng[l == 'heater'],  # str
                rng[l == 'genon'],  # str
                rng[l == 'fuel'],  # str
                rng[l == 'rain'],
                rng[l == 'snow'],
                rng[l == 'rh'],
                rng[l == 'pressure'],
                [-1],  # I am assuming dewpoint will remain last ('dewpt')
            ]
            return False
        try:
            # get the date
            d = l[0] + l[1]
            t = time.mktime(time.strptime(d, '%m/%d/%Y%H:%M:%S'))
        except ValueError:
            return None
        # again. we are going to assume the dataset is not changing!
        data = l[2:]
        row = [t]
        for k in self.select_keys:
            if len(k) == 2:
                row.append(np.array(data[k[0]:k[1] + 1],dtype=np.float32)) 
            else:
                row.append(data[k[0]])
        # and produce this delightful set of datas!
        return row
    
    
    def decode_proc(self, filepath, limit=1000, **kwargs):
        # open the file
        if not os.path.exists(filepath):
            print "NO SUCH FILE"
            return 
        gzfh = gzip.open(filepath,'r')
        for d in self.read_lines_gen(limit, gzfh):
            # every `limit` obs, this should yield something
            yield d
            
        gzfh.close()


# define the dtypes of the variables we produce here (in the correct order!)
V = VariableList()
V.addvar('DATTIM','seconds since 1970-01-01 00:00 UTC',int,1,'S')
V.addvar('STNAME','Station Name',str,20,'')
V.addvar('WSPD','Wind speed','float32',(25,),'m/s')
V.addvar('WDIR','Wind direction','float32',(25,),'deg')
V.addvar('WVERT','','float32',(25,),'m/s')
V.addvar('UINT','','float32',(25,),'mv')
V.addvar('VINT','','float32',(25,),'mv')
V.addvar('WINT','','float32',(25,),'mv')
V.addvar('SNRU','','float32',(25,),'')
V.addvar('SNRV','','float32',(25,),'')
V.addvar('SNRW','','float32',(25,),'')
V.addvar('SDS','','float32',(25,),'')
V.addvar('SDW','','float32',(25,),'')
V.addvar('GUST','','float32',(25,),'m/s')
V.addvar('TMPC','','float32',1,'C')
V.addvar('BATV','','float32',1,'V')
V.addvar('ANTSTAT','',str,5,'')
V.addvar('HEAT','',str,5,'')
V.addvar('GEN','',str,5,'')
V.addvar('FUEL','',str,5,'')
V.addvar('RAIN','','float32',1,'')
V.addvar('SNOW','','float32',1,'')
V.addvar('RH','','float32',1,'%')
V.addvar('PRES','','float32',1,'hPa')
V.addvar('DEWP','','float32',1,'C')

# define any fixed variables which would be beneficial to our cause
FV = FixedVariableList()
FV.addvar('HEIGHT','m AGL','int',np.arange(25)*10+10)

decoder = ASCSdrD(vars=V, fixed_vars=FV)

Decode using a separate message decoder

This separate decoder handles both the decoding of the message strings (chunks in this case) and the variable definitions. The message decoder code is below.


from pyodec.core import FileDecoder
from pyodec.messages.vaisalacl31 import decoder as msgdecode
import numpy as np
import os
import time
import gzip

"""
U of Utah CL-31 message type: epoch times stored *before* the message
"""

class uucl31D(FileDecoder):
    def on_chunk(self, message):
        # receive a chunk, so grab the time from the chunk, and then use the imported decoder to decode the rest of the message.
        ob = message.split(unichr(001))
        try:
            tmstr = ob[0].strip().split()[-1]
            otime = float(tmstr)
        except:
            # there was a formatting problem of some kind, so return nothing to skip the row.
            return False
        try:
            data = msgdecode.decode(ob[1])
        except:
            # there was something ugly in this data... serial hiccups.
            data=False
        if data is False:
            return None
        output = (otime,data[0],data[1])
        return output
    
    def decode_proc(self, filepath, limit=1000, **kwargs):
        # open the file
        if not os.path.exists(filepath):
            print "NO SUCH FILE"
            return 
        filehandle = gzip.open(filepath,'r')
        for d in self.read_chunks(limit, filehandle, end=unichr(004)):
            yield d
        filehandle.close()

decoder = uucl31D(inherit=msgdecode)
# initialize a decoder object (which can be imported), inheriting the variable definitions from the imported msgdecode object (itself an instantiated MessageDecoder).
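
As a minimal sketch of the framing that on_chunk relies on, the example below splits a synthetic chunk without importing pyodec. It assumes, based on the listing above, that chunks end with EOT (character 4), that the epoch time sits before an SOH (character 1), and that the message body is bracketed by STX (character 2) and ETX (character 3). The helper name split_chunk and the sample payload are hypothetical and are not part of pyodec.

```python
# ASCII control characters used as delimiters in the stream.
SOH, STX, ETX, EOT = "\x01", "\x02", "\x03", "\x04"


def split_chunk(chunk):
    """Hypothetical helper: return (epoch_time, code, body), or None if malformed."""
    parts = chunk.split(SOH)
    if len(parts) < 2:
        return None
    try:
        # The timestamp is the last whitespace-separated token before SOH.
        otime = float(parts[0].strip().split()[-1])
    except (IndexError, ValueError):
        return None
    header, sep, rest = parts[1].partition(STX)
    if not sep:
        return None
    # Everything after ETX is the checksum, which is ignored here.
    body = rest.split(ETX)[0].strip()
    return otime, header.strip(), body


# Synthetic stream with one chunk; the payload is made up for illustration.
stream = ("1400000000.5\n" + SOH + "CL010221" + STX +
          "line one\nline two" + ETX + "1a2b" + EOT)
chunks = [c for c in stream.split(EOT) if c.strip()]
result = split_chunk(chunks[0])
```

Returning None for a malformed chunk mirrors the approach in on_chunk, where a bad row is skipped rather than raising an error.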

And the associated message decoder

from pyodec.core import MessageDecoder, VariableList, FixedVariableList
import numpy as np


class cl31Dm2(MessageDecoder):
    def decode(self, message):
        OB_LENGTH = 770  # FIXME - the current return length is limited to 770
        SCALING_FACTOR = 1.0e9
        
        'break the full ob text into its constituent parts'
        p1 = message.split(unichr(002))
        p2 = p1[1].split(unichr(003))
        code = p1[0].strip()
        ob = p2[0].strip()  # just the contents between STX (Ctrl-B) and ETX (Ctrl-C)
        # unused currently checksum = p2[1].strip()
    
        data = ob.split("\n")  # split into lines
    
        'the last line of the profile should be the data line'
        prof = data[-1].strip()
        'grab status lines'
        sl1 = data[0].strip()
        sl2 = data[-2].strip()  # I will skip any intermediate data lines...
    
        status = np.array([sl1[0].replace('/', '0'),
                        sl1[1].replace('A', '2').replace('W', '1')] +
                        sl1[2:-13].replace('/', '0').split() + sl2[:-14].split(),
                        dtype=np.float32)
        'status should have a length of 13... we shall see...'
        # determine height difference by reading the last digit of the code
        height_codes = [0, 10, 20, 5, 5]  # '0' is not a valid key, and will not happen
        data_lengths = [0, 770, 385, 1500, 770]
        'length between 770 and 1500'
        datLen = data_lengths[int(code[-1])]
        htMult = height_codes[int(code[-1])]
        values = np.zeros(datLen, dtype=np.float32)
        ky = 0
        for i in xrange(0, len(prof), 5):
    
            ven = prof[i:i + 5]
    
            values[ky] = twos_comp(int(ven, 16), 20)  # scaled to 100000sr/km (x1e9 sr/m)FYI
            ky += 1  # keep the key up to date
    
        # then the storage will be log10'd values
        values[values <= 0] = 1.
        out = (np.log10(values[:OB_LENGTH] / SCALING_FACTOR),status)
        return out

# Thanks to Travc at Stack Overflow for this method of converting values
# See here: http://stackoverflow.com/questions/1604464/twos-complement-in-python
def twos_comp(val, bits):
    """compute the 2's complement of int value val"""
    if((val & (1 << (bits - 1))) != 0):
        val = val - (1 << bits)
    return val


# set decoder parameters for this type of message.

vvars = VariableList()
vvars.addvar('DATTIM','Seconds since 1970-01-01 00:00:00 UTC',int,1,'S')
vvars.addvar('BS','Attenuated backscatter coefficient','float32',(770,),'1/(m sr)')
vvars.addvar('STATUS','CL-31 Status values','float32',(13,),'Null')

# for now we are going to assume height is fixed, and return it as such
fvars = FixedVariableList()
fvars.addvar('HEIGHT','m AGL','int',np.arange(770)*10)

decoder = cl31Dm2(vars=vvars,fixed_vars=fvars)
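
The profile conversion in the message decoder can be tried in isolation. The sketch below is a simplified, pure-Python version of that step, assuming (as in the listing above) that each value is five hexadecimal characters read as a 20-bit two's complement integer, that non-positive values are replaced by 1, and that the result is divided by 1.0e9 before taking log10. The helper name decode_profile and the sample string are hypothetical; the real decoder uses numpy arrays and a fixed output length.

```python
import math


def twos_comp(val, bits):
    """Interpret unsigned integer val as a signed two's complement value."""
    if val & (1 << (bits - 1)):
        val -= 1 << bits
    return val


def decode_profile(prof, scaling_factor=1.0e9):
    """Hypothetical helper: convert a hex profile string to log10 values."""
    values = []
    usable = len(prof) - len(prof) % 5  # ignore a trailing partial group
    for i in range(0, usable, 5):
        raw = twos_comp(int(prof[i:i + 5], 16), 20)
        if raw <= 0:
            raw = 1  # avoid log10 of zero or negative numbers
        values.append(math.log10(raw / scaling_factor))
    return values


# Three made-up samples: 100, -1 (clamped to 1) and 1000.
sample = "00064" + "fffff" + "003e8"
decoded = decode_profile(sample)
```

Clamping non-positive values to 1 means they are stored as log10(1 / 1.0e9), which acts as a floor for the returned profile.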
