Pyodec Documentation
Welcome to Pyodec. This is the reference for the library's core resources and techniques.
Getting Started
How to hit the ground running with pyodec.
Installation
Download a distribution, or clone a version of the repo from GitHub, and run the setup script in the traditional fashion.
$ python setup.py install
Append --user to install the package only for the local user.
This is for Linux/Unix installations, and will make pyodec available for import in locally run Python executables. To update, download a newer version of the code (i.e. a new release) and run the setup script again with the same command.
Pyodec may be added to PyPI some day, but at this time it must be installed manually.
Dependencies
The basic modules of pyodec currently require only Python 2.7 (the library is not currently written for Python 3) and numpy, along with the standard libraries that come with Python 2.7, such as gzip.
However, some decoders, particularly certain types of binary file decoders, may leverage other libraries. Those dependencies will be exposed when importing or executing the specific decoder module. Current dependencies may not reflect installation requirements in the future.
Decoding with Pyodec
Pyodec offers two ways to access the decode method of any specific file decoder. One is based on the core library: you pass the name of the decoder to use, and its decode method is loaded and run for you. Alternatively, you can use the structure of the Pyodec package and directly import the decoder module to access its decode method.
Accessing the decoder
Pyodec is structured so that decoders are stored in a module called files. To access a decoder, say vaisala_cl31_uu, you could import it directly:
>>> from pyodec.files.vaisala_cl31_uu import decoder
>>> data = decoder.decode(src)
Here decoder is an instantiated member of the decoder class contained within the module. That class inherits the core FileDecoder methods, including decode, along with some decoder metadata.
Every properly constructed decoder module has a decoder attribute: an instance whose decode method can be called. There are several other variables within a decoder module, one of which is the decoder's class, but you will have to look at the docs for that specific decoder to work with those, as their names are not fixed.
Using simple decoder access
Another way to access a decoder is to import pyodec and call pyodec.decode, a convenience function which loads and executes the decoder you name as a string argument. Passing a decoder name is required for this method. Here is an example of using it to decode src as above.
>>> import pyodec
>>> data = pyodec.decode(src, decoder='vaisala_cl31_uu')
This eliminates the importing process, but it also makes some things harder, like getting metadata from the decoder. To do that, we recommend you import the decoder directly as above.
Running a decoder
There are two ways to decode. First, you can decode as a procedural process, where an entire file or dataset is decoded and the output is returned. Alternatively, you can run the decoder as a generator (assuming the returned data can be grouped in some fashion), where you can control how many values are returned per iteration.
Iterative decoding
Warning: Pyodec currently only returns an iterator object.
Assuming your data file will produce groups of data (such as a collection of observations), a decoder can run in a for loop as a generator, returning a list of sets every N sets (N defaults to 1000). This is Pyodec's default behavior. For data files which are not broken into discrete observations, or which in some other way do not fit this data model, all the data will be returned at once, via the generator.
for data in pyodec.decode(src, 'decoderName', limit=1000):
    # do something with the decoded collection of data points
    pass
# or
from pyodec.files.decoderName import decoder
for data in decoder.decode(src, limit=1000):
    # do an equivalent something with the returned data
    pass
The limit argument is optional.
When the end of a data file is reached and fewer than N sets remain, they are yielded as well, so no data are missed.
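The batching behavior described above can be sketched in plain Python. This is a minimal illustration, not Pyodec's implementation; batched_observations is a hypothetical helper, and it assumes an undecodable observation is signalled by False, the convention the FileDecoder reading methods use.

```python
def batched_observations(observations, limit=1000):
    """Yield lists of at most `limit` observations (hypothetical helper).

    Entries equal to False are treated as undecodable and skipped,
    mirroring the convention used by the FileDecoder reading methods.
    """
    batch = []
    for ob in observations:
        if ob is False:
            continue  # nothing usable was decoded from this input
        batch.append(ob)
        if len(batch) >= limit:
            yield batch
            batch = []  # start a fresh batch after each yield
    if batch:
        # fewer than `limit` observations remain at the end: yield them too
        yield batch


# Ten observations with a limit of 4 arrive as batches of 4, 4 and 2.
obs = [(t, [t, t, t]) for t in range(10)]
sizes = [len(b) for b in batched_observations(obs, limit=4)]  # [4, 4, 2]
```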
Procedural decoding
As a wrapper around the generator function, you can pass the argument generator=False, which will then execute the generator internally and return a single set of values.
data = pyodec.decode(src, 'decoderName', generator=False)
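A rough sketch of how a generator=False wrapper can drain a generator into one list. The function below is hypothetical (not Pyodec's code), treats each input line as one observation for simplicity, and defaults generator to True as the protocol section of this document recommends.

```python
def decode(source, generator=True, limit=1000):
    """Hypothetical stand-in for a decoder's decode() method."""
    def decode_proc():
        # Stand-in for a real decode_proc: each input line is one observation,
        # yielded in batches of `limit`.
        batch = []
        for line in source:
            batch.append(line.strip())
            if len(batch) >= limit:
                yield batch
                batch = []
        if batch:
            yield batch

    gen = decode_proc()
    if generator:
        return gen
    # Procedural mode: drain the generator and concatenate every batch
    # into one list with the same per-observation structure.
    data = []
    for batch in gen:
        data.extend(batch)
    return data


lines = ["a\n", "b\n", "c\n"]
flat = decode(lines, generator=False, limit=2)  # ["a", "b", "c"]
batches = list(decode(lines, limit=2))          # [["a", "b"], ["c"]]
```

Because batches are concatenated with list.extend, the procedural result has exactly the per-observation layout of each yielded batch.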
And that is how you use pyodec when you have an existing decoder available for your data file (and how you use it once you have built a decoder for your file).
Building a decoder
Well, this isn't really a "getting started" scenario, unfortunately. Read further down in the docs for the guide on developing a decoder.
Protocols and rules
Under development. Though Pyodec is really a loosely connected set of independently written codes, it is important for the data model API to be compatible between different decoders. The goal of these rules is to allow a single piece of code to take a diverse set of files (knowing their decoders), and both decode them and look at the data with the same code.
Basic python rules for internal consistency, variable and method naming, etc. should be followed.
This is Python, so there are no truly private methods. Rewrite or reallocate any internal method as you please, but when doing so, please try to follow these usage guidelines. If you find a better way to implement the central decode method, then share it!
Input & client interfacing
Decoders should accept a file path string, a file object, and (where possible) a string of data as the first argument to the decode method.
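One way to honor this rule is to normalize the first argument before decoding. The helper below is a hypothetical sketch, not part of Pyodec; it is written for Python 3 (under Python 2.7, io.StringIO would need a unicode argument).

```python
import io
import os


def as_filehandle(source):
    """Normalize a decoder's first argument to a readable file-like object.

    Hypothetical helper: accepts an open file object, a path to an existing
    file, or (as a fallback) a string holding the data itself.
    """
    if hasattr(source, "read"):
        return source  # already a file object; decode from its current position
    if os.path.exists(source):
        return open(source, "r")
    return io.StringIO(source)  # treat the string as the data to decode


fh = as_filehandle("12,4,4,4\n12,5,5,5\n")
first = fh.readline()  # "12,4,4,4\n"
```

Checking for a path before falling back to raw data means a data string that happens to name an existing file would be opened as a file; a real decoder might prefer an explicit keyword instead.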
At a minimum, a decoder class should contain a decode method which can be run in conformance with these input and interface protocols.
Decoders should produce a generator object (usually using yield), and should allow the keyword argument generator to control whether a generator or a list is returned. generator should default to True.
Available .decode() keyword arguments
Argument | Type | Default | Purpose |
---|---|---|---|
limit | int | 1000 | Number of discrete data results to accumulate before yielding |
generator | bool | False | Produce a generator based on limit, or a single set of values representing the entire dataset. |
Warning: Not all decoders at this time implement this protocol; some simply return generators at all times.
These input requirements will evolve and expand over time (for example, a requirement to handle certain keyword arguments properly).
Data output and returning
The possible range of outputs is far more diverse than the possible inputs; however, there are a few goals that can be shared.
Discrete observations are collected in a list
That is, each observation is one element of the list, rather than the list holding one set of values per variable. This is slightly less efficient with memory, but it is conceptually much simpler, and it allows the second rule to actually work.
yield [(12,[4,4,4]), (12,[5,5,5]), ... ]
not
yield [[12,12,...],[[4,4,4],[5,5,5],...]]
A batch in this layout can be converted directly into a NumPy record array:
data = np.rec.fromrecords(data, dtype=decoder.get_dtype())
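A small stdlib-only illustration of why the record layout is convenient. The values are made up, and the column view is built with zip() rather than any Pyodec function.

```python
# One tuple per observation: (time, profile), the layout Pyodec decoders yield.
records = [(12, [4, 4, 4]), (13, [5, 5, 5])]

# Column-wise views can be rebuilt on demand by transposing with zip().
times, profiles = (list(col) for col in zip(*records))

# Batches in record form combine by plain list concatenation, so a
# yielded batch and a returned full dataset share the same structure.
more = [(14, [6, 6, 6])]
combined = records + more
```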
Iterative and procedural output are the same structure
Simply, you interact with a returned list of values in the same manner, whether you receive it from a yield or a return.
Variable and data descriptions
This is the least-defined rule of the library. Many data files are self-describing in some manner, and it is essential to extract this metadata from files.
The current procedure for a decoder object to reveal the descriptions of its variables is through three methods with the following functionality.
Method | Function |
---|---|
.getvars() | Return a list of dictionaries containing name, dtype and shape info, whose indices correspond to the index of the returned dataset (e.g. column names and descriptions). |
.get_fixed_vars() | Return similar info to .getvars(), but include the actual data elements as well, since fixed vars are not yielded with the rest of the data by default. |
.get_dtype() | Return a valid NumPy recarray dtype description, such that you could say data = np.rec.fromrecords(data, dtype=decoder.get_dtype()) |
By default, the return values of these methods are defined by the VariableList and FixedVariableList class objects, but as always, the functionality can be overridden in a decoder class when necessary.
As noted, these variable/metadata requirements are not set in stone, and will likely change through development with other users. Obviously backwards compatibility will become an issue pretty quickly, however.
Naming conventions
In accordance with Python convention, classes are in CamelCase, and everything else is in lowercase with underscores. However, there is some inconsistency in the naming of decoders and decoder modules.
The current naming convention for the files that contain the decoders is to use lowercase and underscores where necessary.
This is up for debate.
Core objects and methods
When using the Pyodec decoders, you mostly only need to know the getting started guides, and the protocols of the library. However, if you want to write a decoder, it is important that you understand the tools that are made available to the developer, and how they are used.
The following documentation outlines the classes and methods that are shared among all of the original decoders. Many of the features presented here are there to enable compliance with the protocols outlined above. Ideally they can help make the development of a decoder simpler as well.
These classes are located in the pyodec.core directory, and should be imported from there.
The Decoder Class
A file decoder should inherit from the Decoder class. The main function of the Decoder class is to facilitate variable (metadata) import, inheritance, and management for a decoder instance.
Attributes
The Decoder class object contains a set of attributes which can be set at init, or altered during use to share additional data from a decoding process.
- vars
- Where the VariableList object for this decoder's data set is held. This can be set at runtime, or defined when writing a decoder.
- fixed_vars
- Like Decoder.vars, this attribute holds the local FixedVariableList object for the dataset.
- inherit
- Holds the inherited Decoder descendant, which can be used to populate vars and fixed_vars on __init__(). This attribute exists primarily to hard-code the inheritance class.
- state
- A persistent dictionary which can be modified by the decoder in order to inform the processing code about the state of the decoder. This is most valuable for decoders which may determine information that affects how the data is processed once it is produced.
The default state value is state = {'identifier':False, 'index':False}
An example would be a decoder which produces data from different stations. You can use the state['identifier'] variable to indicate the station this data is from (even though station ID should be among the variables produced). This can be used as a trigger for decoders which make data that do not logically fit into a single table.
Because the tabular-typing and 'stations' example is nonstandard, it will be important to define a single protocol for the usage of the Decoder.state attribute, so end users can have predictable results. Additionally, a decoder can define its own state dictionary keys, again for use by a process running a decoder.
An example of this usage:
station = "KSLC"
for data in decoder.decode(source, generator=True):
    if decoder.state['identifier'] is not False:
        # then there are multiple identifiers
        station = decoder.state['identifier']
    write_data_function(station, data)
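A self-contained sketch of the state convention, using a hypothetical toy class rather than a real Pyodec decoder: it sets state['identifier'] immediately before each yield, so the caller can read it inside the loop and route each batch to the right station.

```python
class ToyMultiStationDecoder(object):
    """Hypothetical decoder whose input interleaves several stations."""

    def __init__(self):
        self.state = {'identifier': False, 'index': False}

    def decode(self, lines):
        batch, current = [], None
        for line in lines:
            station, value = line.split(',')
            if current is not None and station != current:
                # the station changed: flush the batch collected so far
                self.state['identifier'] = current
                yield batch
                batch = []
            current = station
            batch.append((station, float(value)))
        if batch:
            self.state['identifier'] = current
            yield batch


decoder = ToyMultiStationDecoder()
by_station = {}
for data in decoder.decode(["KSLC,1.5", "KSLC,2.0", "KPVU,3.0"]):
    station = decoder.state['identifier']  # valid while this batch is in hand
    by_station.setdefault(station, []).extend(data)
```

The state is only meaningful at the moment a batch is received, because the generator updates it again before its next yield.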
Initialization
A Decoder object is initialized with one or two of the following keyword arguments:
Decoder(vars=False, inherit=False, fixed_vars={})
- vars
- A description of the contained variables (ideally a VariableList object)
- inherit
- A different Decoder descendant (another FileDecoder or a MessageDecoder) from which variable information can be imported
- fixed_vars
- The description and values of fixed variables within the dataset. Like vars, this behaves best with a FixedVariableList object.
A Decoder object must be initialized with either vars or inherit defined, or else it will fail.
vars object is actually related to the data returned. If this is not the case, the most obvious place where it will break is in the end user's code, when they attempt to use the data and metadata.
Methods
getvars()
- Wrapper for the self.vars.getvars() method of the contained VariableList object. Returns a list of dictionaries corresponding to the variables produced by the Decoder descendant.
get_fixed_vars()
- Like getvars(), this returns the result of the self.fixed_vars.getvars() method. Returns a list of dictionaries corresponding to the decoder's fixed variables.
get_dtype()
- Produces a numpy-recarray-valid dtype string for the data variables (not fixed).
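To make the initialization rules concrete, here is a hypothetical, self-contained sketch (not Pyodec's source). MiniVars stands in for a VariableList, and MiniDecoder follows the rules above: variables come from vars or from an inherited decoder, and construction fails without either. Raising ValueError, and using False instead of a shared mutable {} default for fixed_vars, are assumptions of the sketch.

```python
class MiniVars(object):
    """Hypothetical stand-in for a VariableList: holds a list of dicts."""

    def __init__(self, entries):
        self.entries = list(entries)

    def getvars(self):
        return list(self.entries)


class MiniDecoder(object):
    """Sketch of the vars / inherit / fixed_vars initialization rules."""

    def __init__(self, vars=False, inherit=False, fixed_vars=False):
        if inherit is not False:
            # import variable descriptions from another decoder instance,
            # unless they were given explicitly
            if vars is False:
                vars = inherit.vars
            if fixed_vars is False:
                fixed_vars = inherit.fixed_vars
        if vars is False:
            raise ValueError("a decoder needs either vars or inherit")
        self.vars = vars
        self.fixed_vars = fixed_vars if fixed_vars is not False else MiniVars([])
        self.inherit = inherit
        self.state = {'identifier': False, 'index': False}

    def getvars(self):
        return self.vars.getvars()  # thin wrapper around the contained list object

    def get_fixed_vars(self):
        return self.fixed_vars.getvars()


msg = MiniDecoder(vars=MiniVars([{'name': 'BS'}]))
filedec = MiniDecoder(inherit=msg)  # picks up msg.vars and msg.fixed_vars
```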
The FileDecoder Class
This is the primary descendant of the Decoder class, and is for the development of decoders which read in individual string or gzipped files. An individual decoder, if designed to handle files, should inherit the FileDecoder class.
The FileDecoder class will not read lines beginning with :: (two colons).
The initialization method is the same as for the Decoder class.
Class Methods
The FileDecoder class exposes a simple file opening utility that can be used both within an instance and outside of it.
open_ascii( filepath )
- Open a text file (not actually ASCII-restricted) and return the file handle. Gzipped text files (with the extension .gz) are also opened, using the gzip library.
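A hypothetical equivalent of open_ascii, written for Python 3 (the 'rt' text mode of gzip.open is not available in Python 2.7). Returning False for a missing path is an assumption modelled on the `if filehandle is False` check in the decode_proc example below.

```python
import gzip
import io
import os


def open_text(filepath):
    """Hypothetical equivalent of FileDecoder.open_ascii().

    Opens plain text files normally and files ending in .gz through the
    gzip module; returns False if the path does not exist.
    """
    if not os.path.exists(filepath):
        return False
    if filepath.endswith('.gz'):
        # text mode ('rt') so both kinds of handle yield str lines
        return gzip.open(filepath, 'rt')
    return io.open(filepath, 'r')
```

Choosing the opener by extension keeps decoders agnostic about compression: either handle can be iterated line by line in the same way.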
"Public" Methods
These are the methods that can be used in creating a decoder. They are followed by the methods that must be defined in the process of creating a decoder.
Each of the reading methods takes a file handle and runs through it, calling a defined string-reading function on each segment of the data file, which is what produces the actual data returned.
decode( filepath, generator=False, limit=1000, **kwargs )
- The method the client/user uses to decode a file. This is a wrapper around decode_proc, which controls whether the output is a generator or not (and standardizes keyword arguments).
- filepath
- A string or open file handle to be decoded (if already open, the file should be readable, with the pointer located where decoding should start)
- generator
- The boolean flag instructing the code to use iterative yields, or to accumulate the returned data internally and return once. Currently this defaults to False.
- limit
- Equivalent to yieldcount elsewhere. How many data observations to accumulate before yielding.
- **kwargs
- Has the same meaning as elsewhere in Python: additional keyword arguments are passed directly to the decode_proc method.
read_lines( yieldcount, filehandle )
- Read a file line by line, and call the self.on_line() method for every line it encounters. The function accumulates the data returned from self.on_line(), and at the end returns a list of the produced data. For every yieldcount observations returned (lines read with self.on_line() that do not return False), the entire list of data points is yielded, and the list resets.
read_chunks( yieldcount, filehandle [begin, [end] ] )
- Read a file as chunks, identified by either a begin character, such as a BOM, or an end character/string. You can also specify both. The self.on_chunk() method is called on each chunk read, and the data returned from self.on_chunk() is appended to a list, which is returned at the end. Either begin or end must be specified.
Same as read_lines_gen, this requires the same arguments as read_chunks, plus a yieldcount which determines how many successfully decoded obs to store before yielding the data to an iterator.
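Chunked reading with an end delimiter can be sketched as follows. This hypothetical function reads the whole handle at once for brevity (a real reader would likely read incrementally), and it drops chunks for which on_chunk returns False or None, since the example decoders later in this document use both values to skip data.

```python
import io


def read_chunks_sketch(filehandle, on_chunk, end, yieldcount=1000):
    """Hypothetical sketch of chunked reading with an `end` delimiter."""
    text = filehandle.read()
    batch = []
    for chunk in text.split(end):
        if not chunk.strip():
            continue  # ignore the empty piece after the final delimiter
        result = on_chunk(chunk)
        if result is False or result is None:
            continue  # undecodable chunk: skip it
        batch.append(result)
        if len(batch) >= yieldcount:
            yield batch
            batch = []
    if batch:
        yield batch


def parse(chunk):
    # hypothetical on_chunk: "<value> <label>" -> (float, str)
    try:
        value, label = chunk.split()
        return (float(value), label)
    except ValueError:
        return False


raw = io.StringIO(u"1.0 a\x042.5 b\x04oops\x04")
batches = list(read_chunks_sketch(raw, parse, end=u"\x04", yieldcount=10))
# [[(1.0, 'a'), (2.5, 'b')]]
```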
"Open" Methods
To write a decoder, there are three methods, of which two must be defined for the decoder. These methods have been mentioned previously, and should be written to accept a certain input and provide a certain output. This structure allows a developer to completely skip any constructs of the library which do not fit the requirements of their decoding process.
decode_proc method. Don't forget the **kwargs.
on_line(self, string )
- Accepts a string and produces a list of data values from that string. The list of values should be in the same format as the decoder.get_dtype() output. So if a decoder produces four columns of data, with two single values and two arrays, the output should look like: [14,23,[1,3,4,8,4,3,3],[44,3,3,22,11,22,3,4215]]
If you read a file using the read_lines() method, then you MUST define the on_line method, as that is what converts the string into data.
on_chunk(self, string )
- Following the above, this is a function you define which accepts a string (the chunk of data between the delimiters provided) and produces a set of data from that chunk. The relationship between its output and the result of decoder.get_dtype() is the same as for on_line() above.
If you read a file using the read_chunks() method, then you MUST define the on_chunk method, as that is what converts the string into data.
decode_proc(self, filepath, limit, **kwargs )
- This is the only mandatory function for you to define, and it is the function called when a user or client calls the .decode() method. This function must accept a file path as its first parameter, and developers are cautioned to limit other required inputs, as they are non-standard.
This is a generator, always.
The decoder should open the file, using either the contained open_ascii method or something else, and then call either self.read_lines or self.read_chunks with the appropriate arguments. Since these methods are generators, and decode_proc is a generator, the basic code for this should be:
def decode_proc(self, filepath, limit, **kwargs):
    filehandle = self.open_ascii(filepath)
    if filehandle is False:
        return  # a bare return ends the generator ('return False' is a SyntaxError in Python 2)
    for data in self.read_chunks(limit, filehandle, begin=chr(2)):
        yield data
    filehandle.close()
The reason the decode_proc method is as complex as this is that a number of different procedures might have to be done when a file is opened.
If a decoder will only return one set of data for an entire file decoding procedure, then simply yield [data] at the end of your process, as opposed to returning anything.
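The division of labor between the library and the developer can be illustrated without Pyodec. ToyFileDecoder below is a hypothetical, heavily simplified stand-in for FileDecoder (its decode defaults to generator=False, following the decode() signature above); skipping :: lines inside read_lines is an assumption about where that rule applies. The subclass only writes on_line and decode_proc.

```python
import io


class ToyFileDecoder(object):
    """Hypothetical, heavily simplified stand-in for pyodec's FileDecoder."""

    def decode(self, source, generator=False, limit=1000, **kwargs):
        gen = self.decode_proc(source, limit, **kwargs)
        if generator:
            return gen
        out = []
        for batch in gen:
            out.extend(batch)
        return out

    def read_lines(self, yieldcount, filehandle):
        batch = []
        for line in filehandle:
            if line.startswith('::'):
                continue  # lines beginning with two colons are not read
            ob = self.on_line(line)
            if ob is False:
                continue
            batch.append(ob)
            if len(batch) >= yieldcount:
                yield batch
                batch = []
        if batch:
            yield batch


class CommaProfileDecoder(ToyFileDecoder):
    """Example subclass: only on_line and decode_proc are written by hand."""

    def on_line(self, line):
        # "14,23,1 3 4" -> [14, 23, [1, 3, 4]]
        parts = line.strip().split(',')
        if len(parts) != 3:
            return False  # malformed line: skip it
        return [int(parts[0]), int(parts[1]), [int(v) for v in parts[2].split()]]

    def decode_proc(self, source, limit, **kwargs):
        # this sketch only accepts an already-open file-like object
        for data in self.read_lines(limit, source):
            yield data


decoder = CommaProfileDecoder()
rows = decoder.decode(io.StringIO(u"::header\n14,23,1 3 4\nbad\n15,24,5 6\n"))
# [[14, 23, [1, 3, 4]], [15, 24, [5, 6]]]
```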
The FileDecoder is demonstrated in the examples of creating a decoder at the end of this documentation.
MessageDecoder Class
An implementation of the Decoder class which is used simply for decoding strings of data, commonly individual messages. The primary purpose of the MessageDecoder class is to encapsulate the process of decoding an encoded message, so it may be used by several decoders. The class offers the ability (requires it, actually) to share decoders as well as variables.
decode(self, string )
- A developer-defined method which receives a string and returns a list of values, using whatever means necessary.
Note: This .decode() method is not a generator.
An example of this class is when you have an instrument that produces several iterations of a similar message format, where the core of the message is decoded the same way. While you have to write 3 or 4 different decoders for the actual files, the bulk of the work can be encapsulated as a MessageDecoder. Likewise, each file produces the same variables and structure, so the MessageDecoder can have the variables defined, and they can simply be inherited by the FileDecoder.
Message decoders cannot be run in the same way as FileDecoder objects, because they do not return generators. The decode method only accepts the string for decoding. They can absolutely be used externally in an application, though.
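The sharing pattern can be sketched without Pyodec: two hypothetical file formats wrap the same message payload differently, and both delegate the payload to one message decoder object. In Pyodec the file decoders would be FileDecoder subclasses that also pick up the message decoder's variables through inherit=; that part is omitted here, and all names are illustrative.

```python
class StatusMessage(object):
    """Hypothetical message decoder shared by several file decoders.

    decode() is a plain call (not a generator): one message string in,
    one list of values out.
    """

    def decode(self, message):
        code, value = message.strip().split(':')
        return [code, float(value)]


msgdecode = StatusMessage()


def decode_epoch_file(lines):
    # format A: "<epoch seconds> <message>" on each line
    out = []
    for line in lines:
        epoch, message = line.split(' ', 1)
        out.append([int(epoch)] + msgdecode.decode(message))
    return out


def decode_tagged_file(lines):
    # format B: "<message>|<epoch seconds>" on each line
    out = []
    for line in lines:
        message, epoch = line.rsplit('|', 1)
        out.append([int(epoch)] + msgdecode.decode(message))
    return out
```

Only the framing differs between the two file formats, so both produce the same row structure from the shared payload decoder.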
Variable Lists and Fixed Variable Lists
In the observation-based return schema described above, metadata would not necessarily be included as discrete sets of data. The current best practice for handling metadata, and a method built into Pyodec, is to add variable names and descriptions to the decoder itself.
Pyodec surfaces two class objects which can, and should, be used for logging and sharing the variables within the decoded file. These are:
Class | Purpose |
---|---|
VariableList | The description of the values that are stored in each discrete observation. |
FixedVariableList | Each variable which is a constant throughout the dataset, such as height in a profiler dataset. |
These classes/objects contain the column/variable name, the ideal numpy dtype for that column, a shape vector, a long name, and a unit. The long name can be omitted, but really shouldn't be. As it is currently designed, variables must be appended to the list in the order in which they will be found in the dataset. FixedVariableList objects contain the data within them, such as the height values or elevation angles, so order is unimportant.
Usually you will specify these values manually in the decoder code. It is certainly possible to do it programmatically, but the information can be hard to acquire. When you instantiate your decoder class, you can also inherit the VariableList and/or FixedVariableList from another class.
Using these two objects, and attaching them to the decoder class on instantiation, provides the client user with several ways to access metadata. Each returned decoder instance will have the methods .getvars(), .get_fixed_vars(), and .get_dtype(), in addition to the instance attributes .vars and .fixed_vars.
They function as follows:
Method | Function |
---|---|
.getvars() | Return a list of dictionaries containing name, dtype and shape info. |
.get_fixed_vars() | Return similar info to .getvars(), but include the actual data values as well. |
.get_dtype() | Return a valid NumPy recarray dtype description, such that you could say data = np.rec.fromrecords(data, dtype=decoder.get_dtype()) |
.vars | An attribute that contains the original VariableList object, and all its information/methods |
.fixed_vars | An attribute that contains the original FixedVariableList object, and all its information/methods |
VariableList Class
A container for the metadata describing the data columns returned by a decoder.
VariableList objects can be added with the + operator, which stacks the two sets of variables. Calling len() on a VariableList object returns the number of variables contained within.
Attributes
Though accessible, these lists are managed by the methods for both reading and writing. It is not required to be familiar with the internal organization of the contained data.
- varnames
- List of contained variable names (e.g. column names)
- longnames
- List of string long names for the variables contained
- dtypes
- List of strings or dtype objects corresponding to the data type of the returned column
- shapes
- List of sizes of the returned dataset, either in tuple form or as a single integer.
- offsets
- List of data compression offsets, rarely used
- scales
- List of data compression scaling values
- units
- List of strings representing the physical unit of the variable.
- mins
- List of QC minimum reasonable values for a variable
- maxs
- List of QC maximum reasonable values for a variable
Methods
addvar(name, longname, dtype, shape, unit, [index=None, [scale=1, [offset=0, [mn=0, [mx=1] ] ] ] ])
-
Append a variable to the list of variables.
- name
- Variable Name (e.g. column name)
- longname
- More complete/clear string name of the variable.
- dtype
- The datatype the values should be stored as, at a minimum (e.g. float32)
- shape
- Shape of the referenced dataset, either an integer or a tuple in the same manner as a numpy array's .shape
- unit
- The scientific unit of the returned variable
- index
- Optional - the index position of the variable. If missing, then the position will simply be following the previous (appended to the list)
- scale
- Data value scale (used for compression)
- offset
- Data value offset (also used for compression)
- mn
- Minimum acceptable value for this variable (a QC value)
- mx
- Maximum acceptable value for this variable (QC)
getvar( varname )
- Grab the dictionary of stored values corresponding to a variable.
getvar_by_id( id )
- Get the dictionary of stored values corresponding to the variable at index id. The id argument corresponds to the position in the internal list of variables.
getvars()
- Return a list containing the dictionary of stored values for every variable.
dtype()
- Return a numpy recarray-compliant dtype statement. It was originally designed for integrating the output datatypes into PyTables, which is what the deprecated tables_desc() name reflects.
tables_desc()
- Deprecated alias to VariableList().dtype()
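A minimal, hypothetical re-creation of the VariableList ideas above, to show how ordered appends, len(), + stacking and a dtype description fit together. It is not Pyodec's class: it keeps only five of the documented attributes (index, scale, offset, mn and mx are omitted), and its dtype() returns (name, type, shape) tuples modelled on NumPy structured-dtype fields rather than Pyodec's exact return format.

```python
class SketchVariableList(object):
    """Hypothetical, minimal re-creation of the VariableList ideas above."""

    def __init__(self):
        self.varnames, self.longnames, self.dtypes = [], [], []
        self.shapes, self.units = [], []

    def addvar(self, name, longname, dtype, shape, unit):
        # append in the order the values appear in each observation
        self.varnames.append(name)
        self.longnames.append(longname)
        self.dtypes.append(dtype)
        self.shapes.append(shape)
        self.units.append(unit)

    def getvars(self):
        return [
            {'name': n, 'longname': ln, 'dtype': d, 'shape': s, 'unit': u}
            for n, ln, d, s, u in zip(self.varnames, self.longnames,
                                      self.dtypes, self.shapes, self.units)
        ]

    def dtype(self):
        # (name, type, shape) tuples, one per variable, in observation order
        return list(zip(self.varnames, self.dtypes, self.shapes))

    def __len__(self):
        return len(self.varnames)

    def __add__(self, other):
        # stack the two variable sets: self's variables first, then other's
        out = SketchVariableList()
        for v in self.getvars() + other.getvars():
            out.addvar(v['name'], v['longname'], v['dtype'], v['shape'], v['unit'])
        return out


V = SketchVariableList()
V.addvar('DATTIM', 'seconds since 1970-01-01 00:00 UTC', int, 1, 'S')
V.addvar('WSPD', 'Wind speed', 'float32', (25,), 'm/s')
W = SketchVariableList()
W.addvar('TMPC', 'Temperature', 'float32', 1, 'C')
both = V + W  # 3 variables: DATTIM, WSPD, TMPC
```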
FixedVariableList Class
A class similar to VariableList, except that instead of a shape, the actual value of the fixed variable is held within, and thus passed to the decoder client. The index is not used here, as the values in a FixedVariableList only directly relate to themselves.
Methods
addvar(name, unit, dtype, data)
- Append a variable to the list of variables.
- name
- Variable name (e.g. column name)
- unit
- The stored unit of the set of values
- dtype
- The datatype the values should be stored as, at a minimum (e.g. float32)
- data
- The value of the fixed variable. For a height index, it would be the array of heights at which the non-fixed variables report.
getvars()
- Return a list containing the dictionary of stored values for every fixed variable.
Writing a decoder
The best we can do here is to give some nice examples of writing a decoder. Remember that pyodec is not really for decoding files that can just as easily be decoded with np.genfromtxt or NumPy's other quick, easy methods for reading structured data. This is for the ugly files: files that are delightfully readable by humans but a true pain for computers.
A complete decoder example
from pyodec.core import FileDecoder, VariableList, FixedVariableList
import numpy as np
import os
import time
import gzip
from datetime import datetime as dt
from calendar import timegm


class ASCSdrD(FileDecoder):
    i = 0
    select_keys = []

    def on_line(self, line):
        self.i += 1
        l = line.split(',')
        if self.i == 1:
            # header line, learn great things!
            # we are going to learn exactly what place in each row the variables are
            # this assumes 25 levels per variable (10 m to 250 m), in the order of the HDF
            l = np.array(l[2:], dtype='|S10')
            rng = np.arange(l.shape[0])
            self.select_keys = [
                rng[l == 'sitename'],
                rng[(l == 'ws10') | (l == 'ws250')],
                rng[(l == 'wd10') | (l == 'wd250')],
                rng[(l == 'w10') | (l == 'w250')],
                rng[(l == 'iu10') | (l == 'iu250')],
                rng[(l == 'iv10') | (l == 'iv250')],
                rng[(l == 'iw10') | (l == 'iw250')],
                rng[(l == 'snru10') | (l == 'snru250')],
                rng[(l == 'snrv10') | (l == 'snrv250')],
                rng[(l == 'snrw10') | (l == 'snrw250')],
                rng[(l == 'sds10') | (l == 'sds250')],
                rng[(l == 'sdw10') | (l == 'sdw250')],
                rng[(l == 'gspd10') | (l == 'gspd250')],
                rng[l == 'tempc'],
                rng[l == 'batv'],
                rng[l == 'antstatus'],  # str
                rng[l == 'heater'],  # str
                rng[l == 'genon'],  # str
                rng[l == 'fuel'],  # str
                rng[l == 'rain'],
                rng[l == 'snow'],
                rng[l == 'rh'],
                rng[l == 'pressure'],
                [-1],  # I am assuming dewpoint will remain last ('dewpt')
            ]
            return False
        try:
            # get the date
            d = l[0] + l[1]
            t = time.mktime(time.strptime(d, '%m/%d/%Y%H:%M:%S'))
        except ValueError:
            return None
        # again, we are going to assume the dataset is not changing!
        data = l[2:]
        row = [t]
        for k in self.select_keys:
            if len(k) == 2:
                # a (first, last) column pair: slice out the whole profile
                row.append(np.array(data[k[0]:k[1] + 1], dtype=np.float32))
            else:
                row.append(data[k[0]])
        # and produce this delightful set of data!
        return row

    def decode_proc(self, filepath, limit=1000, **kwargs):
        # open the file
        if not os.path.exists(filepath):
            print("NO SUCH FILE")
            return
        gzfh = gzip.open(filepath, 'r')
        for d in self.read_lines_gen(limit, gzfh):
            # every `limit` obs, this yields a batch
            yield d
        gzfh.close()


# define the dtypes of the variables we produce here (in the correct order!)
V = VariableList()
V.addvar('DATTIM', 'seconds since 1970-01-01 00:00 UTC', int, 1, 'S')
V.addvar('STNAME', 'Station Name', str, 20, '')
V.addvar('WSPD', 'Wind speed', 'float32', (25,), 'm/s')
V.addvar('WDIR', 'Wind direction', 'float32', (25,), 'deg')
V.addvar('WVERT', '', 'float32', (25,), 'm/s')
V.addvar('UINT', '', 'float32', (25,), 'mv')
V.addvar('VINT', '', 'float32', (25,), 'mv')
V.addvar('WINT', '', 'float32', (25,), 'mv')
V.addvar('SNRU', '', 'float32', (25,), '')
V.addvar('SNRV', '', 'float32', (25,), '')
V.addvar('SNRW', '', 'float32', (25,), '')
V.addvar('SDS', '', 'float32', (25,), '')
V.addvar('SDW', '', 'float32', (25,), '')
V.addvar('GUST', '', 'float32', (25,), 'm/s')
V.addvar('TMPC', '', 'float32', 1, 'C')
V.addvar('BATV', '', 'float32', 1, 'V')
V.addvar('ANTSTAT', '', str, 5, '')
V.addvar('HEAT', '', str, 5, '')
V.addvar('GEN', '', str, 5, '')
V.addvar('FUEL', '', str, 5, '')
V.addvar('RAIN', '', 'float32', 1, '')
V.addvar('SNOW', '', 'float32', 1, '')
V.addvar('RH', '', 'float32', 1, '%')
V.addvar('PRES', '', 'float32', 1, 'hPa')
V.addvar('DEWP', '', 'float32', 1, 'C')
# define any fixed variables which would be beneficial to our cause
FV = FixedVariableList()
FV.addvar('HEIGHT', 'm AGL', 'int', np.arange(25) * 10 + 10)
decoder = ASCSdrD(vars=V, fixed_vars=FV)
Decode using a separate message decoder
This separate decoder handles both the decoding of the message strings (chunks in this case) and the variable definitions. The message decoder code is below.
from pyodec.core import FileDecoder
from pyodec.messages.vaisalacl31 import decoder as msgdecode
import numpy as np
import os
import time
import gzip
"""
U of Utah CL-31 message type: epoch times stored *before* the message
"""


class uucl31D(FileDecoder):
    def on_chunk(self, message):
        # receive a chunk, grab the time from the chunk, and then use the
        # imported decoder to decode the rest of the message.
        ob = message.split(chr(1))  # SOH separates the time stamp from the message
        try:
            tmstr = ob[0].strip().split()[-1]
            otime = float(tmstr)
        except (IndexError, ValueError):
            # there was a formatting problem of some kind, so return False to skip the row.
            return False
        try:
            data = msgdecode.decode(ob[1])
        except Exception:
            # there was something ugly in this data... serial hiccups.
            data = False
        if data is False:
            return None
        output = (otime, data[0], data[1])
        return output

    def decode_proc(self, filepath, limit=1000, **kwargs):
        # open the file
        if not os.path.exists(filepath):
            print("NO SUCH FILE")
            return
        filehandle = gzip.open(filepath, 'r')
        for d in self.read_chunks(limit, filehandle, end=chr(4)):
            yield d
        filehandle.close()


# initialize a decoder variable (which can be imported) and inherit the variables
# from the imported msgdecode object (which is yet another instantiated MessageDecoder object).
decoder = uucl31D(inherit=msgdecode)
And the associated message decoder
from pyodec.core import MessageDecoder, VariableList, FixedVariableList
import numpy as np


class cl31Dm2(MessageDecoder):
    def decode(self, message):
        OB_LENGTH = 770  # FIXME - the current return length is limited to 770
        SCALING_FACTOR = 1.0e9
        # break the full ob text into its constituent parts
        p1 = message.split(chr(2))
        p2 = p1[1].split(chr(3))
        code = p1[0].strip()
        ob = p2[0].strip()  # just the contents between STX (chr(2)) and ETX (chr(3))
        # unused currently: checksum = p2[1].strip()
        data = ob.split("\n")  # split into lines
        # the last line of the profile should be the data line
        prof = data[-1].strip()
        # grab status lines
        sl1 = data[0].strip()
        sl2 = data[-2].strip()  # I will skip any intermediate data lines...
        status = np.array([sl1[0].replace('/', '0'),
                           sl1[1].replace('A', '2').replace('W', '1')] +
                          sl1[2:-13].replace('/', '0').split() + sl2[:-14].split(),
                          dtype=np.float32)
        # status should have a length of 13... we shall see...
        # determine height difference by reading the last digit of the code
        height_codes = [0, 10, 20, 5, 5]  # '0' is not a valid key, and will not happen
        data_lengths = [0, 770, 385, 1500, 770]
        # length between 770 and 1500
        datLen = data_lengths[int(code[-1])]
        htMult = height_codes[int(code[-1])]
        values = np.zeros(datLen, dtype=np.float32)
        ky = 0
        for i in range(0, len(prof), 5):
            ven = prof[i:i + 5]
            # scaled to 100000 sr/km (x1e9 sr/m), FYI
            values[ky] = twos_comp(int(ven, 16), 20)
            ky += 1  # keep the key up to date
        # then the storage will be log10'd values
        values[values <= 0] = 1.
        out = (np.log10(values[:OB_LENGTH] / SCALING_FACTOR), status)
        return out


# Thanks to travc on Stack Overflow for this method of converting values
# See here: http://stackoverflow.com/questions/1604464/twos-complement-in-python
def twos_comp(val, bits):
    """compute the 2's complement of int value val"""
    if (val & (1 << (bits - 1))) != 0:
        val = val - (1 << bits)
    return val


# set decoder parameters for this type of message.
vvars = VariableList()
vvars.addvar('DATTIM', 'Seconds since 1970-01-01 00:00:00 UTC', int, 1, 'S')
vvars.addvar('BS', 'Attenuated backscatter coefficient', 'float32', (770,), '1/(m sr)')
vvars.addvar('STATUS', 'CL-31 Status values', 'float32', (13,), 'Null')
# for now we are going to assume height is fixed, and return it as such
fvars = FixedVariableList()
fvars.addvar('HEIGHT', 'm AGL', 'int', np.arange(770) * 10)
decoder = cl31Dm2(vars=vvars, fixed_vars=fvars)
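The profile loop in cl31Dm2.decode reads fixed-width, 5-character hexadecimal fields as 20-bit two's-complement integers. This standalone sketch repeats that conversion on a made-up profile string so the sign handling can be checked in isolation; decode_profile is a hypothetical helper, not part of the decoder above.

```python
def twos_comp(val, bits):
    """Interpret the low `bits` bits of val as a signed two's-complement int."""
    if val & (1 << (bits - 1)):
        val -= 1 << bits  # sign bit set: subtract 2**bits
    return val


def decode_profile(hexstring, width=5, bits=20):
    # split the profile into fixed-width hex fields and sign-convert each one
    return [twos_comp(int(hexstring[i:i + width], 16), bits)
            for i in range(0, len(hexstring), width)]


# fields: 00010, 00000, FFFFF, 80000
values = decode_profile("0001000000FFFFF80000")  # [16, 0, -1, -524288]
```

Negative and zero results like these are why the decoder sets values <= 0 to 1 before taking log10.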