GetData-0.9.0 fixes bugs found in the 0.8-series. It also adds PHP
bindings, a FLAC encoding, and write support for bzip2 and xz-compressed
datafiles. GetData-0.9.0 has been tested on various platforms including
Linux, MacOS X, OpenBSD, Cygwin, and native Microsoft Windows.
Full release notes are given below; the following changes to the build
system should be noted especially:
* the legacy (pre-0.3) API is no longer built by default. Pass
--enable-legacy-api to the configure script to enable it.
* building support for the FLAC encoding requires Xiph.Org's libFLAC.
* the Python bindings now require NumPy; previously, NumPy was optional
* building the PHP bindings requires the PHP command-line interpreter to
be installed at configure time
---------------------------------------------------------------------------
Four packages are available:
* getdata-0.9.0.tar.bz2/.gz: the full source code to the library, with
bindings. This package uses the GNU autotools build system, and is
designed for POSIX systems (UNIX, Linux, BSD, MacOS X, Cygwin, MSys,
&c.)
* getdata_win-0.9.0.zip: a reduced source code package, with the CMake
build system designed to be built on Microsoft Windows, either using
the free MinGW compiler, or else Microsoft's Visual C++ compiler.
(The full source package above can also be built using MinGW, if the
MSys shell is used to run the build system.) Currently, the only
bindings provided by this package are the C++ bindings, and the
package lacks support for compressed dirfiles, the Legacy API, and a
few other features. This build is used in native Microsoft Windows
builds of kst2.
* idl_getdata-0.9.0.tar.bz2/.gz: the Interactive Data Language (IDL)
bindings, packaged separately with an autotools build system, designed
to be built against an already installed version of GetData. Due to
licensing restrictions, pre-built packages rarely come with these
bindings, and this package allows end-users to add support for IDL
without having to recompile the whole GetData package.
* matlab_getdata-0.9.0.tar.bz2/.gz: the MATLAB bindings, packaged
separately with an autotools build system, designed to be built against
an already installed version of GetData. Due to licensing
restrictions, pre-built packages rarely come with these bindings, and
this package allows end-users to add support for MATLAB without having
to recompile the whole GetData package.
---------------------------------------------------------------------------
New in version 0.9.0:
Library Changes:
* Literals in format metadata may now have complex form (i.e. include a
semicolon) when the parameter is purely real. However, a non-zero
imaginary part is still an error.
* gd_free_entry_strings() now NULLs pointers after freeing them.
* gd_entry() now returns entry metadata when they contain scalar field
codes which do not exist. In this case the GD_EN_CALC flag in the
object will not be set. Previously, on such entries, this function
would fail with the error GD_E_BAD_SCALAR, and return nothing.
* gd_rename() now by default updates the target of ALIASes pointing to a
renamed field to point to the new field instead of leaving them dangling.
(But see GD_REN_DANGLE in the API section below).
* CARRAYs are no longer truncated to GD_MAX_CARRAY_LENGTH elements.
Flushing metadata to disk will now fail if writing a CARRAY would
overflow a format file line. (It's platform specific, but format file
lines are typically permitted to be at least 2**31 bytes long, so such
an error usually indicates something pathological happening.) The
GD_MAX_CARRAY_LENGTH symbol has been removed from the GetData header
file.
* Write support for bzip2-encoded and lzma-encoded data has been added.
LZMA write support is only available for .xz files, not the obsolete
.lzma format. The write support occurs out-of-place, just like how
writing gzip-encoded data works. See the gzip discussion in the 0.8.0
section below for important notes.
* A new encoding scheme using the Free Lossless Audio Codec (FLAC) to
compress data has been implemented. For some datasets, it provides a
good trade-off between speed and compression. Like gzip, bzip2, and
lzma, it also uses out-of-place writes (see previous point).
* A newly-created dirfile is now always opened in read-write mode,
ignoring the access mode specified in the call. Previously, specifying both
GD_RDONLY and GD_CREAT in open calls would result in an access mode
(GD_E_ACCMODE) error if the dirfile didn't already exist.
* Many functions which used to silently ignore representation suffixes in
field codes passed to them no longer do that. Most of these will
report an error (GD_E_BAD_CODE) if passed a representation suffix. The
affected functions are: gd_bof, gd_entry, gd_entry_type, gd_eof,
gd_flush, gd_linterp_tablename, gd_put_carray, gd_put_carray_slice,
gd_putdata, gd_raw_close, gd_raw_filename, gd_seek, gd_spf, gd_sync,
gd_tell.
* The error code GD_E_BAD_REPR has been merged into GD_E_BAD_CODE. The
symbol GD_E_BAD_REPR remains as an alias for GD_E_BAD_CODE, but is
deprecated.
* Attempts to seek past the end-of-field with gd_seek() now always
succeed, although the resultant position is encoding specific.
Previously, attempting to seek past the end-of-field on some encodings
would return an error.
* BUG FIX: The library now properly recovers from an I/O error while
trying to open an unencoded datafile. Previously, such an error would
poison the library's bookkeeping data, preventing all subsequent
attempts to open that file unless the Dirfile was re-opened. Reported
by Alexandra Rahlin.
* BUG FIX: GetData no longer segfaults when trying to do a large forward
seek before a write to a gzipped file. Reported by Joy Didier.
* BUG FIX: gd_putdata() no longer ignores I/O errors while seeking to the
first sample of a write.
* BUG FIX: If the reference field is being written to, gd_nframes() now
flushes it first before calculating the size of the dirfile.
Previously a short count could result for some encodings in this case.
* BUG FIX: Calling gd_putdata() to write gzip data with a non-zero
starting offset equal to the field's current I/O position no longer
results in the call hanging.
* BUG FIX: In addition to the addition of write support mentioned above,
a number of problems with reading LZMA files have been fixed, which
should result in fewer segmentation faults.
* BUG FIX: The parser no longer silently appends a closing > to scalar
field codes that contain an unmatched opening < (e.g. "scalar<3"). This
is now interpreted as a simple field code (which may be rejected later
due to the presence of the invalid '<' character).
* BUG FIX: The parser no longer interprets various numbers as field codes
when it shouldn't (e.g. when specifying a PHASE shift as "1." instead
of "1").
* BUG FIX: When writing scalar field codes to disk which could be
interpreted as a number (e.g. the field code "1"), the library now forces
the interpretation of these field codes as codes rather than numbers by
appending a scalar index (making, e.g., "1<0>"), which is harmless.
Previously, these were written as-is, resulting in misinterpretation
the next time the Dirfile was opened. This only happens with Standards
Version 8 or later; see the following item for earlier versions.
* BUG FIX: If the current Standards Version in effect is 7 or earlier,
ambiguous field codes (e.g., "1"), are now rejected by gd_[m]add() and
gd_alter_entry() with the error GD_E_BAD_CODE, since they can't be
represented in the metadata on disk. For the behaviour with later
Versions, and in permissive mode, see the previous item.
* BUG FIX: If performing a metadata update due to renaming fields
(perhaps by passing GD_REN_UPDB to gd_rename()) results in an invalid
field code due to affix restrictions, the update now fails (but see
GD_REN_FORCE). Previously the invalid field code would be stored,
leading to errors when flushing the modified metadata to disk.
* BUG FIX: When performing a metadata update due to a renamed field, the
field codes containing subfields of the renamed field are now also
updated, including field codes specifying meta subfields which do not
exist.
* BUG FIX: reading a LINTERP table with fewer than two lines no longer
results in a segfault on close/discard.
* BUG FIX: gd_alter_raw() and similar no longer fail when asked to re-
encode the data file of a RAW field which has not been previously
accessed.
* BUG FIX: A previously-read LINTERP table is now always discarded when
changing table paths with gd_alter_linterp() or similar. Previously
these obsolete, cached LUTs would sometimes linger, causing incorrect
LINTERP computation.
* BUG FIX: The I/O position reported by gd_tell and gd_seek for slim,
zzip, and zzslim encoded data is now correct.
API Changes:
* CLARIFICATION: The macro GD_SIZE() declared in getdata.h is indeed part
of the public API. It returns the size in bytes of a sample of data of
a given type (e.g. GD_SIZE(GD_COMPLEX64) returns 8). It has been
around since GetData-0.3.0, but has only been documented since
GetData-0.8.3.
* The comp_scal member of the gd_entry_t object has been replaced with a
flags member, containing a flag (GD_EN_COMPSCAL) with the meaning of
the former comp_scal member. There are also flags for hiddenness
(GD_EN_HIDDEN) and whether the scalar entry codes in the field
definition have been dereferenced (GD_EN_CALC).
* gd_[m]add() and gd_alter_entry() can now be used to set or change the
hiddenness of a field by setting or clearing the GD_EN_HIDDEN bit in
the supplied gd_entry_t object.
* Two new rename flags have been added:
- GD_REN_DANGLE which indicates the library shouldn't update ALIASes
whose target has been renamed (instead it will turn them into
dangling aliases)
- GD_REN_FORCE which causes the library to skip updating field codes
which would be invalid due to affixes instead of failing.
* The move_data argument of gd_move() has been replaced with a flags
argument which accepts the GD_REN_* flags, which have the same meaning
as they do with gd_rename().
* gd_move_alias() and gd_delete_alias() have been deleted: their
functions are now performed by gd_move() and gd_delete(), which now
operate on the alias itself when given the field code to an alias,
rather than the field the alias points to.
* A number of different error codes which indicated the same problem (an
I/O error returned by the operating system) have been merged into one.
The error codes GD_E_OPEN, GD_E_TRUNC, GD_E_RAW_IO, GD_E_OPEN_FRAGMENT,
GD_E_FLUSH are replaced by the new error GD_E_IO. The old symbols
remain as aliases but are deprecated. The corresponding error strings
also now include information from the underlying encoding library,
where possible. There is one exception to this merge: attempts to
flush metadata lines which are too long are now reported using
GD_E_LINE_TOO_LONG. Previously, these errors used GD_E_FLUSH.
* The error code GD_E_OPEN_LINFILE has also been removed. It has been
split into two parts:
- I/O errors resulting from reading the LINTERP table file are now
reported using GD_E_IO;
- Syntax errors in the table are reported using the new GD_E_LUT error
code. GD_E_OPEN_LINFILE remains as a deprecated alias for GD_E_LUT.
* gd_encoding_support() has been added to permit run-time determination
of supported encodings.
* gd_array_len() is the new name for gd_carray_len(). It now also
handles STRINGs (which have a length of one). The gd_carray_len() name
remains in the library, but has been marked deprecated.
* BUG FIX: If the dirfile path provided cannot be resolved (due to, for
instance, a symbolic link pointing to a non-existent path), gd_open()
and friends now return the correct error code (GD_E_IO).
* BUG FIX: gd_naliases() now returns an unsigned int, and zero on error,
as documented.
* BUG FIX: The API on 32-bit systems, which was broken in 0.8.7 and only
partially fixed in 0.8.8, should now work as expected again.
Bindings Changes:
* PHP bindings have been added.
* C++: There is no longer a default value for the "index" argument for
Entry methods (including subclasses) which accept it (viz. Input,
Scalar, ScalarIndex, Scale, CScale, Offset, COffset, Coefficient,
CCoefficient). The exception to this is with Entry subclasses for
which zero is the only allowed value for the parameter.
* F77 and F95: The bindings no longer raise SIGABRT when the dirfile
space is exhausted. Instead they simply return an invalid dirfile unit
number.
* F77: Functions to add fields with named scalar parameters have been
added (GDASBT GDASCL GDASCP GDASCR GDASLC GDASMX GDASPH GDASPN GDASRC
GDASRW GDASSB GDASWD), but only for those field types which permit
named scalars. Similarly, functions for altering field metadata with
named scalars are also present (GDLSBT GDLSCL GDLSCP GDLSCR GDLSLC
GDLSMX GDLSPH GDLSPN GDLSRC GDLSRW GDLSSB GDLSWD). These are provided
as an alternative to using GDASCA after the fact.
* IDL: The entry structure parser has been rewritten. It no longer
requires members which it doesn't need, and is also a lot more lax
about numerical data types. Notably, it now ignores a supplied
COMP_SCAL member. Floating point parameters can be specified in either
the base name (M, B, A, DIVIDEND) or else the member prefixed with 'C'
(CM, CB, CA, CDIVIDEND), regardless of numerical type. The bindings will
ingest them appropriately. Also, N_FIELDS and POLY_ORD may be
omitted, and will be calculated from the supplied data. A scalar
IN_FIELDS is treated like a single-element array.
* IDL: GD_REFERENCE is now a function, instead of a procedure, as the
documentation has always claimed it was. It returns the current
reference field (or the empty string, if there is none). The second
parameter, the new reference field, is optional. (Previously the
second parameter was required.)
* PERL: The entry hash parser has been rewritten. It no longer requires
keys which it doesn't need.
* PERL: alter_entry() now only updates defined elements in the passed
entry hash.
* PYTHON: Building the Python bindings now requires NumPy. Previously,
NumPy support was optional.
* PYTHON: for backwards compatibility, exceptions now exist for deprecated
error codes (such as OpenError). These deprecated exceptions are
simply aliases for the current ones and are never returned by the
bindings.
* C++ BUG FIX: The Entry methods Input, Scalar, and ScalarIndex
(including subclasses) now return zero or NULL when passed an out-of-
range index value. Previously they would return, variously, zero,
NULL, another value for some other, valid index value, or segfault.
* C++ BUG FIX: The flags parameter to Dirfile::Delete() is now unsigned,
as it is in the C API.
* F95 BUG FIX: fgd_add and fgd_alter_entry no longer ignore named scalar
parameters provided in supplied entry structures.
* PYTHON BUG FIX: Several memory leaks have been plugged. Patch from
Matthew Petroff.
Miscellaneous:
* The minimum autotools versions have been bumped. Autoconf-2.65 or
newer, automake-1.13 or newer, and libtool-2.2.7b or newer are now
required to rebuild the configure script and associated build
environment. NOTE: In general, most people building GetData from a source
release don't need the tools to build GetData; the autotools are only
needed if changes need to be made to the configure script or Makefile
input files provided in the release or if building from the repository.