From Ocean Teacher Library

Self-Describing Formats

Background

  • These formats are the operational formats in use today by the world's meteorological community (GRIB, BUFR), the satellite community (HDF) and the newly-developed ocean observing systems (NetCDF). Three of them are suitable for gridded or raster data (GRIB, HDF, NetCDF) and two of them are suited for data reports (BUFR, NetCDF). They contain extensive internal metadata, hence the group name, providing user systems with all the information needed for both data discovery and practical usage. Recent advancements that indicate a fusion of these technologies are noted below.
  • WMO calls BUFR and GRIB table-driven code forms, because they require the use of many standard code tables (see the WMO Codes reference below). The global meteorological community has led the development of data standards, such as code tables, and in recent months the ocean community has begun to look toward these same principles.

Binary Universal Form for the Representation of Meteorological Data (BUFR)

Please read the sub-article BUFR and GRIB Formats

Character Form for the Representation and Exchange of Data (CREX)

ASCII analog to BUFR. [Further information needed]

Gridded Binary (GRIB, GRB, GRB1, GRB2)

Please read the sub-article BUFR and GRIB Formats

Hierarchical Data Format (HDF, HDF4, HD4, HDF5, HD5)

Due to its extremely widespread and long-term use within the remote sensing community, HDF has experienced evolution in form, resulting in some issues about format and use that must be addressed. Many thanks to the HDF Group for the material below on Format Issues and for some of the resources cited.

HDF Format Issues

  • HDF was originally developed as a robust, standard format for gridded data ranging in scales from planetary surface scans down to electron microscope scenes. It remains one of the principal formats for distribution of Earth Observing System (EOS) data from US NASA.
  • There are two different versions of HDF: HDF4 and HDF5.
    • HDF4 is the original HDF format and HDF5 is a completely new HDF format.
    • Some software programs can accommodate both HDF4 and HDF5, but in general the switch to HDF5 involves adopting new systems.
    • Both are very general and can be used for almost any kind of data.
    • HDF4 has been the most widely used version during the past two decades for data publications from NASA. NASA's Ocean Color Web apparently still uses it exclusively.
  • HDF-EOS
    • In order to standardize their use of HDF for a particular kind of data, it is common for users to specify just how that data should be organized in either HDF4 or HDF5, and to produce software that understands that organization and hides it from the user.
    • This has been done by EOS for earth science data.
    • EOS has defined a data model called HDF-EOS, which defines certain kinds of earth science data objects, and specifies how to organize them in HDF4 and HDF5.
    • So, you can think of HDF-EOS as a collection of earth science data objects, and there are many tools for accessing HDF-EOS files.
    • These Earth Observing Systems (EOS) extensions are supposed to be adopted by all US NASA systems, but there are unfortunately some hold-outs.
  • HDF-EOS2 and HDF-EOS5
    • There are two implementations of HDF-EOS: HDF-EOS2 (which uses HDF4) and HDF-EOS5 (which uses HDF5).
    • When you receive an HDF-EOS file, you usually do not need to worry about which format it uses. The software that is available for working with HDF-EOS files usually works with both kinds.
    • HDF-EOS2 is used operationally by MODIS, MISR, ASTER, Landsat, AIRS and other EOS instruments.
    • HDF-EOS5 is used only for EOS Aura instruments at present.
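
Because HDF4 and HDF5 are entirely different formats, a quick look at the first bytes of a file is often enough to tell which one has been received. The following is a minimal sketch in Python, not an official tool. It assumes the format signature sits at the very start of the file (HDF5 permits an optional user block that moves the signature further in, which this sketch does not handle), and the function names are illustrative only.

```python
# Leading byte signatures. NetCDF-4 files are HDF5 files underneath,
# so they are reported as HDF5 here.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "HDF5 (or NetCDF-4)"),
    (b"\x0e\x03\x13\x01", "HDF4"),
    (b"CDF\x01", "NetCDF classic"),
    (b"CDF\x02", "NetCDF 64-bit offset"),
]


def sniff_format(header: bytes) -> str:
    """Return a format label for the first bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A call such as sniff_file("granule.hdf") (a hypothetical file name) returns one of the labels above. An HDF-EOS2 file reports as HDF4 and an HDF-EOS5 file as HDF5, since both are built on the underlying HDF formats.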

HDF Usage Issues

  • The current status of HDF use is complicated by these factors:
    • Many software programs do not state specifically which version of HDF they can accommodate, and conversely many data sites do not clearly state which version their files contain.
    • Possible misunderstandings and disagreements about exact format specifications (resulting in incorrect/hybrid forms).
    • Different georeferencing methods are used for Level 1 and 2 data than for Level 3 and 4 data.
  • HDF Use Recommendation
    • HDF use is a critical skill in the toolkit of marine data managers, but due to the above factors it is never easy, particularly if a PC/Windows system is the only available computer platform.
    • When HDF use is necessary because the data are desirable, it is usually possible to use HDFView to convert regular HDF grids (i.e. L3 and L4) to TXT, which should then be converted to a widely used grid format, such as the ASCII or binary version of the ESRI gridded data format. Swath data (L2) may be accommodated by the software program Panoply, and HDF-EOS data may be accommodated by the software program HEG. Otherwise, specific software recommendations given with the data products may be useful.
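
The last step of the route recommended above (text values to an ESRI ASCII grid) can be sketched in Python as follows. This is only an illustration under stated assumptions: the grid values have already been exported (for example from HDFView) and loaded into Python lists, the coordinates are evenly spaced cell centres given in ascending order, and the cells are square. The function name and its parameters are hypothetical.

```python
def to_esri_ascii(rows, lons, lats, nodata=-9999.0, fill=None):
    """Format a regular grid as ESRI ASCII grid text.

    rows -- list of rows; rows[j][i] is the value at lats[j], lons[i]
    lons -- cell-centre longitudes, ascending and evenly spaced
    lats -- cell-centre latitudes, ascending and evenly spaced
    fill -- missing-data value used in the source grid, if any
    """
    cellsize = lons[1] - lons[0]
    if abs((lats[1] - lats[0]) - cellsize) > 1e-9:
        raise ValueError("ESRI ASCII grids need square cells")
    header = [
        f"ncols {len(lons)}",
        f"nrows {len(lats)}",
        # The lower-left corner is half a cell away from the first cell centre.
        f"xllcorner {lons[0] - cellsize / 2}",
        f"yllcorner {lats[0] - cellsize / 2}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    body = []
    # The format lists the northernmost row first, so reverse the rows.
    for row in reversed(rows):
        body.append(" ".join(str(nodata if v == fill else v) for v in row))
    return "\n".join(header + body) + "\n"
```

Writing the returned text to a file with an .asc extension gives a grid that most GIS packages can open. Grids whose spacing differs in longitude and latitude are rejected here rather than resampled.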

Network Common Data Form (NetCDF, NC, NC4, NCML)

  • NetCDF was developed principally for array data (i.e. grids), but it has been extended to measurement data, in the same way that BUFR is used. It is widely used in the climate, weather and marine communities, and there are indications that it will play a large role in the emerging global ocean observing systems. Recently NetCDF 4.0 was released, incorporating HDF5 and representing the first union of major formats. NetCDF has an ASCII analog format, CDL, that can be easily "compiled" to NetCDF.
  • Apparently NetCDF is now being routinely used in some global remote sensing programs, for example the Group for High Resolution Sea Surface Temperature (GHRSST). Because NetCDF development has not experienced quite so many "version" problems as HDF (although there have been some issues), its use greatly furthers compatibility between data products and applications.
  • In development is an ASCII variant of NetCDF, similar to the CDL format (below) but written with XML syntax, called NetCDF Markup Language (NCML). An introductory level reference is provided below.
  • NetCDF Use Recommendation
    • Well-formed NetCDF grid files present very few difficulties when used with a wide variety of visualization and analysis programs. Capture of the basic grid within the file can be accomplished by exporting a CDL file (from ncBrowse) or by simple cut and paste from the data view in Panoply (using the displayed geographic coordinates). Either route enables easy creation of floating point TIF files for a GIS system, i.e. for WMS, after simple conversions in Saga. Exactly subsetted images can now be created with Panoply, but unfortunately the only export mode for the geo-registered images is KMZ. The entire page (image plus labeling) can be saved and geo-registered with the Georeferencing Tool.
  • Available files:
    • None at this time
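
The "compiling" of CDL to NetCDF mentioned above is normally done with the ncgen utility distributed with the NetCDF software. The Python sketch below simply wraps that command line. It assumes ncgen is installed and on the PATH, and the two function names are illustrative rather than part of any library.

```python
import shutil
import subprocess


def ncgen_command(cdl_path: str, nc_path: str) -> list:
    """Build the ncgen call that compiles a CDL text file to binary NetCDF."""
    return ["ncgen", "-o", nc_path, cdl_path]


def compile_cdl(cdl_path: str, nc_path: str) -> bool:
    """Run ncgen if it is installed; return True on success."""
    if shutil.which("ncgen") is None:
        return False  # NetCDF utilities are not on the PATH
    result = subprocess.run(ncgen_command(cdl_path, nc_path))
    return result.returncode == 0
```

The reverse direction, from binary NetCDF back to CDL text, is handled by the companion utility ncdump.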

Common Data Language (CDL)

The CDL format is the ASCII analog to NetCDF (above). Both are designed primarily to hold grids, although recently they have been extended to hold measurement data. When a CDL file contains a grid, the grid dimensions are not necessarily Cartesian, so the coordinates of the cell values are given in separate longitude (COADSX) and latitude (COADSY) lists. Notice that this example file of air temperature offshore Namibia has a large header containing useful metadata, a feature CDL shares with NetCDF.

netcdf coads_airT_annu_namib {
dimensions:
    TIME = UNLIMITED ; // (1 currently)
    COADSY27_38 = 12 ;
    COADSX170_181 = 12 ;
variables:
  double TIME(TIME);
   TIME:units = "hour since 0000-01-01 00:00:00";
   TIME:time_origin = "01-JAN-0000 00:00:00";
   TIME:modulo = " ";
   TIME:axis = "T";
  double COADSY27_38(COADSY27_38);
   COADSY27_38:units = "degrees_north";
   COADSY27_38:point_spacing = "even";
   COADSY27_38:axis = "Y";
  double COADSX170_181(COADSX170_181);
   COADSX170_181:units = "degrees_east";
   COADSX170_181:modulo = " ";
   COADSX170_181:point_spacing = "even";
   COADSX170_181:axis = "X";
  float AIRT(TIME, COADSY27_38, COADSX170_181);
   AIRT:missing_value = -1.0E34; // float
   AIRT:_FillValue = -1.0E34; // float
   AIRT:long_name = "AIR TEMPERATURE";
   AIRT:history = "From coads_climatology";
   AIRT:units = "DEG C";
data:
TIME = 366.0 ;
COADSY27_38 = -37.0, -35.0, -33.0, -31.0, -29.0, -27.0, -25.0, -23.0, -21.0, 
  -19.0, -17.0, -15.0 ;
COADSX170_181 = 359.0, 361.0, 363.0, 365.0, 367.0, 369.0, 371.0, 373.0, 
  375.0, 377.0, 379.0, 381.0 ;
AIRT = 17.228333, 17.065, 17.455263, 16.346666, 17.512499, 16.987143, 
  17.545, 17.392857, 18.278461, 18.636896, 19.393158, 20.12606, 18.900278, 
  18.434546, 18.449444, 18.503714, 18.595135, 18.457222, 18.675499, 
  18.710697, 19.071627, 19.72925, 19.780909, 20.680454, 20.247097, 
  20.205555, 20.416842, 19.726, 19.536154, 19.536154, 19.85093, 19.870714, 
  19.926363, 19.161818, 18.026363, -1.0E34, 21.402308, 21.224167, 21.257647, 
  21.004103, 20.88439, 20.502619, 20.328604, 20.34159, 20.045227, 19.30814, 
  18.785713, -1.0E34, 22.426786, 22.085554, 21.621315, 21.5655, 21.184048, 
  20.894545, 20.68186, 20.453863, 19.682499, 17.732187, -1.0E34, -1.0E34, 
  22.565641, 22.434633, 22.128809, 21.716743, 21.435226, 20.857273, 
  20.561363, 20.263409, 17.732925, -1.0E34, -1.0E34, -1.0E34, 22.782927, 
  22.277618, 22.16744, 21.9075, 21.535814, 21.021135, 20.62659, 18.51375, 
  16.666666, -1.0E34, -1.0E34, -1.0E34, 22.719025, 22.728636, 22.302273, 
  22.170513, 21.755814, 21.232044, 20.676285, 18.70317, 17.25389, -1.0E34, 
  -1.0E34, -1.0E34, 22.673489, 22.640232, 22.737429, 22.063095, 21.708635, 
  21.38128, 20.5005, 19.43775, 20.286999, -1.0E34, -1.0E34, -1.0E34, 
  22.922045, 22.576841, 22.574652, 22.175226, 21.924318, 21.463783, 
  20.031794, 19.62697, -1.0E34, -1.0E34, -1.0E34, -1.0E34, 23.318485, 
  22.988647, 22.597273, 22.291136, 21.959486, 21.70975, 20.270811, -1.0E34, 
  -1.0E34, -1.0E34, -1.0E34, -1.0E34, 23.486755, 22.993954, 22.83909, 
  22.902895, 22.845121, 22.681786, 22.34054, 23.177826, -1.0E34, -1.0E34, 
  -1.0E34, -1.0E34 ;
}
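
The AIRT values in the listing are a single flat list, so each value must be matched to its coordinates by position. CDL writes arrays in row-major order, meaning the last dimension (here longitude) varies fastest. The Python sketch below illustrates that pairing using the first three latitude rows copied from the listing. The function name is hypothetical, and the single TIME step is ignored for simplicity.

```python
FILL = -1.0e34  # value of AIRT:_FillValue in the listing


def grid_cells(flat, lats, lons, fill=FILL):
    """Yield (lat, lon, value) for each cell of a flat (Y, X) value list.

    The last dimension (longitude) varies fastest.
    Fill values are returned as None.
    """
    nx = len(lons)
    if len(flat) != len(lats) * nx:
        raise ValueError("value count does not match grid dimensions")
    for index, value in enumerate(flat):
        iy, ix = divmod(index, nx)
        yield lats[iy], lons[ix], (None if value == fill else value)


# First three latitude rows of AIRT, copied from the listing.
lats = [-37.0, -35.0, -33.0]
lons = [359.0 + 2.0 * i for i in range(12)]  # 359.0 ... 381.0
airt = [
    17.228333, 17.065, 17.455263, 16.346666, 17.512499, 16.987143,
    17.545, 17.392857, 18.278461, 18.636896, 19.393158, 20.12606,
    18.900278, 18.434546, 18.449444, 18.503714, 18.595135, 18.457222,
    18.675499, 18.710697, 19.071627, 19.72925, 19.780909, 20.680454,
    20.247097, 20.205555, 20.416842, 19.726, 19.536154, 19.536154,
    19.85093, 19.870714, 19.926363, 19.161818, 18.026363, -1.0E34,
]
cells = list(grid_cells(airt, lats, lons))
```

Here the first cell pairs 17.228333 with latitude -37.0 and longitude 359.0, and the last cell of the third row (latitude -33.0, longitude 381.0) is a fill value. Longitudes above 360 in this file can be reduced by subtracting 360, so 381.0 corresponds to 21 degrees east.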

ENVISAT Format

The ENVISAT format is actually a family of closely related formats, developed within a common schema for representation of data from the eponymous satellite platform. All ENVISAT products follow a generalized structure consisting of:

  • A Main Product Header (MPH); inspection of sample files indicates the MPH is often ASCII
  • A Specific Product Header (SPH) containing information specific to the whole product plus one or more Data Set Descriptors (DSDs) which describe individual Data Sets; often ASCII
  • One or more Data Sets (DSs), each consisting of one or more Data Set Records (DSRs); often binary.

Consult the references below for detailed information.
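
Where the MPH and SPH are ASCII, as noted above, they can be inspected with very simple code. The Python sketch below assumes the header consists of KEYWORD=VALUE lines, with string values in double quotes. That layout, the keyword names and the sample text are all assumptions made for illustration, so the product specification should be checked before relying on them.

```python
def parse_header(text: str) -> dict:
    """Parse KEYWORD=VALUE lines of an ASCII header into a dict.

    Quoted strings lose their quotes; everything else is kept as text.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue  # blank or spare line
        fields[key.strip()] = value.strip().strip('"').strip()
    return fields


# Hypothetical header excerpt, invented for illustration only.
sample = (
    'PRODUCT="EXAMPLE_PRODUCT_NAME"\n'
    "NUM_DSD=+0000000003\n"
    "\n"
    "NUM_DATA_SETS=+0000000002\n"
)
header = parse_header(sample)
```

Values are kept as text, so a numeric field such as the hypothetical NUM_DSD above would still need converting, for example with int(header["NUM_DSD"]).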

Additional Resources


Subsections of this Article

Pagename Short title Description
BUFR and GRIB Formats BUFR and GRIB Formats BUFR and GRIB none


Information about this article

Short title: Self-Describing Formats

Description: These formats contain extensive internal metadata, which provides user systems with all the information needed for both use and discovery. Station data, grids and rasters can be accommodated in these formats.

Expertise level: beginner

Author: Murray.Brown

Approval status: approved

Approved by: Murray.Brown

Last change: 2012-1-13

Subsection of: Marine Data Format Types


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License