From Ocean Teacher Library
Spreadsheet FormatsContents |
Background
- A spreadsheet is a data table, consisting of "rows" of data items arranged in "columns". The rows are sometimes called records, and the columns are sometimes called fields. Spreadsheets can be preceded by ancillary data in "header" rows that identify the columns. A single header row may simply identify the columns with "labels." A multi-row header can be very lengthy and may have an entirely different format from the body of the spreadsheet below. Spreadsheets can also be extremely simple, such as the XYZ form described below.
- A spreadsheet is also computer application that simulates a paper worksheet. It displays multiple cells that together make up a grid consisting of rows and columns, each cell containing either alphanumeric text or numeric values. A spreadsheet cell may alternatively contain a formula that defines how the contents of that cell is to be calculated from the contents of any other cell (or combination of cells) each time any cell is updated. [From Wikipedia: Spreadsheet]
- The term spreadsheet has come to mean either the commercial program used to manage the data, or the row-and-column array of data itself.
Fundamental Property of Spreadsheets
All data rows below the header (i.e. the spreadsheet proper) have exactly the same format (i.e. the sequence of fields), and every row is completely self-contained. This means that all the data to understand each row are contained within it, and no other rows (beside the header) are needed for its intepretation. If the header is ignored, you can sort and re-sort a true spreadsheet on any field or sequence of fields, but you can always re-construct the original sequence. This fundamental property is not possessed by the closely related Archive Formats, which in effect have multiple headers occurring throughout the file.
Commercial (Binary) Formats
Spreadsheets evolved from the old VisiCalc program in the 1970's, through Lotus 1-2-3 and Quattro in the 80's and 90's, to Microsoft's Excel, Apple's Numbers and OpenOffice' Calc programs today. Excel is the most widely used. They all use binary internal formats for calculations, but can import and export a wide variety of ASCII exchange formats. They can import just about any ASCII spreadsheet, parsing it according to user-supplied rules.
General ASCII Formats
Spreadsheets can range from the very simplest form, and XYZ triplet consisting of longitude-latitude-parameter data, to very large arrays with dozens of columns (including quality flag columns beside the data value columns). Commonly also called "table data" spreadsheets are the most widely used data type for exchange of data between scientists who are not using the same software, enabling them to immediately "see" the data and understand its contents. These spreadsheets often have specific ASCII characters to separate the data columns. In general order of commonality, they are COMMA, TAB, SPACE, DOUBLE QUOTES, and COLON.
Tab-Separated Values (TSV) File
Ocean Data View Spreadsheet (TXT)
TSV is extremely common in European earth sciences software, and can be assumed in the absence of other information. TSV files often have the extension TXT. This example, with 9 informational header rows, a blank row and a label row, is a spreadsheet export from Ocean Data View. [Some units information has been removed in the label row for simplicity; inspect the downloadable file for the full details.]
//<Version>ODV Spreadsheet V4.0</Version> //<Creator>Murray@MURRAYB</Creator> //<CreateTime>2009-05-14T11:19:59</CreateTime> //<Software>Ocean Data View Version 4.1.1 - 2009</Software> //<Source>C:/TUTORIALS_namibia/Data/2 National collections/1 Bottle/bottle_data_namibia_wod05.var</Source> //<SourceLastModified>2009-04-15T10:59:19</SourceLastModified> //<DataField>GeneralField</DataField> //<DataType>GeneralType</DataType> //<MissingValueIndicators>-99999</MissingValueIndicators> // Cruise Station Type mon/day/yr hh:mm Longitude Latitude Bot. Depth Depth Temperature WOD05_06000082 506503 B 06/03/1967 10:00:00.000 13.98 -26.65 400 5 17.43 WOD05_06000082 506503 B 06/03/1967 10:00:00.000 13.98 -26.65 400 -99999 -99999 WOD05_06000082 506591 B 06/04/1967 08:00:00.000 14.07 -26.68 400 5 16.8 WOD05_06000082 506591 B 06/04/1967 08:00:00.000 14.07 -26.68 400 -99999 -99999 WOD05_06000082 506701 B 06/05/1967 08:00:00.000 14.1 -26.77 390 5 15.92 WOD05_06000082 506701 B 06/05/1967 08:00:00.000 14.1 -26.77 390 -99999 -99999 WOD05_06000082 506829 B 06/06/1967 11:00:00.000 13.97 -26.5 385 5 16.38 WOD05_06000082 506829 B 06/06/1967 11:00:00.000 13.97 -26.5 385 -99999 -99999 WOD05_06000082 507059 B 06/08/1967 08:00:00.000 14.03 -26.57 405 5 16.28
SeaBird Spreadsheet (CNV)
This spreadsheet has an extremely large set of header line, but it's still a spreadsheet. The data rows were reduced here to 7 for brevity. Notice that there is no single header row to identify the spreadsheet fields; you have to deduce this from information in the header lines beginning with "name".
Sea-Bird SBE 9 Raw Data File:
FileName = C:\SEABIRD\DATA\STA0679.DAT
Software Version 4.217
Temperature SN = 1166
Conductivity SN = 1149
Number of Bytes Per Scan = 15
Number of Voltage Words = 1
System UpLoad Time = Jun 30 2003 06:57:03
NMEA Latitude = 15 01.75 N
NMEA Longitude = 017 33.99 W
NMEA UpLoad Time = not available
Store Lat/Lon Data = Add to Header Only
SHIP: 14
Station: 0679
Echodepth: 818
Log: 9886.57
Wind dir/force: 2 2
Air temp: 26
Weather Sky: 1 4
Sea: 2
nquan = 12
nvalues = 499
units = metric
name 0 = scan: scan number
name 1 = pr: pressure [db]
name 2 = t068: temperature, IPTS-68 [deg C]
name 3 = c0S/m: conductivity [S/m]
name 4 = oxT: oxygen, temperature [deg C]
name 5 = oxC: oxygen, current [æA]
name 6 = sal00: salinity, PSS-78 [PSU]
name 7 = oxML/L: oxygen [ml/l]
name 8 = svW: sound velocity, wilson [m/s]
name 9 = v4: voltage, number 4 [V]
name 10 = sal00: salinity, PSS-78 [PSU]
name 11 = flag: 0.000e+00
span 0 = 144, 13896
span 1 = 4.000, 502.000
span 2 = 9.2907, 27.5198
span 3 = 3.776038, 5.701168
span 4 = 12.29066, 25.75421
span 5 = 0.13946, 1.15242
span 6 = 35.1234, 35.9173
span 7 = 1.34850, 5.70296
span 8 = 1496.12, 1541.89
span 9 = 0.000, 0.000
span 10 = 35.1219, 35.9173
span 11 = 0.000e+00, 0.000e+00
interval = decibars: 1
start_time = Jun 30 1903 06:57:03
bad_flag = -9.990e-29
sensor 0 = Frequency 0 temperature, 1166, 10-Jan-02
sensor 1 = Frequency 1 conductivity, 1149, 10-Jan-02, cpcor = -9.5700e-08
sensor 2 = Frequency 2 pressure, 53966, 17-Jun-93
sensor 3 = Extrnl Volt 0 oxygen, current, 130367, 13-May-02
sensor 4 = Extrnl Volt 1 oxygen, temperature, 130367, 13-May-02
datcnv_date = Jun 30 2003 07:07:11, 4.217
datcnv_in = STA0679.DAT NANSEN.CON
datcnv_skipover = 0
alignctd_date = Jun 30 2003 07:07:17, 4.217
alignctd_in = STA0679.CNV
alignctd_cond_advSecs = 0.000, 0.000
alignctd_temp_advSecs = 0.000, 0.000
alignctd_oxygen_current_advSecs = 3.000, 0.000
alignctd_oxygen_temp_advSecs = 3.000, 0.000
wildedit_date = Jun 30 2003 07:07:21, 4.217
wildedit_in = STA0679.CNV
wildedit_pass1_nstd = 2.0
wildedit_pass2_nstd = 20.0
wildedit_npoint = 10
wildedit_vars = pr t068 c0S/m oxT oxC sal00 oxML/L svW v4
wildedit_excl_bad_scans = yes
celltm_date = Jun 30 2003 07:07:24, 4.217
celltm_in = STA0679.CNV
celltm_alpha = 0.0300, 0.0000
celltm_tau = 9.0000, 0.0000
filter_date = Jun 30 2003 07:07:28, 4.217
filter_in = STA0679.CNV
filter_low_pass_tc_A = 0.030
filter_low_pass_tc_B = 0.150
filter_low_pass_A_vars = c0S/m
filter_low_pass_B_vars = pr
loopedit_date = Jun 30 2003 07:07:33, 4.217
loopedit_in = STA0679.CNV
loopedit_minVelocity = 0.100
loopedit_excl_bad_scans = yes
derive_date = Jun 30 2003 07:07:38, 4.217
derive_in = STA0679.CNV NANSEN.CON
binavg_date = Jun 30 2003 07:07:42, 4.217
binavg_in = STA0679.CNV
binavg_bintype = Pressure Bins
binavg_binsize = 1.00
binavg_excl_bad_scans = yes
binavg_downcast_only = no
binavg_skipover = 0
binavg_surface_bin = yes, min = 0.300, max = 0.500, value = 0.000
file_type = ascii
END
144 4.000 27.5148 5.699974 24.99342 1.14713 35.9154 5.36460 1541.61 0.000 35.9154 0.000e+00
338 5.000 27.5154 5.700342 25.12097 1.14973 35.9173 5.36649 1541.63 0.000 35.9173 0.000e+00
360 6.000 27.5152 5.700173 25.13345 1.14962 35.9159 5.36484 1541.65 0.000 35.9159 0.000e+00
395 7.000 27.5138 5.699983 25.15682 1.14925 35.9153 5.36684 1541.66 0.000 35.9153 0.000e+00
426 8.000 27.5099 5.699572 25.17440 1.14958 35.9152 5.35300 1541.67 0.000 35.9151 0.000e+00
449 9.000 27.5138 5.700026 25.19099 1.14951 35.9151 5.35227 1541.69 0.000 35.9151 0.000e+00
476 10.000 27.5167 5.700434 25.20526 1.14914 35.9153 5.35063 1541.71 0.000 35.9154 0.000e+00
Sample truncated for brevity...
Seabird Spreadsheet with 1-Row Header (ASC)
This more basic spreadsheet can be combined with the appropriate header, in the Seabird software, and after re-arrangement the above spreadsheet is created.
PrdM Tv290C C0mS/cm Obs Par Spar Cpar DepSM Nbin Sal00 Density00 Flag
1.009 12.4782 37.322014 13.8862 1.2285e+02 1.0575e+05 3.6692e+01 1.000 5 31.9878 1024.1654 0.0000e+00
2.017 12.4820 37.343418 13.7895 2.5981e+01 1.0575e+05 7.7246e+00 2.000 7 32.0045 1024.1822 0.0000e+00
3.026 12.4825 37.346811 13.6855 7.8378e+00 1.0575e+05 2.3305e+00 3.000 8 32.0070 1024.1885 0.0000e+00
4.035 12.4827 37.351523 16.7880 2.5299e+00 1.0575e+05 7.5229e-01 4.000 7 32.0109 1024.1960 0.0000e+00
5.044 12.4810 37.356942 22.0714 1.0971e+00 1.0575e+05 3.2653e-01 5.000 7 32.0171 1024.2057 0.0000e+00
6.052 12.4775 37.360021 25.7383 8.2565e-01 1.0575e+05 2.4489e-01 6.000 7 32.0226 1024.2152 0.0000e+00
7.061 12.4745 37.359871 31.7655 7.7696e-01 1.0575e+05 2.3065e-01 7.000 7 32.0246 1024.2218 0.0000e+00
8.070 12.4725 37.363497 35.6999 7.7729e-01 1.0575e+05 2.3091e-01 8.000 6 32.0294 1024.2304 0.0000e+00
9.079 12.4703 37.366344 36.2423 7.7195e-01 1.0575e+05 2.2905e-01 9.000 8 32.0335 1024.2386 0.0000e+00
10.087 12.4680 37.369758 46.4126 7.7156e-01 1.0575e+05 2.2922e-01 10.000 7 32.0384 1024.2473 0.0000e+00
11.097 12.4655 37.372670 64.3901 7.6671e-01 1.0575e+05 2.2745e-01 11.000 11 32.0429 1024.2559 0.0000e+00
10.088 12.4662 37.370930 49.5640 7.6086e-01 1.0575e+05 2.2498e-01 10.000 7 32.0410 1024.2497 0.0000e+00
9.078 12.4669 37.369932 43.5323 7.6317e-01 1.0575e+05 2.2561e-01 9.000 7 32.0399 1024.2442 0.0000e+00
8.070 12.4671 37.370193 36.3797 7.6505e-01 1.0575e+05 2.2595e-01 8.000 6 32.0403 1024.2400 0.0000e+00
7.061 12.4686 37.367081 25.6574 7.7112e-01 1.0575e+05 2.2854e-01 7.000 7 32.0365 1024.2322 0.0000e+00
6.053 12.4719 37.360644 22.8541 8.2998e-01 1.0575e+05 2.4609e-01 6.000 8 32.0280 1024.2204 0.0000e+00
5.044 12.4734 37.357165 22.2153 1.0199e+00 1.0575e+05 3.0203e-01 5.000 7 32.0238 1024.2123 0.0000e+00
4.035 12.4720 37.361669 20.9403 2.5607e+00 1.0575e+05 7.6136e-01 4.000 9 32.0297 1024.2126 0.0000e+00
3.026 12.4705 37.360506 15.2629 9.4358e+00 1.0575e+05 2.8016e+00 3.000 9 32.0302 1024.2088 0.0000e+00
2.018 12.4709 37.355558 14.8751 4.0622e+01 1.0575e+05 1.2093e+01 2.000 9 32.0256 1024.2006 0.0000e+00
1.009 12.4723 36.910330 15.4404 1.9278e+02 1.0575e+05 5.7652e+01 1.000 8 31.6005 1023.8665 0.0000e+00
Comma Separated Values (CSV) File
CSV files, which are much more common in the US than TSV, often have the extension CSV. See the next section to see a cSV example.
XYZ Tables
One very widely used spreadsheet format is the XYZ table, where the three columns of data represent (usually) longitude, latitude and a parameter value. Separators in XYZ files are often SPACE, COMMA or TAB. They are the simplest and most unambiguous means of transferring a simple dataset, particularly the contents of grids, between programs.
In the following simple example, the columns are labeled, and the fields are separated by commas (CSV). If a data manager ever encounters a file like this, with no labeling, it is virtually certain to be XYZ format, where Z is some measured parameter. Only in the extremely unlikely case that a dataset has been stored using principles from the old GF3 format, where latitude preceded longitude, would this assumption be incorrect.
LONGITUDE(X),LATITUDE(Y),PARAMETER(Z) -0.9792,-13.0209,+64.0000 -0.9375,-13.0209,+64.0000 -0.8958,-13.0209,+66.0000 -0.8542,-13.0209,+65.0000 -0.8125,-13.0209,+64.0000 -0.7708,-13.0209,+64.0000 -0.7292,-13.0209,+63.0000 -0.6875,-13.0209,+63.0000 -0.6458,-13.0209,+62.0000 -0.6042,-13.0209,+62.0000 -0.5625,-13.0209,+62.0000 -0.5208,-13.0209,+63.0000 -0.4792,-13.0209,+63.0000 -0.4375,-13.0209,+61.0000 other lines removed for brevity...
The relation between an XYZ file and the source grid is not simple, however, as you can see in the related article Grids/Rasters and XYZ Files.
Problems with Delimiters
- A delimiter at the end of a line is often interpreted as the beginning of another empty field. This can "appear" to the software program as the beginning of the next data row, causing a complete mis-reading of the table. In general, data rows should not end with a delimiter.
- Most spreadsheet programs, when reading an input spreadsheet, allow the user to specify multiple sequential delimiters (such as 2 spaces in a row) as one delimiter. Make sure you use this capability when called for.
- The SPACE can be problematic, because leading spaces (i.e. at the beginning of a row), or trailing spaces (i.e. at the end of the row) can be overlooked or mis-read.
Problems with Spreadsheets
- Users who employ formulas to calculate new column values from other columns need to make sure that if they export an ASCII version of the table that they export the resulting value. Each spreadsheet program has special tools to do this.
- The recent trend to produce "spreadsheets" with multiple header rows seemed like a very minimal change, but it results in files that must be edited for further processing, or they cause errors. Users should be sensitive to this issue, and check exact formatting before further work with "spreadsheets."
Additional Resources
Subsections of this Article
| Pagename | Short title | Description | |
|---|---|---|---|
| Grids/Rasters and XYZ Files | Grids/Rasters and XYZ Files | Grids/Rasters and XYZ | The relationship between the grid cell values and geography is such that the parameter value is assumed to exist at the geographic centre of the grid cell |
| Ocean Data View Spreadsheet Format | Ocean Data View Spreadsheet Format | ODV Spreadsheet | none |
| Sea Level Data Formats | Sea Level Data Formats | Sea Level Formats | none |
Information about this article
Short title: Spreadsheet Formats
Description: An array of rows and columns, each cell containing either alphanumeric text or numeric values. The columns in the spreadsheet, usually labeled in the first row, contain separate types of information; the rows contain all the separate types of information associated with a single entity, such as an oceanographic station. All rows in a true spreadsheet have exactly the same format.
Expertise level: beginner
Author: Murray.Brown
Approval status: approved
Approved by: Murray.Brown
Last change: 2012-1-13
Subsection of: Marine Data Format Types
Contact
If you have any direct comments or suggestions for the author of this page then please feel free to send an email to the author (listed above). For discussions on this page please use the discussions page.,



