Canadian Database of Geochemical Surveys

Excel Data Files Help Topics

Overview

Raw data for approximately 100 surveys have been compiled into MS-Excel® spreadsheets that are available for download.  The files have been standardised with respect to their first nine columns so that the user will find the same information in the same place in all of the files.  The first nine columns contain details of various sample identifiers, sample location, sample type and preparation which are described in more detail here.

Columns of geochemical or indicator mineral data are displayed after the nine columns in the order the analytical laboratory presented them, and commonly, but not always, this is alphabetically by element symbol.  The elements listed in the file depend on what was in the analytical package provided by the lab, and different labs may present AAS data in a different order, for example, or may not include the same elements in their AAS package.

The MS-Excel® spreadsheets can be downloaded from from the Analytical History section of surveys with an "Extended Metadata" section on their Survey Metadata page as described here.  The spreadsheets can be downloaded in three formats based on how the "less-than-detection-limit" data are handled:

  1. as a negative (<2 becomes -2)
  2. as half the detection limit (<2 becomes 1)
  3. as text (<2 stays as <2)

The naming convention for the MS-Excel® files is to refer to the analytical package used by the lab to analyse the samples.  Each package has been assigned a unique number in the database and this appears in the file name, followed by an a, b or c to denote how the "less-than-detection-limit" data are presented (e.g. pkg_0134a.xls).

There is one basic type of spreadsheet which in addition to the first 9 columns, presents either geochemical and indicator mineral data.  There are two extensions to this basic type of file, one for National Geochemical Reconnaissance (NGR) geochemical data and one for indicator mineral probe data.  These extensions to the basic type of spreadsheet present extra columns after the 9 common columns and before the analytical data.  These are described in the next sections.

Please Note: the spreadsheets are based on an analytical package provided by a laboratory.  Many sample bundles over the years from many surveys have been analysed by the same analytical package and data for ALL of the samples analysed by that analytical package will appear in the spreadsheet.  It is possible that the user only wants to look at samples from a particular survey.  The Survey_Key column in the spreadsheet can be used to filter surveys of interest by clicking on the arrow beside the column name and then selecting the required surveys from the drop-down list.  The filtered/requested surveys will appear as a "view"; the rest of the lines of the file will be hidden until 'Select All' is clicked back on.  The number of samples that appear in the view (i.e. those in the survey(s) selected from the drop-down list) appears in the bottom left corner of the file window (e.g. "655 of 19167 records found").

Spreadsheet Column Structure

As noted previously, the first nine columns of all the downloadable spreadsheets data files are the same, regardless of sample medium or analytical data contained within the file.  The columns in the order they appear are described in the "First Ten columns" section below:

First Ten Columns
Lab_Sample_Identifier

An identifier for the sample. It generally has a close relationship with the sample numbers in the published report.  It is unique within a sample bundle but is not guaranteed to be unique across surveys.  Lab_Sample_Identifiers for NGR data are always unique across all NGR surveys.

Lab_Key

The Lab_Key is a unique identifier composed of three integers separated by colons:

  1. the Organisation_ID, which indicates the organisation responsible for the analytical work (e.g. 21 = GSC Northern Canada Division; 13 = New Brunswick Department of Natural Resources); the full list of IDs can be found here.
  2. the Analytical_Sample_Bundle_ID which is an arbitrary integer that reflects the group with which the sample in question was bundled and sent for analysis.
  3. the Lab_Sample_ID, an arbitrary integer which uniquely identifies each sample analysed within a specific sample bundle.

Unlike the Lab_Sample_Identifier, the Lab_Key is guaranteed to be unique across all surveys.

Bundle_Key

Samples are grouped into "bundles" before sending them to the lab.  The Bundle_Key is hyperlinked and takes the user to a metadata page for the Analytical Sample Bundle.  The Bundle_Key is a unique identifier composed of two integers separated by a colon:

  1. the Organisation_ID, which indicates the organisation responsible for the analytical work (e.g. 21 = GSC Northern Canada Division; 13 = New Brunswick Department of Natural Resources); the full list of IDs can be found here.
  2. the Analytical_Sample_Bundle_ID which is an arbitrary integer that indicates the group with which the sample in question was bundled and sent for analysis.

It is useful to either sort or filter on this column so that samples from one bundle are grouped together when viewing the MS-Excel® file.

Survey_Key

The Survey_Key is a hyperlink that when clicked takes the user to the metadata page for the survey that the sample was collected in/belongs to.  The Survey_Key is unique identifier, composed of two integers separated by colons:

  1. the Organisation_ID, which represents the organisation responsible for the field work; the full list of IDs can be found here.
  2. the Survey_ID, which is a number assigned to each survey, and which increases by 1 each time a new survey is assigned to that organisation.
Field_Key

The Field_Key links tha analysed sample back to the sample that was collected in the field.  In order to create a unique field identifier, the Field_Key is made up of four arbitrary integers separated by colons:

  1. the Site_ID, which is an arbitrary integer that is incremented at each new site visited; this is the integer that is displayed beside the red crosses on the KML location maps that are found under Sample Locations in Extended Metadata on the Survey Metadata pages.
  2. the Field Sample_ID (which usually equals 1 i.e. one sample is collected at the site).
  3. the Preparation_Lab_Sample_ID (which usually equals 1 i.e. the sample is prepared once for analysis).
  4. the Sample_Split_ID which is an arbitrary integer used to indicate whether the sample is the first or second of a duplicate pair (duplicate pairs are used to measure lab precision).
Control_Reference_ID

An integer ID that refers to the reference material used as a laboratory control (e.g. 95 = the international soil/till reference material TILL 4).  The full list of IDs can be found here{MISSING_PAGE}.

Latitude_NAD83 and Longitude_NAD83

These two columns identify the geographic location of the samples, in decimal degrees, and based on North American Datum 83.

Sample_Type_Name_en

The English name of the sample medium that was collected (e.g. till, lake sediments).

Preparation_Method_Name_en

The English name of the material that was produced by the preparation lab prior to the sample being sent for analysis (e.g <63 micron, Undivided).

The three "Key" columns (Lab_Key, Survey_Key and Field_Key) are used consistently across all of the spreadsheets.  They allow the contents of different spreadsheets to be merged together.

Basic Spreadsheet

The analytical or indicator mineral count data in the basic type of spreadsheet are found after the ten columns described above, specifically following column Preparation_Method_Name_en.  The column headers generally include the element symbol (or grain name) and analytical method, if applicable.  The units and determination limits are not listed in the spreadsheet but can be found by clicking on the Bundle_Key hyperlink and viewing the metadata page for the analytical sample bundle, specifically the Analytical Methods table.  In this table, the column "Column Name" lists the columns found in the spreadsheet and identifies the units and determination limit for the element.  The actual elements in the spreadsheet will vary depending on the analytical package being viewed.  The analytical data columns remain the same in the Negative (N), Half (H) and Text (T) spreadsheets for the same analytical package.

NGR Spreadsheet

The first column, Lab_Sample_Identifier, is found in all spreadsheets but is unique and very consistent for NGR data.  It is made up of five fields that are separated by colons:

  1. the 1:250k NTS mapsheet number (e.g. 031C).
  2. the sample number (e.g. 761024), made up of the year (76), a field crew identifier (1) and a three digit incremental number representing the site number (024).
  3. the replicate status, or repstat, to describe the relationship of the sample with respect to others in the survey.  It is used to indicate whether the sample is routine (00), the first of a field duplicate pair (10), the second of a field duplicate pair (20), a blind/lab duplicate (80) or a control reference/standard (91, 92, 93 etc.).
  4. the master sample as it relates to the repstat.  If repstat=00, 10 or 9x, there is no master sample; if repstat=20, the master sample is the sample ID (year, crew, site#) of the other sample in the duplicate pair; if repstat=80, the master sample is the sample ID (year, crew, site#) of the sample that was split to create the blind/lab duplicate.
  5. the repstat of the master sample.

In addition to Lab_Sample_Identifier and the nine other common columns found in all spreadsheets, the NGR extension to the basic type of spreadsheet also has the following columns before the analytical data:

QAQC_Block_ID

NGR samples are collected according to a basic sampling design that is based on a block of twenty samples.  Quality assurance and quality control (QAQC) procedures are carried out on the data within and between blocks and this column is an integer number that identifies the block of 20 within which the sample was collected.

Within an ideal block of twenty samples sent to the laboratory for analysis are 17 routine samples collected in the field, a field duplicate of one of the routine samples, also collected in the field, a blind duplicate created in the preparation lab by splitting one of the field duplicates and a control reference or standard inserted into the block to monitor accuracy.

QAQC_Sample_Identifier

This column classifies the sample in a way that facilitates statistical analysis by unbalanced analysis of variance (UANOVA).  It is made up of two parts, separated by a colon:

integer value

This allows the records in each QA/QC block to be sorted into the correct sequence for UANOVA:

Sample Classification Integer Values
Quadruplicate set 1 2 3 4  
Triplicate set 11 12 13    
BD → CD; CD → FD quad set 21 22 23 24  
BD → CD triplicate set 31 32 33    
Blind duplicate pair 41 42      
CD → FD triplicate set 51 52 53    
CD → FD triplicate second set 56 57 58    
Field triplicate set 61 62 63    
Field duplicate pair #1 71 72      
Field duplicate pair #2 81 82      
Field duplicate pair #3 83 84      
Cell duplicate pair 91 92      
Routine sample 101 102 103 104
Control reference 901 902 903 904
alphanumeric code

This classifies each sample within a QA/QC block.

Alphanumeric Code Explanation
bffc1
  • A set of four samples
  • integer values: 1, 2, 3, 4
  • BD, FD1, FD2, CD
  • the ideal quad format - blind duplicate split from one of the field duplicates; cell duplicate links to field duplicate pair
bff_1
  • A set of three samples
  • integer values: 11, 12, 13
  • BD, FD1, FD2
  • the ideal triple format - blind duplicate split from one of the field duplicates
bcff1
  • A set of four samples
  • integer values: 21, 22, 23, 24
  • BD, CD, FD1, FD2
  • the blind duplicate links to the cell duplicate, which in turn links to the field duplicate pair
bcc_1
  • A set of three samples
  • integer values: 31, 32, 33
  • BD, CD, CD
  • the blind duplicate links to the cell duplicate, which links to a routine sample, not the field duplicate
bs__1
  • A set of two samples
  • integer values: 41, 42
  • BD, routine
  • a blind duplicate pair, not linked to field duplicate or cell duplicate
ffc_1
  • A set of three samples
  • integer values: 51, 52, 53
  • FD1, FD2, CD
  • the field duplicate links to the cell duplicate, but there are no links to the blind duplicate
ffc_2
  • A set of three samples
  • integer values: 56, 57, 57
  • FD1, FD2, CD
  • the block of 20 contains a second cell duplicate, which links to a second field duplicate pair.  Note that the first cell duplicate may be part of an ideal quad format (bffc1), or may be part of a triple set (ffc_1).
fff_1
  • A set of three samples
  • integer values: 61, 62, 63
  • FT1, FT2, FT3
  • a field triplicate set: three samples from the same site, not linked to either BD or cell duplicate
ff__1
  • A set of two samples
  • integer values: 71, 72
  • FD1, FD2
  • a field duplicate pair, not linked to either BD or cell duplicate
ff__2
  • A set of two samples
  • integer values: 81, 82
  • FD1(2), FD2(2)
  • a second field duplicate pair in the same QA/QC block, not linked to either BD or cell duplicate
ff__3
  • A set of two samples
  • integer values: 83, 84
  • FD1(3), FD2(3)
  • a third field duplicate pair in the same QA/QC block, not linked to either BD or cell duplicate
cc__1
  • A set of two samples
  • integer values: 91, 92
  • CD, routine
  • a cell duplicate pair, not linked to either BD or field duplicates
s__xx
  • A routine sample
  • xx will be an integer ranging between 01 and 20
R__xx
  • A control reference sample
  • xx will be an integer ranging between 01 and 20
Order_of_Analysis

This column presents the order the samples were analysed in the laboratory.  Note that depending on the view selected (by using the drop-down lists associated with a column) or how the samples are sorted, the sequence of numbers may not be continuous.

Following the column Order_of_Analysis are the columns of geochemical data.  Once again, the column names generally contain the element symbol (e.g. Pb) and the analytical technique (e.g. AAS).  The units for each measured quantity (element) can be found by clicking on the Lab_Key hyperlink in the MS-Excel® file and viewing the metadata page for the analytical sample bundle.

IM Probe Results Spreadsheet

In addition to the ten common columns found in all spreadsheets, the IM Probe Data extension to the basic type of spreadsheet also has the following columns before the probe chemistry data:

Preparation_Laboratory_Sample_Group
Grains that are subjected to miroprobe analyses are usually mounted in epoxy on 25 mm diameter lucite disks.  Several hundred grains may be mounted on a single disk.  This column allows the user to identify the disk holding a particular grain.
Mineral_Abbreviation
This column provides a classification for the grain based on the probe chemistry results.  The mineral names are abbreviated but are hyperlinked to the corresponding keyword metadata page for the mineral, which provides a definition of the mineral.  The column can also be filtered by clicking on the down-arrow beside the column name and selecting one or more minerals from the drop-down list.  This way, all the samples with chrome-diopside grains, for example, can be viewed together.

Following the column Mineral_Abbreviation are the columns of probe chemistry results.  The column names are the oxide symbol abbreviations.  To see the units for these measurements, click on the Lab_Key (second column in the spreadsheet) for the sample to go to its analtyical sample bundle metadata page where the units for each measured quantity (here, oxides) are listed.

Determination Limits and Detection Limits

The definitions for determination limit and detection limit used on this web site are those used by J.J. Lynch (pers. comm., 1990).  A determination limit represents the lowest value that is obtainable by a particular analytical method.  In some situations (low sample weight in INAA, for example), this lowest level may not be attainable for a particular sample, and the laboratory will report a "detection limit" which is greater than the "determination limit".  If an analytical measurement does not have variable detection limits, the distinction between detection and determination limits disappears.

In the table below, Element 1 has undetected values of <10, <20 and <30.  Therefore, 10, 20 and 30 are the variable detection limits.  The lowest of these is <10, which means that 10 is the determination limit.  For Element 2, the only undetected values are all reported as <5.  In this case, the detection limit is identical to the determination limit, which equals 5.

Detection Limits vs. Determination Limits
Sample # Element 1 Element 2
1 61 <5
2 <10 41
3 42 23
4 <20 11
5 <30 <5
6 55 <5
7 <20 17
8 19 22
9 52 <5
10 28 9

Analytical values in a dataset that are close to the determination limit are not necessarily reliable.  Calculating percentiles when the dataset has variable detection limits is not straightforward.