Database field   Sediment codes      Soil codes
-----------------------------------------------
SAMPLE           2.x or 2.xx         3.x
SAMPMDC          4
LTYPC            M
SAMPTYP          11,12,13,14,15,     58,59
                 37,50,55,60,61,
                 63,64,70,71,72,
                 73,96,97,99

After surveying each file (through a series of Paradox queries), a new query was constructed that extracted all records for stream sediments (wet and dry), lake and pond sediments (including dry lakes), spring sediments, and soils.

1.2 Field selection

Data fields were chosen from the selected records for further processing. These included several label fields, the sample-type fields listed in Table 2, the geographic coordinates, fields for the 54 chemical elements appropriate for solid samples (Ag, Al, As, Au, B, Ba, Be, Bi, Ca, Cd, Ce, Co, Cr, Cs, Cu, Dy, Eu, Fe, Hf, Ho, K, La, Li, Lu, Mg, Mn, Mo, Na, Nb, Nd, Ni, P, Pb, Pt, Rb, Ru, Sb, Sc, Se, Si, Sm, Sn, Sr, Ta, Tb, Th, Ti, U, V, W, Y, Yb, Zn, Zr), and 5 miscellaneous fields that contain chemical data (CONCN01 through CONCN05). A Paradox query extracted these fields, and all other data were discarded (including stream characteristics, contamination codes, various labels, and fields not used for solid sample media).

1.3 Data scaling

Most chemical data in the quadrangle DBF files are stored in parts per billion (ppb). Paradox was used to convert each field into a more appropriate unit: parts per million (ppm) for trace elements, and weight percent for major elements (Al, Ca, Fe, K, Mg, and Na).

1.4 Record consolidation

Many samples were analyzed by more than one laboratory or by more than one method. In these cases, the quadrangle DBF files contain multiple records for an individual sample location, each with analyses for different elements. These records were found and combined into a single record. Paradox was used to sort the records by latitude and longitude. A temporary DBF file was generated and read by a DOS FORTRAN program, ECLEAN, written by the author (unpublished).
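ECLEAN itself is unpublished, so the following is only a minimal Python sketch of the kind of merge it performs, not the author's program. It assumes each record is a dict with hypothetical `lat` and `lon` keys (decimal degrees) plus one key per element, with None meaning "not analyzed", and it applies the 0.0005-degree tolerance, element-by-element merge, highest-value rule, and removal of blank records described in the next paragraph.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: "lat"/"lon" in decimal degrees plus one key per
# element; None means the element was not analyzed in that record.
Record = Dict[str, Optional[float]]

TOLERANCE_DEG = 0.0005  # ~50 m, the matching tolerance quoted in the text


def consolidate(records: List[Record], elements: List[str],
                tol: float = TOLERANCE_DEG) -> List[Record]:
    """Merge consecutive near-coincident records and drop blank ones."""
    # Sort by latitude, then longitude, as was done in Paradox.
    ordered = sorted(records, key=lambda r: (r["lat"], r["lon"]))
    merged: List[Record] = []
    for rec in ordered:
        group = merged[-1] if merged else None
        # Compare against the record that opened the current group.
        if (group is not None
                and abs(rec["lat"] - group["lat"]) <= tol
                and abs(rec["lon"] - group["lon"]) <= tol):
            for el in elements:
                new = rec.get(el)
                if new is None:
                    continue
                old = group.get(el)
                # Highest value wins; negative upper limits lose to any
                # measured (non-negative) value.
                group[el] = new if old is None else max(old, new)
        else:
            merged.append(dict(rec))  # copy so the input is not modified
    # Drop records that carry no chemical data at all.
    return [r for r in merged
            if any(r.get(el) is not None for el in elements)]
```

Comparing each record with the one that opened its group, rather than with its immediate predecessor, keeps a long chain of slightly offset points from collapsing into one merged sample; this is a choice made for the sketch, not a documented feature of ECLEAN, whose behavior is described next.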
ECLEAN searched for consecutive records that had identical or nearly identical geographic coordinates (within 0.0005 degrees, or about 50 m, of each other). Such records were assumed to represent the same sample, because round-off errors sometimes affected the fourth decimal place. ECLEAN then combined these records, element by element, into a single new record. In the few cases where data for the same element were present in two or more records, the highest value was arbitrarily chosen. This process also merged field duplicates (samples actually collected twice at a single location) into single records. ECLEAN also eliminated records with no chemical data, of which there were many. The program then created a new DBF file with the consolidated data.

Secondary data processing

At the beginning of this processing stage, the 308 original quadrangle DBF files had been reduced to 308 new DBF files containing only the geographic and chemical-element fields of the sediment and soil data, without any duplicate or blank records. Major systematic problems (see Major errors corrected, below) had also been corrected. The following processing steps were used to find and correct additional problems in the datasets, to search for regional inconsistencies in the data, and to establish the usefulness of data reported as upper limits (for example, <10 ppm).

2.1 Data surveying

The reduced DBF files were surveyed with GRIDPLOT, a DOS FORTRAN program also written by the author. This program reads multiple DBF files and displays a simple, color, gridded map of the data for one element on the computer screen. Systematic errors that were not found during primary data processing could be seen as discontinuities in the colored map. In some cases, these could be traced to systematic errors in the quadrangle DBF files, especially misplaced decimal points. These were corrected by repeating the primary processing for the affected quadrangle.
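How GRIDPLOT assigns values to its display cells is not documented here. As a hedged illustration only, the sketch below assumes the simplest scheme: each cell gets the mean of the points falling inside it, and each cell value is then mapped to a legend class for coloring. The function names, the degree-based cells, and the class breaks are all hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import Iterable, List, Optional, Sequence, Tuple

# (lat, lon, value) triples; coordinates in decimal degrees.
Point = Tuple[float, float, float]


def bin_to_grid(points: Iterable[Point], lat0: float, lon0: float,
                cell_deg: float, nrows: int,
                ncols: int) -> List[List[Optional[float]]]:
    """Average the points in each cell; cells with no points stay None.

    Row 0 is the southernmost row (lat0 is the southern edge) and column 0
    the westernmost (lon0 is the western edge). Points outside are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in points:
        row = math.floor((lat - lat0) / cell_deg)
        col = math.floor((lon - lon0) / cell_deg)
        if 0 <= row < nrows and 0 <= col < ncols:
            sums[(row, col)] += value
            counts[(row, col)] += 1
    grid: List[List[Optional[float]]] = [[None] * ncols for _ in range(nrows)]
    for (row, col), n in counts.items():
        grid[row][col] = sums[(row, col)] / n
    return grid


def color_class(value: Optional[float],
                breaks: Sequence[float]) -> Optional[int]:
    """Map a cell value to a legend class index (0 = lowest); None stays None."""
    if value is None:
        return None
    return bisect.bisect_right(breaks, value)
```

A quadrangle whose values were shifted by a misplaced decimal point would typically appear in such a display as a block of cells one or more classes away from its neighbors, the kind of discontinuity described above.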
Other discontinuities were caused by analytical errors and were handled through the data-leveling procedure described next.

2.2 Data leveling

In some areas, generally in the western U.S., one or more quadrangles, or parts of quadrangles, appear discontinuous with adjacent quadrangles for a given element when viewed with GRIDPLOT. In many such instances, a good case can be made for a systematic analytical error across the discontinuity (that is, an accuracy problem, probably due to different analytical methods or interlaboratory calibration problems). The best evidence for this type of error is that regional chemical trends are seen on both sides of the discontinuity, and that applying a simple correction factor makes the data appear continuous. In these cases, a correction factor was supplied to GRIDPLOT for the affected areas and adjusted until the gridded map appeared smooth and continuous. In other cases, either no correction factor could remove the discontinuity, or regional trends were absent in certain quadrangles and the data appeared to be random. Such data were discarded and not used to produce these images.

2.3 Data below detection limits

A negative concentration of an element in the quadrangle DBF files indicates that the value is an upper limit (for example, -10 means <10). These values present a special problem in creating map coverages of geochemical data. The philosophy adopted here is simple: steps were taken to ensure that all such upper limits fall within the lowest interval in the final map legend, and thus are known to be correctly categorized. First, two histograms were prepared for each element, one showing the concentration range of unqualified data, the other showing only upper limits. For most elements, the vast majority of the data fell in the first histogram, and markers showing the value of every 5th percentile were inserted into this plot for reference.
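The sign convention for upper limits and the 5th-percentile markers can be sketched as follows. The function names are hypothetical, and because the cutoff itself was chosen by visual comparison of the histograms, it appears here only as a parameter; treating a limit equal to the cutoff as retained is an assumed reading of "below".

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def split_upper_limits(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Separate unqualified values from upper limits (stored as negatives).

    Limits are returned as positive magnitudes: -10 (meaning <10) becomes 10.
    """
    unqualified = [v for v in values if v >= 0]
    limits = [-v for v in values if v < 0]
    return unqualified, limits


def percentile_markers(unqualified: Sequence[float]) -> List[float]:
    """Values at the 5th, 10th, ..., 95th percentiles (19 markers)."""
    return statistics.quantiles(unqualified, n=20)


def apply_cutoff(values: Sequence[float],
                 cutoff: float) -> List[Optional[Tuple[float, bool]]]:
    """Keep unqualified values; keep upper limits at or below `cutoff`.

    Each kept entry is (magnitude, is_upper_limit). Upper limits above the
    cutoff are deleted and returned as None (no usable value).
    """
    result: List[Optional[Tuple[float, bool]]] = []
    for v in values:
        if v >= 0:
            result.append((v, False))
        elif -v <= cutoff:  # assumed: a limit equal to the cutoff is kept
            result.append((-v, True))
        else:
            result.append(None)
    return result
```

For example, with a cutoff of 20 ppm, a stored value of -10 is kept as an upper limit of 10 ppm, while -50 is deleted.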
The second histogram was displayed below the first and compared visually. The strategy was to select a cutoff value below which upper limits are retained, such that they do not affect the accuracy of the map. Above this cutoff value, upper limits were deleted from the final dataset. The graphical result of such deletions is small holes in the map where grid cells could not be assigned real values.

2.4 Data extraction

Once the data were leveled, upper-limit cutoffs established, and areas of bad data identified, GRIDPLOT was run again to extract values for a single element from all 308 processed quadrangle DBF files. For the special case of uranium, GRIDPLOT was programmed to choose which data field to use for the final value. Uranium is typically stored in one of five fields in the original quadrangle DBF files: one labeled CONU, the others CONCN01, CONCN02, CONCN05, and CONUDN. The CONCN05 field was given priority over the CONU field if both were filled, and data in the CONCN01 and CONCN02 fields were used in the absence of data in the first two fields. The CONUDN field (U by delayed neutron) was coded in only a few percent of the samples (in only 9 quadrangles), and these data were not used here. The output from this step is a series of elemental DBF files of usable NURE data.

Major errors corrected

Several major errors in the NURE HSSR data were identified and corrected during the data-processing steps above. These errors are present in the original DBF files and in the composite database of Hoffman and Buttleman (1994; 1996). They will be corrected in a new database (Smith, 1998), but at this time that database covers only a small part of the United States.

3.1 Miscoded samples

The data survey conducted for each quadrangle DBF file in step 1.1 uncovered a block of stream-sediment samples miscoded as stream water in seven quadrangles in the northeastern U.S.
(Boston, Glens Falls, Lake Champlain, Lewiston, Newark, Scranton, and Williamsport). These records were recoded correctly before any data processing.

3.2 Data in incorrect units

In about 30,000 samples collected and analyzed by the Oak Ridge Gaseous Diffusion Plant (ORGDP) and tabulated in the quadrangle DBF files, the major elements (Al, Ca, Fe, K, Mg, and Na) plus As and Se were tabulated incorrectly, in units other than ppb. Over 70 quadrangles contain data affected by this problem. These records can be identified by a blank SAMPTYP field and a value of 4 in the SAMPMDC field. These problems were corrected as a group.

About 15,000 records in several dozen quadrangles in the western U.S. (samples analyzed, but not collected, by ORGDP) also contain major-element data in ppm instead of ppb, although their trace elements are coded correctly. Most of these are coded as soils (SAMPTYP=59), talus (SAMPTYP=62), or left uncoded in this field (SAMPTYP blank), and all have a value of M in the LTYPC field, which stands for sediment. These records were also corrected by special handling.

Data products

Themes with names of the form "Grid: Cu" are elemental concentration maps produced from a gridded version of the point data. These bitmap files (TIFF) are based on grids made with the MINC program of Webring (1981), which uses minimum-curvature interpolation of the point data to create a smooth surface. The grid cells were 2 km on a side.
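As a hedged illustration of what minimum-curvature gridding does, the Python/NumPy sketch below fits a discrete thin-plate (minimum-curvature) surface on a small grid by least squares: data nodes are held fixed and all other nodes minimize the summed squared second differences. It is not Webring's MINC program. Data points are simply snapped to the nearest node rather than honored at their exact positions, the dense solver only suits small grids, and coordinates are assumed to be already projected into a planar system, so 2 km cells would correspond to a hypothetical `cell=2000.0` in meters.

```python
import numpy as np


def min_curvature_grid(xs, ys, values, x0, y0, cell, nx, ny):
    """Smooth surface through point data on an ny-by-nx grid (nx, ny >= 3).

    Node (i, j) sits at x = x0 + j*cell, y = y0 + i*cell. Each data point is
    snapped to its nearest node (points sharing a node are averaged) and held
    fixed; the other nodes minimize sum(u_xx^2 + 2*u_xy^2 + u_yy^2).
    """
    n = nx * ny
    rows = []

    def add_row(terms, weight=1.0):
        row = np.zeros(n)
        for (i, j), c in terms:
            row[i * nx + j] += weight * c
        rows.append(row)

    for i in range(ny):                      # u_xx along each grid row
        for j in range(1, nx - 1):
            add_row([((i, j - 1), 1), ((i, j), -2), ((i, j + 1), 1)])
    for i in range(1, ny - 1):               # u_yy along each grid column
        for j in range(nx):
            add_row([((i - 1, j), 1), ((i, j), -2), ((i + 1, j), 1)])
    for i in range(ny - 1):                  # cross term; sqrt(2) squares to 2
        for j in range(nx - 1):
            add_row([((i, j), 1), ((i, j + 1), -1), ((i + 1, j), -1),
                     ((i + 1, j + 1), 1)], weight=np.sqrt(2.0))
    a = np.array(rows)

    # Snap data to nodes, averaging points that land on the same node.
    sums = np.zeros(n)
    counts = np.zeros(n)
    for x, y, v in zip(xs, ys, values):
        j = int(round((x - x0) / cell))
        i = int(round((y - y0) / cell))
        if 0 <= i < ny and 0 <= j < nx:
            sums[i * nx + j] += v
            counts[i * nx + j] += 1
    known = counts > 0
    u = np.zeros(n)
    u[known] = sums[known] / counts[known]

    # Least squares for the free nodes with the data nodes held fixed.
    free = ~known
    rhs = -a[:, known] @ u[known]
    u[free] = np.linalg.lstsq(a[:, free], rhs, rcond=None)[0]
    return u.reshape(ny, nx)
```

Because a plane has zero second differences, data sampled from a plane are reproduced exactly at every node (given at least three non-collinear data nodes), which is a convenient check of the formulation.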