By An_Li

2013-03-29 11:37:04 8 Comments

I am trying to import the 5 degree dataset from here (it is the first link). Once it is in my GIS, I want to use it as a raster input to calculate zonal statistics, using as the zone features the world borders (

My first question is how to import the .dat data into a GIS. And second, once I have imported it, can I use it as a regular raster file and run zonal statistics on it?

Can anyone help? I use ArcGIS 10.1.



@whuber 2013-06-28 16:53:34

This is a custom, one-off file format, so don't expect the usual tools to read it correctly.

The documentation file describes the contents clearly and (for the most part) accurately. Logically, the file is organized into "gridboxes." Each grid box contains a table of information by year and month, along with summary information by year (precipitation in 0.1 mm) and summary information for the gridbox itself. Physically, each grid box is represented by a header line followed by one line for each year. The lines are in ASCII fixed format (produced by a Fortran program, evidently).
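For readers who prefer Python, here is a minimal sketch of how one header line could be cut into fields, assuming the fixed-width layout (I7,I5,I6,I5,A15,I4,A14,2I4,I7,I9) quoted in the script's comments. The function and field names are my own, and the sample line is rebuilt from those widths rather than read from the file.

```python
# Field widths from the Fortran header format (I7,I5,I6,I5,A15,I4,A14,2I4,I7,I9).
HEADER_WIDTHS = [7, 5, 6, 5, 15, 4, 14, 4, 4, 7, 9]
# Field names are illustrative, not taken from the documentation.
HEADER_NAMES = ["gridbox", "lat", "lon", "alt", "country", "n",
                "diag1", "start", "end", "gridbox2", "diag2"]


def split_fixed(line, widths):
    """Cut a fixed-width line into stripped string fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields


def parse_header(line):
    """Parse one 80-character gridbox header line into a dict."""
    rec = dict(zip(HEADER_NAMES, split_fixed(line, HEADER_WIDTHS)))
    for key in ("gridbox", "alt", "n", "start", "end", "gridbox2"):
        rec[key] = int(rec[key])
    # Latitude and longitude are stored in hundredths of a degree.
    rec["lat"] = int(rec["lat"]) / 100.0
    rec["lon"] = int(rec["lon"]) / 100.0
    return rec


# The sample header from the data file, rebuilt from the field widths.
sample = "%7d%5d%6d%5d%-15s%4d%-14s%4d%4d%7d%9d" % (
    388, -6250, -4250, 4, "ANTARCTICA", 1, "  85% 85%  0% ",
    1900, 1998, 388, 1)
header = parse_header(sample)
print(header["gridbox"], header["lat"], header["lon"], header["country"])
# prints: 388 -62.5 -42.5 ANTARCTICA
```

The gridbox number appears twice in the header (columns 1-7 and 65-71), which gives a cheap way to tell header lines from data lines.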

Although logically there are three tables here--gridboxes, years, and months--it might be simplest to "flatten" everything out as if all data had been joined, with one record per month. This could be added to any GIS as an "XY point event" layer, which can then be queried for any month and converted into raster format for further analysis (or kept as-is for statistical summaries).
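To make the "flattening" concrete, here is a rough Python sketch of turning one yearly data line into one record per month. It assumes the I4,12I5,I6 layout and the -10 null value described in the script's comments. The data line itself is made up for illustration, and `flatten_year` is a hypothetical helper name.

```python
MISSING = -10  # null value used in the file


def flatten_year(line, gridbox, lat, lon):
    """Yield one (gridbox, lat, lon, year, month, value) tuple per non-missing month."""
    year = int(line[0:4])
    for month in range(1, 13):
        start = 4 + 5 * (month - 1)   # months follow the 4-character year
        value = int(line[start:start + 5])
        if value != MISSING:
            yield (gridbox, lat, lon, year, month, value)


# Hypothetical data line in the I4,12I5,I6 layout:
# January and February have values, every other month is missing.
monthly = [120, 95] + [MISSING] * 10
line = "%4d" % 1990 + "".join("%5d" % v for v in monthly) + "%6d" % MISSING

records = list(flatten_year(line, gridbox=388, lat=-62.5, lon=-42.5))
for rec in records:
    print("\t".join(str(field) for field in rec))
```

Each output row repeats the gridbox's coordinates, which is what lets a GIS load the result as an XY event layer.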

The main task, then, is to do the reformatting. Python will perform this nicely, but I still prefer AWK for its quick development cycle and clarity, so here's my quick-and-dirty (yet tested) AWK solution. (GNU AWK, or GAWK, runs on Windows and other systems and is freely available.) You can see that it just picks out all the fields within the two types of physical lines--"header" and "data"--and spits them back out, flattened (that is, showing all current values of all fields in every record) and tab-delimited. It performs minimal error checking, presuming that the input has been correctly formatted and is uncorrupted. It is executed from a shell or command line like this:

awk -f [name of file.awk] [name of input file.dat] > [name of output file]

For instance, on my system I named this AWK file y.awk, extracted the zipped precipitation data file into F:/temp/, and my command was

awk -f y.awk f:\temp\g55wld0098.dat > y.txt

After symbolizing the XY event theme in ArcMap 10 by country name, selecting the data for December 1990 ("Year" = 1990 AND "Month" = 12), and converting that to shapefile format (for efficiency), it looked like this, all ready for analysis or conversion to raster format.


(Note that all values are in tenths of millimeters.)

Because trying to display or process all the data (around 10^5 records) brings my copy of ArcGIS 10 to its knees, it might slow down your GIS too. One solution is to filter the data you want during the AWK conversion so that your GIS has fewer records to deal with. Just modify the test for missing data to skip anything else you don't want in the output. (This is an order of magnitude faster than filtering the data in ArcGIS itself.) You could also modify this program to output the data directly in an ASCII raster format (as multiple files, one per month per year), but that would take a little more skill and might be unreliable, because there is no guarantee the input is physically ordered by gridbox.

#    Global precipitation data (
#    Header (I7,I5,I6,I5,A15,I4,A14,2I4,I7,I9):
#    388-6250 -4250    4ANTARCTICA        1  85% 85%  0% 19001998    388        1
#    Data (I4,12I5,I6):
#1900  -10  -10  -10  -10  -10  -10  -10  -10  -10  -10  -10  -10   -10
#   ("-10" is the null value.  It will be output unchanged.)
#    NB: the final data record may have some invisible non-numeric characters and
#        so is left unconverted; it is in 0.1 mm.
BEGIN {
    nrecs = 0     # Number of records output
    maxerr = 20   # max error messages
    errcount = 0
    OFS = "\t"    # Output field separator
    # Print a header line of field names.
    print "Gridbox", "Lat", "Lon", "Altitude", "Country", "N", "Start", "End",
        "Diag1", "Diag2", "Precip", "Year", "Month", "Value"
}
length($0)!=70 && substr($0, 1, 7) == substr($0, 65, 7) { # Possible header
    gridbox = substr($0, 1, 7) + 0
    lat = substr($0, 8, 5)/100.0
    lon = substr($0, 13, 6)/100.0
    alt = substr($0, 19, 5) + 0
    country = "\"" substr($0, 24, 15) "\""
    n = substr($0, 39, 4) + 0
    diagnostic = substr($0, 43, 14)
    start = substr($0, 57, 4) + 0
    end = substr($0, 61, 4) + 0
    gridbox2 = substr($0, 65, 7) + 0
    diagnostic2 = substr($0, 72, 9)
    nrecs++       # Count this gridbox
    printf("\rGrid box %-8d", nrecs) > "/dev/stderr"
    next          # Header handled; do not fall through to the rules below
}
length($0)!=70 && errcount <= maxerr {
    print "Unable to interpret record " NR >> "/dev/stderr"
    errcount++
    next          # Do not process an unreadable line as data
}
{   # Data record
    year = substr($0, 1, 4)
    precip = substr($0, 65) # (There are problems treating this value as numeric...)
    # Print a data line for each month.
    for (month = 1; month <= 12; month++) {
        value = substr($0, 5*month, 5) + 0 # In 0.1 mm
        if (value != -10) { # Skip missing values
            print gridbox, lat, lon, alt, country, n, start, end,
                diagnostic, diagnostic2, precip, year, month, value
        }
    }
}
END {
    print "\r" nrecs " gridbox records output." > "/dev/stderr"
    if (errcount > 0) print errcount " errors encountered." > "/dev/stderr"
}
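On writing an ASCII raster directly: here is a rough Python sketch, not part of the tested AWK script, that places one month's points into an Esri ASCII grid. It builds the whole grid in memory, so it does not depend on the physical order of the gridboxes. It assumes that the Lat/Lon values are the centres of 5-degree cells on a global grid, which the sample header (-62.50, -42.50) suggests but which should be checked against the documentation. The function name, the nodata value and the sample rows are all made up for illustration.

```python
def to_ascii_grid(points, cellsize=5.0, nodata=-9999):
    """Return Esri ASCII grid text for (lat, lon, value) points on a global grid.

    Assumes lat/lon are cell centres of a regular grid covering
    -180..180 and -90..90; cells with no point get the nodata value.
    """
    ncols = int(360 / cellsize)
    nrows = int(180 / cellsize)
    grid = [[nodata] * ncols for _ in range(nrows)]
    for lat, lon, value in points:
        row = int((90.0 - lat) // cellsize)   # row 0 is the northernmost
        col = int((lon + 180.0) // cellsize)
        if 0 <= row < nrows and 0 <= col < ncols:
            grid[row][col] = value
    header = [
        "ncols %d" % ncols,
        "nrows %d" % nrows,
        "xllcorner -180",
        "yllcorner -90",
        "cellsize %g" % cellsize,
        "NODATA_value %d" % nodata,
    ]
    body = [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(header + body) + "\n"


# Hypothetical flattened rows: (lat, lon, year, month, value in 0.1 mm).
rows = [
    (-62.5, -42.5, 1990, 12, 431),
    (47.5, 7.5, 1990, 12, 655),
    (47.5, 7.5, 1990, 11, 802),
]
# Keep only December 1990, then build the grid text.
december = [(lat, lon, v) for lat, lon, y, m, v in rows if y == 1990 and m == 12]
text = to_ascii_grid(december)
lines = text.splitlines()
print("\n".join(lines[:6]))
```

Saving `text` to a file should give something the ASCII to Raster tool in ArcGIS can read, one file per month, with values still in tenths of millimeters.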

@nickves 2013-06-28 18:56:12

(There are problems treating this value as numeric...): what problems?

@whuber 2013-06-28 18:58:58

@nickves My copy of GAWK converted about half of the precip values to nulls when I tried to divide them by 10. There was no evident pattern to the failures. I didn't want to spend time hunting down the cause of the problem, which I guess might come down to some disagreement about how the initial whitespace is interpreted. I worked around the problem by avoiding all conversion and just passing those values as strings to the output. ArcGIS appears to have read them without any trouble.

@Ibe 2013-05-29 01:00:13

This is gridded binary data in lat/lon. So you need a header file such as:

ncols xxx
nrows xxx
xllcorner xxx
yllcorner xxx
cellsize xxx
nodata_value -999999
byteorder xxx

Then you can use the Float to Raster tool in the ArcGIS Conversion toolbox. This will give you a regular raster GRID file, which you can use for zonal statistics analysis.

@whuber 2013-06-28 16:57:31

According to the documentation, and by inspection, you are incorrect on every count: it is neither gridded, nor binary, nor in lat/lon (although it does have some numeric codes that can be converted into lat and lon). Tacking on that header will produce only garbage.

@Erica 2013-03-29 23:26:05

Looking at the documentation, this .dat file seems to be a table rather than a raster. You may have a much easier time creating a raster (or points) that cover the appropriate area, and then joining the data to it, and then exporting a raster with the data you want.

@whuber 2013-06-28 16:58:52

Although it looks like a table, it's not quite: it is organized by months within year. Think of it as a set of records, one per grid cell, each of which is stored in a long series of broken lines representing monthly precipitation during most of the 20th century.

@mete7 2013-03-29 12:00:11

I don't know how you can perform this analysis with ArcGIS, but you can easily handle it with QGIS. It supports a lot of data formats and can convert between them, and it has a tool for zonal statistics. You should try it. But if you want to perform this analysis in the ArcGIS environment, first convert your dataset to a well-known raster format such as TIFF so that ArcGIS can read it.

@Erica 2013-03-29 14:25:18

Do you have specific information about importing a .dat file into QGIS? It is not opening on my machine.

@mete7 2013-03-29 15:21:46

There are a lot of .dat formats. I successfully imported Surfer .dat files into QGIS as raster...

@An_Li 2013-03-29 17:44:00

I am afraid that I can hardly handle ArcGIS, let alone QGIS. Is there some way to convert the file before importing it?

@mete7 2013-03-29 18:07:30

You can use the gdal_translate tool...

@mkennedy 2013-03-29 23:53:18

@mete7, it's not a standard raster file, so GDAL isn't going to work. It's text and multidimensional but not in HDF nor netCDF.
