Unable to use CDO to read large nc files

I have been using CDO to read the metadata of nc files on Gadi. Unfortunately, while reading relatively large nc files (i.e. 100-300MB) with CDO, I ran into the segmentation fault shown in the attached screenshot. I’m sure I still have plenty of storage space in my scratch and desktop directories, as I can still run JupyterLab sessions. What else should I check to resolve this error?

Hi Marvin,

If you look at the line just above the Caught signal 11, you’ll see that cdo is looking for geolon_t var in the netcdf file. Do you know if it’s in there? If it’s missing, it might be why cdo segfaulted.

Thanks,
Ed

@xcmarvin , it might be worth checking the netcdf file with another tool like ncdump. You just need to do:

```
module load netcdf
ncdump -h <filename>
```
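
If the header is long, a small helper can confirm whether particular variables are actually declared, rather than only named in another variable's `coordinates` attribute. This is a rough sketch only: `has_declared_vars` is a made-up name, and the pattern assumes the usual `ncdump -h` layout of `type name(dims) ;`.

```shell
# Reads `ncdump -h` output on stdin and reports, for each name given as an
# argument, whether a declaration such as "float geolon_t(yt_ocean, xt_ocean) ;"
# is present. A name that only appears inside an attribute value is not counted.
# (has_declared_vars is a hypothetical helper, not part of netCDF or CDO.)
has_declared_vars() {
    hdv_header=$(cat)
    hdv_status=0
    for hdv_name in "$@"; do
        if printf '%s\n' "$hdv_header" |
            grep -Eq "^[[:space:]]*[a-z0-9]+ ${hdv_name}(\(| ;)"; then
            echo "declared: ${hdv_name}"
        else
            echo "MISSING:  ${hdv_name}"
            hdv_status=1
        fi
    done
    return "$hdv_status"
}
```

It could then be used as `ncdump -h <filename> | has_declared_vars geolon_t geolat_t`.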

Pro tip:
Avoid sharing screenshots when asking for help. They are difficult to read and we can’t copy/paste from them.
It’s easier for us to see and use what you share if you can copy-paste inside a code block. In raw format, it looks like:

```
code
```
Or simply indent each line with 4 spaces.

Hi Edward and Claire,

When I used ncdump to open the same nc file, the geolon_t and geolat_t variables are there. This is a sample output I extracted for one variable because the file has plenty of variables:

```
float diff_cbt_t(time, st_ocean, yt_ocean, xt_ocean) ;
    diff_cbt_t:long_name = "total vert diff_cbt(temp) (w/o neutral included)" ;
    diff_cbt_t:units = "m^2/s" ;
    diff_cbt_t:valid_range = -10.f, 1000000.f ;
    diff_cbt_t:missing_value = -1.e+20f ;
    diff_cbt_t:_FillValue = -1.e+20f ;
    diff_cbt_t:cell_methods = "time: mean" ;
    diff_cbt_t:time_avg_info = "average_T1,average_T2,average_DT" ;
    diff_cbt_t:coordinates = "geolon_t geolat_t" ;
```

I tried multiple big nc files and all of them showed the same segmentation fault with CDO. I have no problem opening smaller nc files with CDO.

To test whether the file itself is the problem, I downloaded a sample nc file of over 600MB (/g/data/lg87/wgh581/cz861/history/ocn/ocean-ym_0520_07.nc) from Gadi onto my local machine. My local PC can successfully extract the metadata of the sample file with CDO, though CDO warns that the geolon_t and geolat_t variables aren’t found, which isn’t a problem.

```
cdi  warning (read_coordinates_vars): NetCDF: Variable not found - >geolon_t<
cdi  warning (read_coordinates_vars): NetCDF: Variable not found - >geolat_t<
cdi  warning (find_time_vars): Found more than one time variable, skipped variable average_T1!
cdi  warning (find_time_vars): Found more than one time variable, skipped variable average_T2!
   File format : NetCDF4 classic zip
    -1 : Institut Source   T Steptype Levels Num    Points Num Dtype : Parameter ID
     1 : unknown  unknown  v instant      50   1   1555200   1  F32z : -1     
     2 : unknown  unknown  v instant       1   2         1   2  F64z : -2     
   Grid coordinates :
     1 : lonlat                   : points=1555200 (1440x1080)
                         xt_ocean : -279.875 to 79.875 by 0.25 [degrees_E]  circular
                         yt_ocean : -81.077 to 89.94728 [degrees_N]
     2 : generic                  : points=1
   Vertical coordinates :
     1 : generic                  : levels=50
                         st_ocean : 1.15175 to 5253.663 [meters]
     2 : surface                  : levels=1
   Time coordinate :
                             time : 6 steps
     RefTime =  0001-01-01 00:00:00  Units = days  Calendar = proleptic_gregorian  Bounds = true
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  0015-07-16 12:00:00  0015-08-16 12:00:00  0015-09-16 00:00:00  0015-10-16 12:00:00
  0015-11-16 00:00:00  0015-12-16 12:00:00
cdo    sinfo: Processed 2 variables over 6 timesteps [0.06s 40MB]
```

Hi @xcmarvin,

Just in case this helps, I successfully ran `cdo sinfo` on a 435MB file on Gadi using xp65 module loads.

Is it possible that the file you were originally looking at on Gadi somehow got corrupted? Did you create it, or did you download it from somewhere?

If you downloaded it, can you look at the checksum to ensure the full file made it across?
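
One way to do that check is to compute a digest of the file on the machine that holds the original and compare it with the digest of the copy. The helper below is a minimal sketch: `verify_sha256` is a hypothetical name, and it assumes `sha256sum` is available (on macOS, `shasum -a 256` prints the same digest).

```shell
# Compare a file's SHA-256 digest with an expected hex digest.
# Prints OK and returns 0 on a match; prints MISMATCH and returns 1 otherwise.
# (verify_sha256 is a hypothetical helper name used for illustration.)
verify_sha256() {
    vs_file=$1
    vs_expected=$2
    vs_actual=$(sha256sum "$vs_file" | awk '{print $1}')
    if [ "$vs_actual" = "$vs_expected" ]; then
        echo "OK: checksums match"
    else
        echo "MISMATCH: expected $vs_expected, got $vs_actual" >&2
        return 1
    fi
}
```

Run `sha256sum <filename>` where the original lives, then `verify_sha256 <filename> <digest>` on the copy, pasting in the digest printed by the first command.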

Otherwise, note that I ran the command from a compute_size = tiny ARE session. Could you do the same, or package your command into a PBS job so you can control the Gadi environment? In other words, are you running on a Gadi login node?

Hi Chermelle,

I was accessing Gadi from a terminal on my local machine, so I’m not sure if this is the problem. Here is an example of a big nc file that I tried unsuccessfully to read with CDO: /g/data/lg87/wgh581/cz861/history/ocn/ocean-ym_0520_07.nc. If I want to read large nc files with CDO, do I have to create an ARE session, request more CPUs, and run CDO from there?

I ran from a tiny ARE session. I only mentioned it because I know you are not supposed to do large compute on a Gadi login node. That said, I just ran cdo sinfo on the 435MB file from a Gadi login node to test whether that is the issue, and the command succeeded, so using the login node is likely not the reason.

I can’t run cdo sinfo on your example file because I am not in the lg87 project.

I would still be a bit suspicious of the actual file. Sorry I can’t be of more help.

That said, yes, you could create an ARE session (or qsub a PBS job) with a request for more CPUs and memory and see how you go.

No worries, thanks! I don’t think the file has any issue per se, because when I downloaded it onto my local machine and ran CDO on it, the metadata output was generated successfully. The only cause I can think of is the large file size, but I’m not sure how to get around it.

When I run into problems like that, I usually get around them by requesting more compute resources, i.e. in ARE I would choose a larger compute_size.

If you are not familiar with ARE, the other option is to create a PBS job submission script and run the command from within the script (including all the module loads, etc). Sometimes a PBS job submission script is preferable because you can use it again.
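
As a rough illustration of the PBS route, the snippet below writes a small job script to `cdo_sinfo.pbs`. Everything in it is an assumption to adapt: `ab12` is a placeholder project code, the queue and resource requests are guesses, and `module load cdo` should be swapped for whatever environment you normally use. The file path is the example from this thread.

```shell
# Write a minimal PBS job script; submit it on Gadi with: qsub cdo_sinfo.pbs
# ab12 is a hypothetical project code and the resource values are guesses.
cat > cdo_sinfo.pbs <<'EOF'
#!/bin/bash
#PBS -P ab12
#PBS -q normal
#PBS -l ncpus=1
#PBS -l mem=16GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/lg87
#PBS -l wd
#PBS -j oe

# Load the environment inside the job so the run is reproducible.
module load cdo

# Record which cdo version the job used, then read the metadata.
cdo -V
cdo sinfo /g/data/lg87/wgh581/cz861/history/ocn/ocean-ym_0520_07.nc
EOF
```

The `storage=gdata/lg87` line is there because the job needs that project's /g/data mounted in order to read the file.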

Good luck. Please reach out to the community if you are stuck. Someone else may have come across the same problem.

It looks to be a CDO version issue.

Reading the file you pointed at works with:

  • default Gadi cdo (from module load cdo). That’s version 2.4.3
  • cdo in conda/analysis3-26.01. That’s version 2.4.1

It does not work with either conda/analysis3-26.02 or conda/analysis3-26.03 that ship cdo version 2.5.0.

I have no idea whether it’s a bug in cdo or whether cdo has changed its requirements for a successful read.
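
To see which cdo a given environment picks up, something like the sketch below may help. It assumes the first line of `cdo -V` looks like `Climate Data Operators version X.Y.Z (...)` and that the banner may go to stderr; `cdo_version` and `is_affected_version` are made-up helper names, and 2.5.0 is simply the version that failed in the tests above.

```shell
# Extract "X.Y.Z" from `cdo -V` output read on stdin.
# (Hypothetical helper; assumes the banner contains "version X.Y.Z".)
cdo_version() {
    sed -nE 's/.*version ([0-9]+\.[0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Succeed if the version is the one that failed to read the file in this thread.
is_affected_version() {
    [ "$1" = "2.5.0" ]
}

# Only query cdo if it is on the PATH in the current environment.
if command -v cdo >/dev/null 2>&1; then
    v=$(cdo -V 2>&1 | cdo_version)
    if is_affected_version "$v"; then
        echo "cdo $v: this version failed on the example file; try another environment"
    else
        echo "cdo $v"
    fi
fi
```

Running it once after `module load cdo` and once inside each conda/analysis3 environment should show which version each one provides.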