
Several problems with IPCC CMIP6 NetCDF files

palle
2025-10-30
  • palle

    palle - 2025-10-30

    Hi All,
    I am trying to perform several operations on .nc files associated with the latest IPCC report (AR6) that are used in the IPCC Atlas. The files are here: https://digital.csic.es/handle/10261/332744 and the problem is the same for all of them.

    What I am trying to do is, for example, select only March over a period of consecutive years, then average over those years and over the ensemble of models. Another operation is averaging over consecutive years and over the ensemble.

    If I use the command ncwa -O -a time,member on any of those files, I get a segmentation fault. After a while I found out that I needed to remove the string variables first with
    ncks -C -x -v gcm_institution,gcm_model,gcm_variant,member_id,crs anyfile.nc clean.nc
    after which I can use ncwa.
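    (For reference, one quick way to spot which variables are stored as strings, assuming ncdump is available, is
    ncdump -h anyfile.nc | grep string
    since NC_STRING variables appear with the CDL type string in the header.)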

    If I try to select only the month of March and average over 5 years:
    ncks -O -d time,"2028-01-01","2032-12-31" anyfile.nc test-20282032.nc
    Extract the five March slices
    ncks -O -d time,2,2 -d time,14,14 -d time,26,26 -d time,38,38 -d time,50,50 test-20282032.nc test-march.nc
    Average over those 5 March months
    ncra -O -y avg test-march.nc test-march_mean.nc

    I get "ncra: ERROR no variables fit criteria for processing", even though I can see with ncdump that there is a float variable.
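    (As an aside, a possibly less fragile way to pick out the March slices, assuming monthly data starting in January so that March is 0-based index 2, is a single strided hyperslab, since NCO accepts a stride as the third -d argument:
    ncks -O -d time,2,,12 test-20282032.nc test-march.nc
    where the empty max means "through the last record" and 12 is the stride.)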

    If I try a different way, I manage to average each month through the years, but when I do the ensemble mean I get a segmentation fault:
    ncks -O -d time,"2028-01-01","2032-12-31" anyfile.nc test-20282032.nc
    ncecat -O -u time -d time,8,8 -d time,20,20 -d time,32,32 -d time,44,44 -d time,56,56 test-20282032.nc septembers_cat.nc
    ncra -O septembers_cat.nc september_time_mean.nc
    ncwa -O -a member september_time_mean.nc s_ensmean.nc

    If I remove the string variables, I can proceed with the ensemble and time mean in one sweep:
    ncks -C -x -v gcm_institution,gcm_model,gcm_variant,member_id,crs september_time_mean.nc s-clean.nc
    ncwa -O -a member,time s-clean.nc s_ensmean.nc
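    (As an untested alternative to deleting the string variables, restricting processing to the geophysical variable itself might work, since -v extracts only the named variable and its coordinates, so the NC_STRING variables never enter the arithmetic. With siconc as an example variable name taken from one of the files:
    ncwa -O -v siconc -a member,time september_time_mean.nc s_ensmean.nc)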

    After all these operations I still end up with a file that has extremely large values, so something along the way went wrong. They are neither missing values nor fill values.

    Is this a normal workflow for those files? It seems one needs to go through trial and error until finding the right sequence of operations. For example, if I try to remove the string variables right away I get:
    ncks -C -x -v gcm_institution,gcm_model,gcm_variant,member_id,crs siconc_CMIP6_ssp245_mon_201501-210012.nc baco.nc
    ERROR: nco_get_vara() failed to nc_get_vara() variable "siconc"
    ERROR NC_EHDFERR Error at HDF5 layer

    But this specific error doesn't happen when I use files from the CORDEX dataset.

    CDO is of no use at all.


    Last edit: palle 2025-10-30
  • Charlie Zender

    Charlie Zender - 2025-10-30

    Hello palle,
    Thanks for your report. I can reproduce some, though not all, of the issues you encounter. For example, this reports no errors for me:
    ncwa -O -a time,member ~/ph_CMIP6_historical_mon_185001-201412.nc ~/foo.nc
    What version of NCO are you using? Please upgrade to the latest version available. I will refrain from saying more about these files until you confirm the NCO version.
    Charlie

    • palle

      palle - 2025-10-30

      Hi Charlie,
      Thanks for looking into this. I have version 5.2.1

      sudo apt install nco
      Reading package lists... Done
      Building dependency tree... Done
      Reading state information... Done
      nco is already the newest version (5.2.1-1build2).
      0 upgraded, 0 newly installed, 0 to remove and 207 not upgraded.
      $ ncks --version
      NCO netCDF Operators version 5.2.1 "Shabu Shabu" built by buildd on lcy02-amd64-080 at Apr 1 2024 07:01:37
      ncks version 5.2.1

      I am on WSL on Windows 10. Does this mean I have to build my own installation? Why is sudo apt install nco not getting a more recent version?
      Thanks in advance

  • Charlie Zender

    Charlie Zender - 2025-10-30

    You can either run on a different machine with a newer version, figure out how to install a newer version on your current machine, or build from source. Your sysadmin may know how to update the current version with sudo apt ... but I do not. Until/unless you upgrade, you may not be able to get this stuff working. I admit that I thought 5.2.1 would be new enough to include all the relevant fixes for NC_STRING but I may be wrong about that.
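    One route that often yields a current NCO, assuming conda is available on your machine (a sketch, not the only way), is:
    conda create -n ncoenv -c conda-forge nco
    conda activate ncoenv
    ncks --version
    The environment name ncoenv is arbitrary.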

  • palle

    palle - 2025-10-30

    Hi again, I managed to install 5.3.6. I tried to skip removing the string variables, but I still get a segmentation fault:

    Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7ff80903b39f)
    ==== backtrace (tid: 8619) ====
    0 /home/a/miniconda3/envs/ncoenv/bin/../lib/./././libucs.so.0(ucs_handle_error+0x2fd) [0x7f91fe33e84d]
    1 /home/a/miniconda3/envs/ncoenv/bin/../lib/./././libucs.so.0(+0x2fa3f) [0x7f91fe33ea3f]
    2 /home/a/miniconda3/envs/ncoenv/bin/../lib/./././libucs.so.0(+0x2fc0a) [0x7f91fe33ec0a]
    3 /lib/x86_64-linux-gnu/libc.so.6(+0x45330) [0x7f9204255330]
    4 /lib/x86_64-linux-gnu/libc.so.6(+0x19b8dc) [0x7f92043ab8dc]
    5 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(+0x2f922d) [0x7f920079422d]
    6 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5T__conv_vlen+0x5ec) [0x7f9200785c4c]
    7 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5T_convert_with_ctx+0x81) [0x7f9200701481]
    8 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5T_convert+0x199) [0x7f9200701699]
    9 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5D__scatgath_write+0x231) [0x7f920057f151]
    10 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5D__contig_write+0x2c) [0x7f92005650fc]
    11 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5D__write+0xe0c) [0x7f920057ac2c]
    12 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5VL__native_dataset_write+0xb2) [0x7f92007aeb52]
    13 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(+0x2fed44) [0x7f9200799d44]
    14 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5VL_dataset_write+0x9d) [0x7f920079d8cd]
    15 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(+0xae150) [0x7f9200549150]
    16 /home/a/miniconda3/envs/ncoenv/bin/../lib/././libhdf5.so.310(H5Dwrite+0x9c) [0x7f920054c65c]
    17 /home/a/miniconda3/envs/ncoenv/bin/../lib/./libnetcdf.so.22(NC4_put_vars+0x511) [0x7f92040bfc11]
    18 /home/a/miniconda3/envs/ncoenv/bin/../lib/./libnetcdf.so.22(NC4_put_vara+0x12) [0x7f92040c0382]
    19 /home/a/miniconda3/envs/ncoenv/bin/../lib/./libnetcdf.so.22(nc_put_var1_string+0x68) [0x7f9204083048]
    20 /home/a/miniconda3/envs/ncoenv/bin/../lib/libnco-5.3.6.so(nco_put_var1+0x114) [0x7f92045055b4]
    21 ncwa(+0x8854) [0x7f9204649854]
    22 /home/a/miniconda3/envs/ncoenv/bin/../lib/libgomp.so.1(GOMP_parallel+0x43) [0x7f920444dc58]
    23 ncwa(+0x6ee0) [0x7f9204647ee0]
    24 /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca) [0x7f920423a1ca]
    25 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x7f920423a28b]
    26 ncwa(+0x80a7) [0x7f92046490a7]
    =================================
    Segmentation fault (core dumped)

    What's the problem exactly? Is it the formatting of these files, is it on the NCO side, or both? I have contacted the people who created these files and they told me they followed NetCDF protocols. I might have to work in R at a very low level by extracting the raw data. But NCO is usually so handy that I only bring the data into R for plotting and minor stuff.

    By the way, these files (after any of the calculations) don't work in Panoply either, where I get "checksum invalid". I have never seen this in Panoply before. It makes me wonder whether anyone is actually using these files, or if it is just me having all these issues...

  • Charlie Zender

    Charlie Zender - 2025-10-30

    I don't have the answers to all these questions. The presence of NC_STRING is difficult for many applications to handle, though it works fine for me on MacOS with NCO 5.3.6. There is, however, one crucial problem with the metadata: the geophysical field contains both a missing_value and a _FillValue attribute, and the values of these attributes are distinct. It is silly to have a distinct missing_value for model data, since models can predict values in all gridcells. I do not know why the dataset producers did this. It causes all NCO arithmetic operations on these datasets to fail, because NCO does not treat missing_value as anything special, and this leads to numerical overflows. The workaround, and what appears to be done in all other CMIP datasets, is to set the missing_value equal to the fill value. You can do this with either of these commands:

    ncatted -O -a _FillValue,,m,f,1.0384593717e+34 ~/ph_CMIP6_historical_mon_185001-201412.nc ~/ph_fll_val.nc
    ncap2 -O -s 'where(ph == ph@missing_value) ph=ph@_FillValue' ~/ph_CMIP6_historical_mon_185001-201412.nc ~/ph_mss_val.nc
    

    The second method is preferred for a few reasons. Once this is done to the input files, the averagers work (for me). For context, please read http://nco.sf.net/nco.html#missing_value. One last thing: the datasets are in HDF5/netCDF4 format. Try running ncwa with the '-t 1' option to turn off OpenMP. That might help.
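    To check whether a given file actually carries both attributes before patching it, something along these lines (reusing the ph example above) should suffice:
    ncks -m -v ph ~/ph_CMIP6_historical_mon_185001-201412.nc | grep -E 'missing_value|_FillValue'
    and the OpenMP-free invocation would look like:
    ncwa -t 1 -O -a time,member ~/ph_mss_val.nc ~/foo.nc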

  • Charlie Zender

    Charlie Zender - 2025-10-31

    I read the rest of your original post (OP). The large values are due to the missing_value issue, for which I provided a workaround. I'm hopeful that upgrading NCO will eliminate the segfaults. The last remaining issue is the "no variables fit criteria for processing" error. This is an NCO issue that I will work on in the near future. Not sure why it hasn't shown up before...

  • Charlie Zender

    Charlie Zender - 2025-10-31

    I spoke too soon. The "no variables fit criteria for processing" error is correct. The initial dataset has time as a fixed dimension, not a record dimension, so ncra will not work. To average over the fixed dimension time, use ncwa -a time in.nc out.nc. Or change time to a record dimension first, as described in the manual. I take back the statement that NCO was at fault for that. The error message clearly describes the problem.
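    A sketch of the record-dimension route, assuming the time dimension is literally named time:
    ncks -O --mk_rec_dmn time test-march.nc test-march_rec.nc
    ncra -O test-march_rec.nc test-march_mean.nc
    Here --mk_rec_dmn promotes the named fixed dimension to the record dimension, after which ncra can iterate over it.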

  • palle

    palle - 7 days ago

    @zender thanks for the detailed feedback. I will run more tests next week and get back to you with the results. My guess about the fill and missing values is that you could have an ocean or terrestrial model that doesn't simulate anything on land, or vice versa. Maybe it's for the ocean/land mask. Just a guess.

