Commit 288c088d authored by Jonathan Minz

Edit CF-Compliance-Checker.md

parent 6f7cbc88
+32 −19
@@ -6,12 +6,14 @@ The `check_cf_compliance.py` script automates the validation of **NetCDF files**

It uses the [CF-Checker](https://github.com/cedadev/cf-checker) utility (`cfchecks`) and is designed for workflows where NetCDF datasets are organized in subdirectories. The script:

- Finds the **first NetCDF file** in each subdirectory of a given parent directory.  
- By default, finds the **first NetCDF file** in each subdirectory of a given parent directory.  
- With the `--all` flag, processes **all NetCDF files** in each subdirectory.  
- Runs **CF-Checker** on those files.  
- Saves results into two separate logs:
  - `cf_compliance_details.log` → detailed CF-Checker output (per file)  
  - `cf_compliance_summary.log` → summary table of results (across all files)  
- Optionally exports the summary to **CSV** or **Excel**, with timestamped filenames for reproducibility.  
- The summary table includes a **Folder** column to show the subdirectory of each file.  
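
The file selection described above could look roughly like the sketch below. The function name `select_netcdf_files` and the choice to skip subdirectories without `*.nc` files are assumptions for illustration, not taken from the script itself.

```python
from pathlib import Path


def select_netcdf_files(parent_dir, check_all=False):
    """Map each direct subdirectory name to the NetCDF files to check.

    Hypothetical helper: by default only the alphabetically first *.nc file
    is kept; with check_all=True (the --all behaviour) every *.nc file is kept.
    """
    selected = {}
    subdirs = sorted(p for p in Path(parent_dir).iterdir() if p.is_dir())
    for subdir in subdirs:
        nc_files = sorted(subdir.glob("*.nc"))  # alphabetical order
        if not nc_files:
            continue  # assumption: subdirectories without NetCDF files are skipped
        selected[subdir.name] = nc_files if check_all else nc_files[:1]
    return selected
```

Keying the result by subdirectory name keeps the information needed later for the **Folder** column.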

---

@@ -19,13 +21,16 @@ It uses the [CF-Checker](https://github.com/cedadev/cf-checker) utility (`cfchec

```mermaid
flowchart TD
    A[Parent Directory] --> B[Find first NetCDF file in each subdir]
    B --> C[Run cfchecks on each file]
    C --> D["Parse CF-Checker summary (Errors, Warnings, Info)"]
    D --> E["cf_compliance_details.log<br>(per-file details)"]
    D --> F["cf_compliance_summary.log<br>(summary table)"]
    F --> G["Optional CSV Export<br>(timestamped)"]
    F --> H["Optional Excel Export<br>(timestamped)"]
    A[Parent Directory] --> B[Find NetCDF files in each subdir]
    B -->|Default| C[First file only]
    B -->|"--all flag"| C2[All files]
    C --> D[Run cfchecks on each file]
    C2 --> D
    D --> E["Parse CF-Checker summary (Errors, Warnings, Info)"]
    E --> F["cf_compliance_details.log<br>(per-file details)"]
    E --> G["cf_compliance_summary.log<br>(summary table, incl. Folder column)"]
    G --> H["Optional CSV Export<br>(timestamped)"]
    G --> I["Optional Excel Export<br>(timestamped)"]
```

---
@@ -65,22 +70,28 @@ pip install cfchecker
## Usage

```bash
python check_cf_compliance.py <parent_directory> [CF_version] [--csv] [--excel]
python check_cf_compliance.py <parent_directory> [CF_version] [--all] [--csv] [--excel]
```

### Arguments
- `<parent_directory>` → Path to the parent folder containing subdirectories with NetCDF files.  
- `[CF_version]` → (Optional) CF version to check against, e.g. `1.8`.  
- `--all` → Check **all NetCDF files** in each subdirectory (default: only first file per subdir).  
- `--csv` → Export summary results to a timestamped CSV file.  
- `--excel` → Export summary results to a timestamped Excel file.  
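
As an illustration of how this command line might be declared, here is a minimal `argparse` sketch. The function name `build_parser` and the attribute name `check_all` are hypothetical; the actual script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="check_cf_compliance.py",
        description="Run CF-Checker on NetCDF files found in subdirectories.",
    )
    parser.add_argument("parent_directory",
                        help="parent folder containing subdirectories with NetCDF files")
    # Optional positional argument: absent means no explicit CF version was given.
    parser.add_argument("cf_version", nargs="?", default=None,
                        help="CF version to check against, e.g. 1.8")
    parser.add_argument("--all", dest="check_all", action="store_true",
                        help="check all NetCDF files in each subdirectory")
    parser.add_argument("--csv", action="store_true",
                        help="export the summary to a timestamped CSV file")
    parser.add_argument("--excel", action="store_true",
                        help="export the summary to a timestamped Excel file")
    return parser
```

Declaring `cf_version` with `nargs="?"` lets it be omitted while the flags keep working in any position after the directory.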

### Examples

Check all subdirectories and just log results:
Check only the first file in each subdirectory (default):
```bash
python check_cf_compliance.py /data/netcdf_files
```

Check all files in each subdirectory:
```bash
python check_cf_compliance.py /data/netcdf_files --all
```

Check against CF-1.8:
```bash
python check_cf_compliance.py /data/netcdf_files 1.8
@@ -98,7 +109,7 @@ python check_cf_compliance.py /data/netcdf_files --excel

Export both CSV and Excel:
```bash
python check_cf_compliance.py /data/netcdf_files 1.8 --csv --excel
python check_cf_compliance.py /data/netcdf_files 1.8 --all --csv --excel
```
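
The exports use timestamped filenames. A minimal sketch of how the `cf_compliance_summary_YYYYMMDD_HHMMSS` pattern could be produced for the CSV case is shown below. The helper names and the CSV column names (mirroring the summary table) are assumptions; the Excel export would need a third-party library and is not shown.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumed column names, mirroring the summary table in the logs.
FIELDNAMES = ["Folder", "File", "Errors", "Warnings", "Info", "Result"]


def timestamped_name(extension, now=None):
    """Return e.g. cf_compliance_summary_20240102_030405.csv (hypothetical helper)."""
    now = now or datetime.now()
    return f"cf_compliance_summary_{now:%Y%m%d_%H%M%S}.{extension}"


def export_csv(rows, out_dir=".", now=None):
    """Write summary rows (a list of dicts keyed by FIELDNAMES) to a timestamped CSV."""
    path = Path(out_dir) / timestamped_name("csv", now)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Because the timestamp is part of the name, repeated runs do not overwrite earlier exports.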

---
@@ -107,7 +118,7 @@ python check_cf_compliance.py /data/netcdf_files 1.8 --csv --excel

1. **Log files**
   - `cf_compliance_details.log` → detailed `cfchecks` output for each file.  
   - `cf_compliance_summary.log` → summary table with counts of Errors, Warnings, Info.  
   - `cf_compliance_summary.log` → summary table with a **Folder** column and counts of Errors, Warnings, and Info.  

2. **Optional exports**
   - `cf_compliance_summary_YYYYMMDD_HHMMSS.csv` → machine-readable CSV summary.  
@@ -116,22 +127,24 @@ python check_cf_compliance.py /data/netcdf_files 1.8 --csv --excel
Example summary table in logs:

```
| File                  | Errors | Warnings | Info | Result |
|-----------------------|--------|----------|------|--------|
| huragl10S1_200001.nc  |      0 |        3 |    0 | PASS   |
| windagl80S1_199901.nc |      1 |        5 |    0 | FAIL   |
| Folder   | File                  | Errors | Warnings | Info | Result |
|----------|-----------------------|--------|----------|------|--------|
| exp001   | huragl10S1_200001.nc  |      0 |        3 |    0 | PASS   |
| exp001   | windagl80S1_199901.nc |      1 |        5 |    0 | FAIL   |
```
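
A table like the one above could be rendered with a small formatter along these lines. This is only a sketch: the function names are hypothetical, and the rule that any error yields `FAIL` is an assumption inferred from the two example rows, not a documented behaviour of the script.

```python
HEADERS = ["Folder", "File", "Errors", "Warnings", "Info", "Result"]
NUMERIC = {"Errors", "Warnings", "Info"}


def result_for(errors):
    """Assumed rule: one or more errors means FAIL, otherwise PASS."""
    return "FAIL" if errors > 0 else "PASS"


def format_summary_table(rows):
    """Render rows (dicts keyed by HEADERS) as an aligned pipe-delimited table."""
    # Each column is as wide as its header or its longest value, whichever is larger.
    widths = {h: max([len(h)] + [len(str(r[h])) for r in rows]) for h in HEADERS}
    header = "| " + " | ".join(h.ljust(widths[h]) for h in HEADERS) + " |"
    rule = "|" + "|".join("-" * (widths[h] + 2) for h in HEADERS) + "|"
    body = []
    for row in rows:
        cells = [
            str(row[h]).rjust(widths[h]) if h in NUMERIC else str(row[h]).ljust(widths[h])
            for h in HEADERS
        ]
        body.append("| " + " | ".join(cells) + " |")
    return "\n".join([header, rule, *body])
```

Computing the column widths from the data keeps long file names from breaking the alignment.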

---

## How It Works

1. **File selection**: For each subdirectory in the parent folder, the script takes the **alphabetically first NetCDF file** (`*.nc`).  
1. **File selection**:  
   - Default: takes the **alphabetically first NetCDF file** (`*.nc`) in each subdir.  
   - With `--all`: processes **all NetCDF files** in each subdir.  
2. **Run CF-Checker**: Calls the `cfchecks` utility for each selected file.  
3. **Parse results**: Extracts the `ERRORS detected`, `WARNINGS given`, and `INFORMATION messages` lines from the CF-Checker output.  
4. **Logging**:
   - Writes all raw CF-Checker output into `cf_compliance_details.log`.  
   - Builds a concise summary table and writes it into `cf_compliance_summary.log`.  
   - Builds a concise summary table (including the **Folder** column) and writes it into `cf_compliance_summary.log`.  
5. **Export** (optional): Saves the summary table into CSV/Excel with timestamped filenames for reproducibility.  
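
Steps 2 and 3 can be sketched as follows. The helper names are hypothetical. Two details are assumptions rather than facts taken from the script: that the CF version is passed to `cfchecks` through its `-v` option, and that each summary line has the form `<label>: <count>`.

```python
import re
import subprocess

# Assumed line format: "<label>: <count>", e.g. "ERRORS detected: 1".
SUMMARY_PATTERNS = {
    "Errors": re.compile(r"ERRORS detected:\s*(\d+)"),
    "Warnings": re.compile(r"WARNINGS given:\s*(\d+)"),
    "Info": re.compile(r"INFORMATION messages:\s*(\d+)"),
}


def build_command(nc_path, cf_version=None):
    """Assemble the cfchecks call; passing the version via -v is an assumption."""
    cmd = ["cfchecks"]
    if cf_version:
        cmd += ["-v", str(cf_version)]
    cmd.append(str(nc_path))
    return cmd


def parse_summary(output):
    """Extract the three counts; a count is None if its line is missing."""
    counts = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = pattern.search(output)
        counts[key] = int(match.group(1)) if match else None
    return counts


def check_file(nc_path, cf_version=None):
    """Run cfchecks on one file and return (raw output, parsed counts)."""
    proc = subprocess.run(build_command(nc_path, cf_version),
                          capture_output=True, text=True, check=False)
    output = proc.stdout + proc.stderr
    return output, parse_summary(output)
```

The raw output would go to `cf_compliance_details.log`, while the parsed counts feed the summary table. `check=False` is used so that a non-zero exit status from `cfchecks` does not raise an exception and the output can still be logged.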

---
@@ -140,5 +153,5 @@ Example summary table in logs:

Future improvements could include:
- Adding `--details-only` or `--summary-only` flags to control which logs are written.  
- Processing **all files** in each subdirectory, not just the first.  
- Supporting parallel execution for large datasets.  
- Adding more granular filtering (e.g., only certain file patterns).