xarray-ms#

xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications to be developed on well-understood MSv2 data.

In [1]: import xarray_ms

In [2]: import xarray

In [3]: import xarray.testing

In [4]: from xarray_ms.testing.simulator import simulate

# Simulate a Measurement Set with two (channel, polarisation) configurations
In [5]: ms = simulate("test.ms", data_description=[
   ...:   (8, ("XX", "XY", "YX", "YY")),
   ...:   (4, ("RR", "LL"))])
   ...: 

In [6]: ms
Out[6]: '/tmp/tmpiy4czerm/test.ms'

In [7]: dt = xarray.open_datatree(ms)

In [8]: dt
Out[8]: 
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│   │   Dimensions:                     (time: 5, baseline_id: 3, frequency: 8,
│   │                                    polarization: 4, uvw_label: 3)
│   │   Coordinates:
│   │       baseline_antenna1_name      (baseline_id) object 24B ...
│   │       baseline_antenna2_name      (baseline_id) object 24B ...
│   │     * baseline_id                 (baseline_id) int64 24B 0 1 2
│   │       field_name                  (time) object 40B ...
│   │     * frequency                   (frequency) float64 64B 8.56e+08 ... 1.712e+09
│   │     * polarization                (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│   │       scan_number                 (time) int32 20B ...
│   │       sub_scan_number             (time) int32 20B ...
│   │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
│   │     * uvw_label                   (uvw_label) object 24B 'u' 'v' 'w'
│   │   Data variables:
│   │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 120B ...
│   │       FLAG                        (time, baseline_id, frequency, polarization) uint8 480B ...
│   │       TIME_CENTROID               (time, baseline_id) float64 120B ...
│   │       UVW                         (time, baseline_id, uvw_label) float64 360B ...
│   │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 4kB ...
│   │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 2kB ...
│   │   Attributes:
│   │       creation_date:     2025-04-04T11:45:59.952416+00:00
│   │       creator:           {'software_name': 'xarray-ms', 'version': '0.2.8'}
│   │       observation_info:  {'observer': 'observed', 'project': 'project'}
│   │       processor_info:    {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│   │       schema_version:    4.0.0
│   │       type:              visibility
│   └── Group: /test_partition_000/antenna_xds
│           Dimensions:                          (antenna_name: 3, cartesian_pos_label: 3,
│                                                 receptor_label: 2, telescope_name: 3)
│           Coordinates:
│             * antenna_name                     (antenna_name) object 24B 'ANTENNA-0' .....
│               mount                            (antenna_name) object 24B ...
│             * telescope_name                   (telescope_name) <U9 108B 'telescope' .....
│               station                          (antenna_name) object 24B ...
│             * cartesian_pos_label              (cartesian_pos_label) object 24B 'x' ......
│               polarization_type                (antenna_name, receptor_label) object 48B ...
│             * receptor_label                   (receptor_label) object 16B 'pol_0' 'pol_1'
│           Data variables:
│               ANTENNA_POSITION                 (antenna_name, cartesian_pos_label) float64 72B ...
│               ANTENNA_DISH_DIAMETER            (antenna_name) float64 24B ...
│               ANTENNA_EFFECTIVE_DISH_DIAMETER  (antenna_name) float64 24B ...
│               ANTENNA_RECEPTOR_ANGLE           (antenna_name, receptor_label) float64 48B ...
│           Attributes:
│               type:                    antenna
│               overall_telescope_name:  telescope
│               relocatable_antennas:    False
└── Group: /test_partition_001
    │   Dimensions:                     (time: 5, baseline_id: 3, frequency: 4,
    │                                    polarization: 2, uvw_label: 3)
    │   Coordinates:
    │       baseline_antenna1_name      (baseline_id) object 24B ...
    │       baseline_antenna2_name      (baseline_id) object 24B ...
    │     * baseline_id                 (baseline_id) int64 24B 0 1 2
    │       field_name                  (time) object 40B ...
    │     * frequency                   (frequency) float64 32B 8.56e+08 ... 1.712e+09
    │     * polarization                (polarization) object 16B 'RR' 'LL'
    │       scan_number                 (time) int32 20B ...
    │       sub_scan_number             (time) int32 20B ...
    │     * time                        (time) float64 40B 2.09e+11 ... 2.09e+11
    │     * uvw_label                   (uvw_label) object 24B 'u' 'v' 'w'
    │   Data variables:
    │       EFFECTIVE_INTEGRATION_TIME  (time, baseline_id) float64 120B ...
    │       FLAG                        (time, baseline_id, frequency, polarization) uint8 120B ...
    │       TIME_CENTROID               (time, baseline_id) float64 120B ...
    │       UVW                         (time, baseline_id, uvw_label) float64 360B ...
    │       VISIBILITY                  (time, baseline_id, frequency, polarization) complex64 960B ...
    │       WEIGHT                      (time, baseline_id, frequency, polarization) float32 480B ...
    │   Attributes:
    │       creation_date:     2025-04-04T11:46:00.246659+00:00
    │       creator:           {'software_name': 'xarray-ms', 'version': '0.2.8'}
    │       observation_info:  {'observer': 'observed', 'project': 'project'}
    │       processor_info:    {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
    │       schema_version:    4.0.0
    │       type:              visibility
    └── Group: /test_partition_001/antenna_xds
            Dimensions:                          (antenna_name: 3, cartesian_pos_label: 3,
                                                  receptor_label: 2, telescope_name: 3)
            Coordinates:
              * antenna_name                     (antenna_name) object 24B 'ANTENNA-0' .....
                mount                            (antenna_name) object 24B ...
              * telescope_name                   (telescope_name) <U9 108B 'telescope' .....
                station                          (antenna_name) object 24B ...
              * cartesian_pos_label              (cartesian_pos_label) object 24B 'x' ......
                polarization_type                (antenna_name, receptor_label) object 48B ...
              * receptor_label                   (receptor_label) object 16B 'pol_0' 'pol_1'
            Data variables:
                ANTENNA_POSITION                 (antenna_name, cartesian_pos_label) float64 72B ...
                ANTENNA_DISH_DIAMETER            (antenna_name) float64 24B ...
                ANTENNA_EFFECTIVE_DISH_DIAMETER  (antenna_name) float64 24B ...
                ANTENNA_RECEPTOR_ANGLE           (antenna_name, receptor_label) float64 48B ...
            Attributes:
                type:                    antenna
                overall_telescope_name:  telescope
                relocatable_antennas:    False

Measurement Set v4#

NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:

  • xarray is used to define the specification.

  • MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.

xarray Datasets are self-describing, which makes them easier to reason about and work with. Additionally, the regularity of the data should make MSv4-based software less complex to write.
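To illustrate the difference between tabular rows and a regular grid, the sketch below scatters a handful of MSv2-like rows onto a (time, baseline) array with NumPy and pads the one missing entry with NaN. It is an illustration under assumed, made-up data and is not how xarray-ms itself performs the conversion.

```python
import numpy as np

# Tabular, MSv2-like rows: one (time, antenna1, antenna2, value) entry per row.
# The row for time=2.0, baseline (0, 2) is deliberately missing.
times = np.array([1.0, 1.0, 1.0, 2.0, 2.0])
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2])
values = np.array([10.0, 11.0, 12.0, 20.0, 22.0])

# Unique axes of the regular grid, plus each row's index along those axes
utime, time_idx = np.unique(times, return_inverse=True)
baselines = np.stack([ant1, ant2], axis=1)
ubl, bl_idx = np.unique(baselines, axis=0, return_inverse=True)
bl_idx = bl_idx.reshape(-1)  # guard: the inverse is not 1-D in every NumPy release

# Allocate a padded (time, baseline) grid and scatter the rows into it
grid = np.full((utime.size, ubl.shape[0]), np.nan)
grid[time_idx, bl_idx] = values
```

Once the data is on such a grid, every time step has the same set of baselines, so downstream code can use plain array indexing instead of row lookups.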

xradio#

casangi/xradio provides a reference implementation that converts CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore package.

Why xarray-ms?#

  • By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then transition to newer formats with minimal changes. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view of the data looks the same to the software developer.

  • xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:

    • xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.

    • Similarly, xarray’s lazy loading mechanism dispatches through the backend.

    • Automatic access to any chunked array types supported by xarray including, but not limited to, dask.

    • Arbitrary chunking along any xarray dimension.

  • xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.

  • Some limited support for irregular MSv2 data via padding.

Work in Progress#

The Measurement Set v4 specification is under active development, as is xarray-ms, which does not yet have feature parity with MSv4 or xradio. Most measures information and many secondary sub-tables are currently missing.

However, the most important parts of the MSv2 MAIN table, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.