xarray-ms#
xarray-ms presents a Measurement Set v4 (MSv4) view over CASA Measurement Sets (MSv2). It provides access to MSv2 data via the xarray API, allowing MSv4-compliant applications to be developed on well-understood MSv2 data.
In [1]: import xarray_ms
In [2]: import xarray
In [3]: import xarray.testing
In [4]: from xarray_ms.testing.simulator import simulate
# Simulate a Measurement Set with 2 channel and polarisation configurations
In [5]: ms = simulate("test.ms", data_description=[
...: (8, ("XX", "XY", "YX", "YY")),
...: (4, ("RR", "LL"))])
...:
In [6]: ms
Out[6]: '/tmp/tmpiy4czerm/test.ms'
In [7]: dt = xarray.open_datatree(ms)
In [8]: dt
Out[8]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 5, baseline_id: 3, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 24B ...
│ │ baseline_antenna2_name (baseline_id) object 24B ...
│ │ * baseline_id (baseline_id) int64 24B 0 1 2
│ │ field_name (time) object 40B ...
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 20B ...
│ │ sub_scan_number (time) int32 20B ...
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B ...
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 480B ...
│ │ TIME_CENTROID (time, baseline_id) float64 120B ...
│ │ UVW (time, baseline_id, uvw_label) float64 360B ...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 4kB ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 2kB ...
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:45:59.952416+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B ...
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B ...
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B ...
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 5, baseline_id: 3, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 24B ...
│ baseline_antenna2_name (baseline_id) object 24B ...
│ * baseline_id (baseline_id) int64 24B 0 1 2
│ field_name (time) object 40B ...
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 20B ...
│ sub_scan_number (time) int32 20B ...
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B ...
│ FLAG (time, baseline_id, frequency, polarization) uint8 120B ...
│ TIME_CENTROID (time, baseline_id) float64 120B ...
│ UVW (time, baseline_id, uvw_label) float64 360B ...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 960B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 480B ...
│ Attributes:
│ creation_date: 2025-04-04T11:46:00.246659+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B ...
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B ...
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B ...
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
Measurement Set v4#
NRAO/SKAO are developing a new xarray-based Measurement Set v4 specification. While there are many changes, some of the major highlights are:
xarray is used to define the specification.
MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. MSv2 data is tabular and, while in many instances the time-channel grid is regular, this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.
xarray Datasets are self-describing and they are therefore easier to reason about and work with. Additionally, the regularity of data will make writing MSv4-based software less complex.
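To make the difference between tabular and gridded data concrete, the following sketch scatters MSv2-style rows onto a regular (time, baseline, channel) grid with NumPy, filling missing rows with NaN. It is an illustration under assumed toy values, not a description of how xarray-ms or xradio perform the conversion; all array names and contents are hypothetical.

```python
import numpy as np

# Hypothetical MSv2-style rows: one (time, antenna1, antenna2) triple per row.
# The row for time=2.0 and baseline (0, 2) is deliberately missing.
time = np.array([1.0, 1.0, 1.0, 2.0, 2.0])
ant1 = np.array([0, 0, 1, 0, 1])
ant2 = np.array([1, 2, 2, 1, 2])
nant, nchan = 3, 4

rng = np.random.default_rng(0)
data = rng.normal(size=(time.size, nchan)).astype(np.float32)

# The unique times and baselines define the axes of the regular grid
utime, time_idx = np.unique(time, return_inverse=True)
bl_key = ant1 * nant + ant2  # encode each antenna pair as a single integer
ubl, bl_idx = np.unique(bl_key, return_inverse=True)

# Allocate a NaN-padded grid and scatter each row into its cell
grid = np.full((utime.size, ubl.size, nchan), np.nan, dtype=np.float32)
grid[time_idx, bl_idx] = data
```

Here `grid` has shape `(2, 3, 4)`, and the cell for the missing row stays NaN, which is one simple way to represent irregular data on a regular grid.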
xradio#
casangi/xradio provides a reference implementation that converts CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore package.
Why xarray-ms?#
By developing against an MSv4 xarray view over MSv2 data, developers can build applications on well-understood data and then seamlessly transition to newer formats. Data can also be exported to newer formats (principally zarr) via xarray’s native I/O routines. In either case, the xarray view looks the same to the software developer.
xarray-ms builds on xarray’s backend API. Implementing a formal CASA MSv2 backend has a number of benefits:
xarray’s internal I/O routines such as open_dataset and open_datatree can dispatch to the backend to load data.
Similarly, xarray’s lazy loading mechanism dispatches through the backend.
Automatic access to any chunked array types supported by xarray including, but not limited to, dask.
Arbitrary chunking along any xarray dimension.
xarray-ms uses arcae, a high-performance backend to CASA Tables implementing a subset of python-casacore’s interface.
Some limited support for irregular MSv2 data via padding.
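The dispatch mechanism mentioned in the list above can be sketched with a toy backend. This is not xarray-ms's implementation: `ToyBackend`, the `.toy` extension and the in-memory data are hypothetical, and the example only shows how `xarray.open_dataset` hands control to a `BackendEntrypoint` subclass passed as `engine`.

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class ToyBackend(BackendEntrypoint):
    """Hypothetical backend that serves a small in-memory dataset."""

    description = "Toy in-memory backend (illustration only)"

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # A real backend would read from filename_or_obj here
        ntime, nchan = 5, 8
        ds = xr.Dataset(
            {
                "VISIBILITY": (
                    ("time", "frequency"),
                    np.zeros((ntime, nchan), dtype=np.complex64),
                )
            },
            coords={
                "time": np.arange(ntime, dtype=np.float64),
                "frequency": np.linspace(0.856e9, 1.712e9, nchan),
            },
        )
        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds

    def guess_can_open(self, filename_or_obj):
        # Lets xarray pick this backend automatically when it is registered
        return str(filename_or_obj).endswith(".toy")


# xarray dispatches to ToyBackend.open_dataset; the path is never read here
ds = xr.open_dataset("example.toy", engine=ToyBackend)
```

An installed backend would normally be registered through a package entry point so that it can be selected by name or guessed from the file, rather than passed as a class.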
Work in Progress#
The Measurement Set v4 specification is currently under active development. xarray-ms is also currently under active development and does not yet have feature parity with MSv4 or xradio. Most measures information and many secondary sub-tables are currently missing.
However, the most important parts of the MSv2 MAIN table, as well as the ANTENNA, POLARIZATION and SPECTRAL_WINDOW sub-tables, are implemented and should be sufficient for basic algorithm development.