Tutorial
The Measurement Set v2.0 (MSv2) is a tabular format in which the regularity,
or shape, of the data in the MAIN table is defined on a per-row basis.
This is accomplished through the DATA_DESC_ID column, which identifies the
Spectral Window and Polarisation Configuration associated with each row.
As a result, the shape of the visibilities in the DATA column can
vary from row to row.
By contrast, the Measurement Set v4.0 (MSv4) specifies a collection of Datasets of ndarrays on a regular grid. To move data between the two formats, MSv2 rows must be partitioned, or grouped, so that each group shares the same shape and configuration.
In xarray-ms, this is accomplished by specifying partition_schema
when opening a Measurement Set.
Different columns may be used to define the partition.
See Partitioning Schema for more information.
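The grouping step can be illustrated with a minimal pure-Python sketch. It does not use xarray-ms: the rows, the DATA_SHAPE field and the partition_rows() helper are hypothetical, and the sketch only shows that grouping rows on the schema columns yields groups whose visibilities share a single shape. Keys are written as tuples of (column, value) pairs, the same form used by the preferred_chunks argument later in this tutorial.

```python
from collections import defaultdict

# Hypothetical MSv2 MAIN table rows. DATA_SHAPE stands in for the
# (channels, correlations) shape implied by each row's DATA_DESC_ID.
ROWS = [
    {"TIME": 0.0, "DATA_DESC_ID": 0, "FIELD_ID": 0, "DATA_SHAPE": (8, 4)},
    {"TIME": 0.0, "DATA_DESC_ID": 1, "FIELD_ID": 0, "DATA_SHAPE": (4, 2)},
    {"TIME": 1.0, "DATA_DESC_ID": 0, "FIELD_ID": 0, "DATA_SHAPE": (8, 4)},
    {"TIME": 1.0, "DATA_DESC_ID": 1, "FIELD_ID": 0, "DATA_SHAPE": (4, 2)},
]


def partition_rows(rows, schema):
    """Group row indices by the values of the columns listed in ``schema``.

    Hypothetical helper: keys are tuples of (column, value) pairs,
    e.g. (("DATA_DESC_ID", 0), ("FIELD_ID", 0)).
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple((column, row[column]) for column in schema)
        groups[key].append(i)
    return dict(groups)


partitions = partition_rows(ROWS, ["DATA_DESC_ID", "FIELD_ID"])
for key, indices in partitions.items():
    shapes = {ROWS[i]["DATA_SHAPE"] for i in indices}
    # Every row within a partition shares one visibility shape,
    # so the partition can be stacked into a regular ndarray.
    assert len(shapes) == 1
    print(key, indices, shapes.pop())
```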
Opening a Measurement Set
As xarray-ms implements an xarray backend,
it is possible to use the xarray.backends.api.open_datatree() function
to open multiple partitions of a Measurement Set.
In [1]: import xarray_ms
In [2]: import xarray
In [3]: import xarray.testing
In [4]: from xarray_ms.testing.simulator import simulate
# Simulate a Measurement Set with two data descriptions: 8 channels with linear
# correlations and 4 channels with circular correlations
In [5]: ms = simulate("test.ms", data_description=[
...: (8, ("XX", "XY", "YX", "YY")),
...: (4, ("RR", "LL"))])
...:
In [6]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"])
In [7]: dt
Out[7]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 5, baseline_id: 3, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 24B ...
│ │ baseline_antenna2_name (baseline_id) object 24B ...
│ │ * baseline_id (baseline_id) int64 24B 0 1 2
│ │ field_name (time) object 40B ...
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 20B ...
│ │ sub_scan_number (time) int32 20B ...
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B ...
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 480B ...
│ │ TIME_CENTROID (time, baseline_id) float64 120B ...
│ │ UVW (time, baseline_id, uvw_label) float64 360B ...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 4kB ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 2kB ...
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:46:01.690589+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B ...
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B ...
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B ...
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 5, baseline_id: 3, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 24B ...
│ baseline_antenna2_name (baseline_id) object 24B ...
│ * baseline_id (baseline_id) int64 24B 0 1 2
│ field_name (time) object 40B ...
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 20B ...
│ sub_scan_number (time) int32 20B ...
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B ...
│ FLAG (time, baseline_id, frequency, polarization) uint8 120B ...
│ TIME_CENTROID (time, baseline_id) float64 120B ...
│ UVW (time, baseline_id, uvw_label) float64 360B ...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 960B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 480B ...
│ Attributes:
│ creation_date: 2025-04-04T11:46:01.703942+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B ...
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B ...
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B ...
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
Warning
The MSv4 spec is still under development and the arrangement and naming of the DataTree branches is likely to change.
Selecting a subset of the data
By default, open_datatree() returns a DataTree
that provides a lazy view over the data.
xarray has extensive functionality for
indexing and selecting data.
For example, one can select a subset along specific dimensions:
In [8]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"])
In [9]: subdt = dt.isel(time=slice(1, 3), baseline_id=[0, 2], frequency=slice(2, 4))
In [10]: subdt
Out[10]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 2, baseline_id: 2, frequency: 2,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 16B ...
│ │ baseline_antenna2_name (baseline_id) object 16B ...
│ │ * baseline_id (baseline_id) int64 16B 0 2
│ │ field_name (time) object 16B ...
│ │ * frequency (frequency) float64 16B 1.101e+09 1.223e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 8B ...
│ │ sub_scan_number (time) int32 8B ...
│ │ * time (time) float64 16B 2.09e+11 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 32B ...
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 32B ...
│ │ TIME_CENTROID (time, baseline_id) float64 32B ...
│ │ UVW (time, baseline_id, uvw_label) float64 96B ...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 256B ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 128B ...
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:46:01.783521+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B ...
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B ...
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B ...
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 2, baseline_id: 2, frequency: 2,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 16B ...
│ baseline_antenna2_name (baseline_id) object 16B ...
│ * baseline_id (baseline_id) int64 16B 0 2
│ field_name (time) object 16B ...
│ * frequency (frequency) float64 16B 1.427e+09 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 8B ...
│ sub_scan_number (time) int32 8B ...
│ * time (time) float64 16B 2.09e+11 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 32B ...
│ FLAG (time, baseline_id, frequency, polarization) uint8 16B ...
│ TIME_CENTROID (time, baseline_id) float64 32B ...
│ UVW (time, baseline_id, uvw_label) float64 96B ...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 128B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 64B ...
│ Attributes:
│ creation_date: 2025-04-04T11:46:01.793465+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B ...
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B ...
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B ...
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B ...
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
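At this point, subdt is still lazy: no data variables have been loaded
into memory. As the output shows, the selection is applied to every node in the tree, while the antenna_xds groups, which lack the selected dimensions, are unchanged.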
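Positional selection with isel() can also be expressed as label-based selection with sel(). The sketch below builds a small, hypothetical stand-in Dataset with plain xarray (not xarray-ms) whose frequency axis spans the same band as test_partition_000, and shows that isel(frequency=slice(2, 4)) and a frequency range given in Hz pick out the same two channels.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one partition: 5 timesteps and 8 channels
# spanning the same band as test_partition_000 above.
frequency = np.linspace(8.56e8, 1.712e9, 8)
ds = xr.Dataset(
    {"VISIBILITY": (("time", "frequency"), np.zeros((5, 8), dtype=np.complex64))},
    coords={"time": np.arange(5.0), "frequency": frequency},
)

# Positional selection, as in the isel() call above
by_index = ds.isel(frequency=slice(2, 4))
# Label-based selection on the frequency coordinate (values in Hz)
by_label = ds.sel(frequency=slice(1.0e9, 1.3e9))

print(by_index.frequency.values)
```

Label-based selection is usually more robust when partitions have different channelisations, since the same frequency range can be applied to each of them.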
Loading a DataTree
Calling load() on the lazy DataTree reads all data variables and coordinates into memory as numpy arrays. Note that load() modifies subdt in place as well as returning it.
In [11]: subdt.load()
Out[11]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 2, baseline_id: 2, frequency: 2,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 16B 'ANTENNA-0' 'ANTENNA-1'
│ │ baseline_antenna2_name (baseline_id) object 16B 'ANTENNA-1' 'ANTENNA-2'
│ │ * baseline_id (baseline_id) int64 16B 0 2
│ │ field_name (time) object 16B 'FIELD-0' 'FIELD-0'
│ │ * frequency (frequency) float64 16B 1.101e+09 1.223e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 8B 0 0
│ │ sub_scan_number (time) int32 8B 0 0
│ │ * time (time) float64 16B 2.09e+11 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 32B 0.0 0.0 0.0 0.0
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 32B ...
│ │ TIME_CENTROID (time, baseline_id) float64 32B -3.507e+09 .....
│ │ UVW (time, baseline_id, uvw_label) float64 96B 21...
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 256B ...
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 128B ...
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:46:01.783521+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B 'ALT-AZ' ... '...
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B 'STATION-0' .....
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B ...
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B 13.5 13.5 13.5
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B 13.5 13.5 13.5
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 2, baseline_id: 2, frequency: 2,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 16B 'ANTENNA-0' 'ANTENNA-1'
│ baseline_antenna2_name (baseline_id) object 16B 'ANTENNA-1' 'ANTENNA-2'
│ * baseline_id (baseline_id) int64 16B 0 2
│ field_name (time) object 16B 'FIELD-0' 'FIELD-0'
│ * frequency (frequency) float64 16B 1.427e+09 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 8B 0 0
│ sub_scan_number (time) int32 8B 0 0
│ * time (time) float64 16B 2.09e+11 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 32B 0.0 0.0 0.0 0.0
│ FLAG (time, baseline_id, frequency, polarization) uint8 16B ...
│ TIME_CENTROID (time, baseline_id) float64 32B -3.507e+09 .....
│ UVW (time, baseline_id, uvw_label) float64 96B 21...
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 128B ...
│ WEIGHT (time, baseline_id, frequency, polarization) float32 64B ...
│ Attributes:
│ creation_date: 2025-04-04T11:46:01.793465+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B 'ALT-AZ' ... '...
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B 'STATION-0' .....
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B ...
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B ...
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B 13.5 13.5 13.5
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B 13.5 13.5 13.5
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B ...
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
Opening a Measurement Set with dask
In general, observational data are too large to fit in memory. Either subsets of the data must be selected and loaded, or the data must be processed in chunks.
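A back-of-the-envelope estimate shows why. The sketch below assumes a dense (time, baseline, frequency, polarization) layout with complex64 visibilities (8 bytes each); the helper names are illustrative, and the 64-antenna, 4096-channel observation is hypothetical. Applied to the simulated partition, it reproduces the 4kB VISIBILITY variable shown earlier.

```python
def n_baselines(n_antenna, auto_corrs=False):
    """Number of distinct antenna pairs, optionally including autocorrelations."""
    if auto_corrs:
        return n_antenna * (n_antenna + 1) // 2
    return n_antenna * (n_antenna - 1) // 2


def visibility_nbytes(n_time, n_antenna, n_chan, n_corr, itemsize=8, auto_corrs=False):
    """Bytes needed for a dense (time, baseline, frequency, polarization) array.

    itemsize=8 corresponds to complex64 visibilities.
    """
    return n_time * n_baselines(n_antenna, auto_corrs) * n_chan * n_corr * itemsize


# The simulated partition: 5 timesteps, 3 antennas, 8 channels, 4 correlations
print(visibility_nbytes(5, 3, 8, 4))

# Hypothetical observation: 64 antennas, 4096 channels, 4 correlations,
# one hour of 8 s integrations (450 timesteps), roughly 119 GB
print(visibility_nbytes(450, 64, 4096, 4) / 1e9)
```

FLAG and WEIGHT arrays of the same shape add further to this total.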
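Chunked processing with a parallel array library such as dask
can be enabled by specifying the chunks parameter. This example also passes auto_corrs=True, which includes autocorrelations in each partition, so baseline_id grows from 3 to 6 entries: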
In [12]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={"time": 2, "frequency": 2}, auto_corrs=True)
....:
In [13]: dt
Out[13]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 5, baseline_id: 6, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ │ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ │ field_name (time) object 40B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ sub_scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 960B dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ │ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(2, 6, 3), meta=np.ndarray>
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 8kB dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 4kB dask.array<chunksize=(2, 6, 2, 4), meta=np.ndarray>
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:46:01.977831+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 5, baseline_id: 6, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ baseline_antenna2_name (baseline_id) object 48B dask.array<chunksize=(6,), meta=np.ndarray>
│ * baseline_id (baseline_id) int64 48B 0 1 2 3 4 5
│ field_name (time) object 40B dask.array<chunksize=(2,), meta=np.ndarray>
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ sub_scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ FLAG (time, baseline_id, frequency, polarization) uint8 240B dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ TIME_CENTROID (time, baseline_id) float64 240B dask.array<chunksize=(2, 6), meta=np.ndarray>
│ UVW (time, baseline_id, uvw_label) float64 720B dask.array<chunksize=(2, 6, 3), meta=np.ndarray>
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 2kB dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ WEIGHT (time, baseline_id, frequency, polarization) float32 960B dask.array<chunksize=(2, 6, 2, 2), meta=np.ndarray>
│ Attributes:
│ creation_date: 2025-04-04T11:46:01.987777+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
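The chunk layout above can be reasoned about with a small sketch that splits each dimension into blocks of at most the requested size. Dimensions not listed in chunks (here baseline_id and polarization) are kept whole, which is why the reported chunksize is (2, 6, 2, 4); the final time block holds only one timestep. The helpers are hypothetical and only mirror regular chunking for integer chunk sizes; they do not call dask.

```python
from math import prod


def chunk_sizes(dim_size, chunk):
    """Split a dimension of length dim_size into blocks of at most `chunk` elements.

    Assumes dim_size and chunk are positive integers.
    """
    full, remainder = divmod(dim_size, chunk)
    return (chunk,) * full + ((remainder,) if remainder else ())


def block_count(shape, chunks):
    """Number of blocks in an array; dimensions missing from `chunks` stay whole."""
    return prod(len(chunk_sizes(size, chunks.get(dim, size))) for dim, size in shape.items())


# VISIBILITY in test_partition_000 with chunks={"time": 2, "frequency": 2}
shape = {"time": 5, "baseline_id": 6, "frequency": 8, "polarization": 4}
chunks = {"time": 2, "frequency": 2}

print({dim: chunk_sizes(size, chunks.get(dim, size)) for dim, size in shape.items()})
print(block_count(shape, chunks))  # 3 time blocks x 4 frequency blocks
```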
Per-partition chunking
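A single chunking scheme may not suit every partition, especially when
partitions have different channelisation and polarisation configurations.
In these cases, the preferred_chunks argument can be used
to specify a different chunking for each partition. Its keys are tuples of (column, value) pairs identifying a partition, and passing chunks={} asks xarray to use the backend's preferred chunks.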
In [14]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={},
....: preferred_chunks={
....: (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
....: (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2}})
....:
See the preferred_chunks argument of
open_datatree()
for more information.
In [15]: dt
Out[15]:
<xarray.DataTree>
Group: /
├── Group: /test_partition_000
│ │ Dimensions: (time: 5, baseline_id: 3, frequency: 8,
│ │ polarization: 4, uvw_label: 3)
│ │ Coordinates:
│ │ baseline_antenna1_name (baseline_id) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ │ baseline_antenna2_name (baseline_id) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ │ * baseline_id (baseline_id) int64 24B 0 1 2
│ │ field_name (time) object 40B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ * frequency (frequency) float64 64B 8.56e+08 ... 1.712e+09
│ │ * polarization (polarization) object 32B 'XX' 'XY' 'YX' 'YY'
│ │ scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ sub_scan_number (time) int32 20B dask.array<chunksize=(2,), meta=np.ndarray>
│ │ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ │ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ │ Data variables:
│ │ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B dask.array<chunksize=(2, 3), meta=np.ndarray>
│ │ FLAG (time, baseline_id, frequency, polarization) uint8 480B dask.array<chunksize=(2, 3, 4, 4), meta=np.ndarray>
│ │ TIME_CENTROID (time, baseline_id) float64 120B dask.array<chunksize=(2, 3), meta=np.ndarray>
│ │ UVW (time, baseline_id, uvw_label) float64 360B dask.array<chunksize=(2, 3, 3), meta=np.ndarray>
│ │ VISIBILITY (time, baseline_id, frequency, polarization) complex64 4kB dask.array<chunksize=(2, 3, 4, 4), meta=np.ndarray>
│ │ WEIGHT (time, baseline_id, frequency, polarization) float32 2kB dask.array<chunksize=(2, 3, 4, 4), meta=np.ndarray>
│ │ Attributes:
│ │ creation_date: 2025-04-04T11:46:02.104909+00:00
│ │ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ │ observation_info: {'observer': 'observed', 'project': 'project'}
│ │ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ │ schema_version: 4.0.0
│ │ type: visibility
│ └── Group: /test_partition_000/antenna_xds
│ Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
│ receptor_label: 2, telescope_name: 3)
│ Coordinates:
│ * antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
│ mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ * telescope_name (telescope_name) <U9 108B 'telescope' .....
│ station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ * cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
│ polarization_type (antenna_name, receptor_label) object 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
│ * receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
│ Data variables:
│ ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
│ ANTENNA_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
│ Attributes:
│ type: antenna
│ overall_telescope_name: telescope
│ relocatable_antennas: False
└── Group: /test_partition_001
│ Dimensions: (time: 5, baseline_id: 3, frequency: 4,
│ polarization: 2, uvw_label: 3)
│ Coordinates:
│ baseline_antenna1_name (baseline_id) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ baseline_antenna2_name (baseline_id) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
│ * baseline_id (baseline_id) int64 24B 0 1 2
│ field_name (time) object 40B dask.array<chunksize=(3,), meta=np.ndarray>
│ * frequency (frequency) float64 32B 8.56e+08 ... 1.712e+09
│ * polarization (polarization) object 16B 'RR' 'LL'
│ scan_number (time) int32 20B dask.array<chunksize=(3,), meta=np.ndarray>
│ sub_scan_number (time) int32 20B dask.array<chunksize=(3,), meta=np.ndarray>
│ * time (time) float64 40B 2.09e+11 ... 2.09e+11
│ * uvw_label (uvw_label) object 24B 'u' 'v' 'w'
│ Data variables:
│ EFFECTIVE_INTEGRATION_TIME (time, baseline_id) float64 120B dask.array<chunksize=(3, 3), meta=np.ndarray>
│ FLAG (time, baseline_id, frequency, polarization) uint8 120B dask.array<chunksize=(3, 3, 2, 2), meta=np.ndarray>
│ TIME_CENTROID (time, baseline_id) float64 120B dask.array<chunksize=(3, 3), meta=np.ndarray>
│ UVW (time, baseline_id, uvw_label) float64 360B dask.array<chunksize=(3, 3, 3), meta=np.ndarray>
│ VISIBILITY (time, baseline_id, frequency, polarization) complex64 960B dask.array<chunksize=(3, 3, 2, 2), meta=np.ndarray>
│ WEIGHT (time, baseline_id, frequency, polarization) float32 480B dask.array<chunksize=(3, 3, 2, 2), meta=np.ndarray>
│ Attributes:
│ creation_date: 2025-04-04T11:46:02.115229+00:00
│ creator: {'software_name': 'xarray-ms', 'version': '0.2.8'}
│ observation_info: {'observer': 'observed', 'project': 'project'}
│ processor_info: {'sub_type': 'MEERKAT', 'type': 'CORRELATOR'}
│ schema_version: 4.0.0
│ type: visibility
└── Group: /test_partition_001/antenna_xds
Dimensions: (antenna_name: 3, cartesian_pos_label: 3,
receptor_label: 2, telescope_name: 3)
Coordinates:
* antenna_name (antenna_name) object 24B 'ANTENNA-0' .....
mount (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
* telescope_name (telescope_name) <U9 108B 'telescope' .....
station (antenna_name) object 24B dask.array<chunksize=(3,), meta=np.ndarray>
* cartesian_pos_label (cartesian_pos_label) object 24B 'x' ......
polarization_type (antenna_name, receptor_label) object 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
* receptor_label (receptor_label) object 16B 'pol_0' 'pol_1'
Data variables:
ANTENNA_POSITION (antenna_name, cartesian_pos_label) float64 72B dask.array<chunksize=(3, 3), meta=np.ndarray>
ANTENNA_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
ANTENNA_EFFECTIVE_DISH_DIAMETER (antenna_name) float64 24B dask.array<chunksize=(3,), meta=np.ndarray>
ANTENNA_RECEPTOR_ANGLE (antenna_name, receptor_label) float64 48B dask.array<chunksize=(3, 2), meta=np.ndarray>
Attributes:
type: antenna
overall_telescope_name: telescope
relocatable_antennas: False
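One way to picture how a preferred_chunks entry is matched to a partition is sketched below. The chunks_for_partition() helper and its matching rule (every (column, value) pair in the key must match the partition) are assumptions for illustration, not the xarray-ms implementation; they simply reproduce the chunking seen above, with time chunks of 2 for DATA_DESC_ID 0 and 3 for DATA_DESC_ID 1.

```python
PREFERRED_CHUNKS = {
    (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
    (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2},
}


def chunks_for_partition(partition_key, preferred_chunks, default=None):
    """Return the chunking whose (column, value) pairs all match the partition.

    Hypothetical helper: partition_key is a tuple of (column, value) pairs,
    e.g. (("DATA_DESC_ID", 1), ("FIELD_ID", 0)). Falls back to `default`
    (an empty dict if not given) when no entry matches.
    """
    values = dict(partition_key)
    for key, chunks in preferred_chunks.items():
        if all(values.get(column) == value for column, value in key):
            return chunks
    return {} if default is None else default


print(chunks_for_partition((("DATA_DESC_ID", 1), ("FIELD_ID", 0)), PREFERRED_CHUNKS))
```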
Writing a DataTree to Zarr
Zarr is a chunked array storage format well suited to distributed file systems and cloud object storage. Once a DataTree view of the data has been established, exporting it to a Zarr store takes a single call to to_zarr().
In [16]: import os.path
In [17]: import tempfile
In [18]: dt = xarray.open_datatree(ms, partition_schema=["FIELD_ID"],
....: chunks={},
....: preferred_chunks={
....: (("DATA_DESC_ID", 0),): {"time": 2, "frequency": 4},
....: (("DATA_DESC_ID", 1),): {"time": 3, "frequency": 2}})
....:
In [19]: zarr_path = os.path.join(tempfile.mkdtemp(), "test.zarr")
In [20]: dt.to_zarr(zarr_path, consolidated=True, compute=True)
The store can then be reopened with open_datatree() and compared with the original:
In [21]: dt2 = xarray.open_datatree(zarr_path)
In [22]: xarray.testing.assert_identical(dt, dt2)
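When writing to Zarr, the dask chunks along each dimension need to map onto Zarr's regular chunk grid. As we understand xarray's checks when no explicit Zarr encoding is supplied, every chunk except the last must have the same size, and the last may not be larger than the first. The hypothetical zarr_compatible() helper below encodes that rule; for example, the time chunks (3, 2) produced by preferred_chunks above pass, while an irregular layout such as (2, 3) would not.

```python
def zarr_compatible(chunks):
    """Return True if a 1D tuple of dask chunk sizes fits a regular zarr chunk grid.

    Hypothetical helper: all chunks except the last must be equal,
    and the last chunk must not be larger than the first.
    """
    if len(chunks) <= 1:
        return True
    first = chunks[0]
    return all(size == first for size in chunks[:-1]) and chunks[-1] <= first


print(zarr_compatible((3, 2)), zarr_compatible((2, 3)))
```

If a layout fails this check, rechunking to a uniform size before calling to_zarr() is the usual remedy.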
Writing a DataTree to Cloud Storage
xarray can write datasets directly to cloud storage.
Here we use the s3fs package to write to an S3 bucket.
import s3fs
# Authenticate with the "custom-profile" profile defined in ~/.aws/credentials
s3 = s3fs.S3FileSystem(profile="custom-profile",
                       client_kwargs={"region_name": "af-south-1"})
# Map a path within the bucket to a key-value store that zarr can write to
store = s3fs.mapping.S3Map("bucket/scratch/test.zarr", s3=s3,
                           check=True, create=False)
# mode="w" overwrites any existing store at this location
dt.to_zarr(store=store, mode="w", compute=True, consolidated=True)
See the xarray documentation on Cloud Storage Buckets for information on interfacing with other cloud providers.