TrackMatcher — A tool for finding intercepts in tracks of geographical positions

Working with measurement data in atmospheric science often necessitates the collocation of observations from instruments or platforms at different locations, with different geographical and/or temporal data coverage. Varying complexity and abundance of the different data sets demand a consolidation of the observations. This paper presents a tool for (i) finding temporally and spatially resolved intersections between twoor three-dimensional geographical tracks (trajectories) and (ii) extracting of observations and other derived parameters in the vicinity of intersections to achieve the optimal combination of 5 various data sets and measurement techniques. The TrackMatcher tool has been designed specifically for matching height-resolved remote-sensing observations along the ground track of a satellite with position data of aircraft (flight tracks) and clouds (cloud tracks) and intended extension for ships (ship tracks) and air parcels (forward and backward trajectories). The open-source algorithm is written in the Julia programming language. The core of the matching algorithm consist of interpolating tracks of different objects with a piecewise cubic Hermite 10 interpolating polynomial with subsequent identification of an intercept point by minimising the norm between the different track point coordinate pairs. The functionality wrapped around the two steps allows for application of the TrackMatcher tool to a wide range of scenarios. Here, we present three examples of matching satellite tracks with the position of individual aircraft and clouds that demonstrate the usefulness of TrackMatcher for application in atmospheric science.

atmospheric science, the tool is designed for great flexibility with respect to the input data for further applications in the wider geosciences.

Code availability and package dependencies
TrackMatcher is an open source package hosted at GitHub (https://github.com/LIM-AeroCloud/TrackMatcher.jl.git) under the GNU General Public Licence v3.0. We strongly encourage contributions from outsiders, e.g., by pull requests or filing issues. 70 TrackMatcher is written in Julia (Bezanson et al., 2017). The package relies on MATLAB to read the satellite data from HDF4 files. The software is distributed as an unregistered Julia package and is tested against Julia 1.6.3 and the most recent stable release (currently identical version). Besides a Julia and MATLAB installation, the following Julia package dependencies exist with the version numbers given in parentheses: The PCHIP package was developed within the TrackMatcher framework to allow track interpolation with a piecewise cubic 90 Hermite interpolating polynomial (see Sect. 2.4). It is available under the GNU General Public Licence version 3 at Github (https://github.com/LIM-AeroCloud/PCHIP.jl.git).

Data structure
TrackMatcher is organised in data sets making use of Julia's type system. For readers unfamiliar with the type system, Sect. S1.1 in the ESM highlights the key points of this ecosystem and the different types. 95 Figure 1 shows TrackMatcher's type tree. Green boxes in Fig. 1 are used for concrete types that store track data or observations. Dark Blue boxes denote abstract types needed to classify the concrete types. Light blue boxes show an abstract type (top white label) together with an concrete type (bottom grey label). These are special cases for abstract types within the type tree that hold a concrete type as child. Both, abstract and concrete type have a constructor to instantiate the data in the concrete type tying both types together. For boxes circled in violet, a convenience constructor exists that initiates a TrackMatcher pro-100 cess such as loading input data and/or calculating intersections. Section S1.2 in the ESM presents details of the data structure introducing field names and data types for each struct. In TrackMatcher, the top level type is a generic DataSet, which holds Data with fields for measured (MeasuredSet) and computed (ComputedSet) data. The MeasuredSet organises the track data and observations. Observations are subtypes of ObservationSet as shown in Fig. 1. Track data are split into primary and secondary data sets. Currently, primary data consist of 105 individual trajectories (a PrimaryTrack) that are combined in a PrimarySet. The current version of TrackMatcher distinguishes between flight and cloud data in the PrimarySet and PrimaryTrack. Within the FlightSet, a FlightTrack can be obtained from several sources, however, the format of FlightData is unified. Currently, the only SecondarySet is SatSet. It holds data from a continuous trajectory, which is split into segments. Segments are stored in the field granules of SatSet. Segments or granules are classified below the SecondarySet level as SecondaryTrack, currently only holding SatData of the SatTrack. Each SatTrack Intersections between two tracks are defined as those locations for which the distance between a pair of points from the primary and secondary track reaches zero. This distance can be calculated either as the difference of the latitude values from 120 two tracks with a common longitude value or as the difference of the longitude values for a common latitude value. The general Here, x is either a latitude or longitude value and τ prim/sec represent the primary and secondary trajectories that determine the corresponding longitude and latitude values. Secondary track data are stored as segments of a continuous track for performance reasons (see also Sect. 2.3). All track segments are combined in a set. Secondary track data (currently only SatData) are stored in a data field with a DataFrame using the same structure as primary data (see Fig. S2 for schematics of the current data structure). Only essential data needed for the calculation of intersections are stored in this struct for performance reasons, i.e. time, latitude, and longitude. All SatData are combined in the field granules of the SatSet. Additional information about the temporal and spatial coverage of each granule 140 and the location of the input file is given in the SatSet metadata. Observations at track points are only extracted from the input files, if intercept points between the primary and secondary track sets are found. Specifics regarding the extracted data can be customised for the desired application.

Track data interpolation
A fundamental problem of the TrackMatcher algorithm is that the true functions of the investigated tracks are unknown and 145 have to be approximated. The approximation is only valid near nodes, i.e. in the vicinity of known track points. For Eq. 11 to be applicable, x (either latitude or longitude) must be equal for both tracks. In reality, the two tracks rarely feature regular intervals and most likely will not share a common latitude or longitude vector. Hence, both data sets will first need to be interpolated with a common set of x data between a shared start and end value.
Generally, track data cannot be fitted to a known function and the connection between latitude/longitude pairs needs to be 150 approximated with a suitable interpolation method. We chose the Piecewise Cubic Hermite Interpolating Polynomial (PCHIP; see, e.g., Fritsch and Carlson, 1980;Kahaner et al., 1989;Turley, 2018) to approximate a function f (x) between any two data points x i and x i+1 as and The PCHIP method demands a continuous first derivative f (x) at each data point (node), but in contrast to cubic splines does not require a continuous second derivative f (x). This principally means that cubic splines are slightly more accurate 165 in approximating continuous curves. For our purpose, the decreased accuracy is negligible and well within the errors of the track data points themselves. Instead, PCHIP interpolation suppresses artificial oscillation at discontinuities, which can occur at sharp turns of a track.
Track interpolation is a comprehensive task that consumes a significant portion of the source code. Therefore, PCHIP has been outsourced as a separate package available at GitHub under the GNU general public licence version 3.0 or above 170 (https://github.com/LIM-AeroCloud/PCHIP.jl.git).

Calculation of intersept points
To find intercept points between the primary and secondary tracks, the TrackMatcher algorithm follows seven steps that are explained in more detail below: 1. Load primary data, find the prevailing track direction and inflection points in the x-data of the primary track. 2. Load secondary data for the time frame of the primary data including a pre-defined tolerance at the beginning and end; find inflection points in the x-data based on the prevailing track direction of the primary data.
3. Put a bounding box around the coordinates of the primary track (see Fig. 2) and find segments of the secondary track within these coordinates and a given window of acceptalbe temporal difference ±∆t.
4. Interpolate the track segments of the primary and secondary tracks with the PCHIP method using common equidistant 180 x-data with a defined step width.
5. Define a function (Eq. (1)) to obtain roots between the difference in the track points of both tracks.
7. Filter intersections. Save intersection data and relevant measurements from the input data in the vicinity of the intercepts.
The result of the procedure is visualised in an example of an aircraft flight track and a satellite ground track in Fig. 2.

185
Track data are loaded (steps 1 and 2) as explained in Sect. 2.4.1 and 2.5.2. For the interpolation of the track data of both trajectories in step 4, strictly monotonic ascending x-data are a requirement. Therefore, both trajectories are fragmented into segments that fulfil this condition. To minimise fragmentation, the prevailing direction (north-south or east-west) of the primary track is determined while reading the input data according to Longitude λ is chosen as x value, if Eq. (10) is true and the dominant direction of the primary track is east-west. Otherwise, latitude ϕ performs better for a prevailing north-south direction. In Eq. (10), maximum horizontal distances are calculated separately for positive (λ + ) and negative (λ − ) longitude values to avoid problems with the sign change at the date line. The

195
last term on the right-hand side of Eq. (10) corrects for the poleward-decreasing distance between meridians. For simplicity, only the mean latitude (ϕ) is considered. As highly irregular patterns are currently only expected in the primary track, their data defines, whether ϕ or λ is used as x-data in both tracks.
Before interpolation, data of the secondary track need to be extracted for the correct time frame and location in step 3.
Therefore, only data points are considered that are within the time span of the primary data. Additional data points at the 200 beginning and end are allowed to permit a delay at possible intercept points of the primary and secondary track. By default, a maximum delay of 30 minutes is permitted, which can be adjusted by the user. To exclude unnecessary secondary track points, a bounding box is put around the primary track as shown in Fig. 2. Only track segments of the secondary trajectory within this bounding box are considered in any further steps (visualised by the pink boxes 1 to 5 in Fig. 2). To allow for rounding errors, the bounding box can be increased by an absolute tolerance (by default 0.1 • ).

205
In step 4, all segments are interpolated. Common x-data are used in the overlap regions (pink boxes in Fig. 2). These segments (τ prim /τ sec in Eq. (1)) are used for the identification of intercept points. Eq. (1) is case-specific and is re-defined for every segment pair (or box) in step 5. In step 6, TrackMatcher uses the IntervalRootFinding.jl package to determine all roots of Eq. (1). These roots are intercept points between the tracks of the primary and secondary data set.
In rare cases, the algorithm duplicates intercept points. This occurs mainly when an intercept point is located at a track 210 segment boundary where matches are then found at either side of the segment border. Duplicate intercepts can also occur in case of near-parallel tracks. To avoid the output of duplicate detection, the algorithm verifies that only the intercept calculation with the highest accuracy is stored within a predefined radius. By default, this radius is 20 km, but can be customised by the user.
Duplicate intercept points as well as intercept points for which the delay between the primary and secondary track exceeds a 215 set time difference are disregarded in step 7 (pink square in Fig. 2). This and further filtering is controlled by keyword arguments at the start of a TrackMatcher run as outlined in Table 1. For the remaining intercept points (yellow squares in Fig. 2), the latitude, longitude, respective times of the primary and secondary track at the intercept point as well as the difference between these times, details regarding the accuracy of the calculation, and user-selected auxiliary data in the vicinity of the intercept points are saved.

220
Linear interpolation between recorded time steps of the track points is used to derive the exact intercept time. Accurate interpolation as with the PCHIP method demands track segments with strictly monotonic latitude and longitude data (see Sect. 2.4.2). This likely results in a strong segmentation of the original track and leads to high computational costs. However, satellites, clouds, and aircraft at cruising altitudes are expected to move with relatively constant velocity, which is why linear interpolation can be applied to derive the time of intercept. For ascending and descending aircraft with unknown acceleration 225 or deceleration, track points are close to each other leading to minimal errors from linear interpolation.
To calculate the distance between track points, which is needed for setting a variety of thresholds, we use the haversine which gives the great circle distance between two points on a perfect sphere. In Eq. (12), ϕ 1/2 are the latitude values of the 230 coordinate pairs 1 and 2, λ 1/2 are the respective longitude values, and r is the radius of a perfect sphere. The poleward decrease of the Earth's radius is approximated as with r eq = 6 378 137 m and r pol = 6 356 752 m as the radii in the equatorial and polar plane, respectively. However, this eases the introduction of new features or the improvement of current routines. Contributions from outsiders, e.g., by pull requests or filing issues, are strongly encouraged to enhance the performance and flexibility of the algorithm.

Programme description
Both, TrackMatcher and PCHIP are unregistered Julia packages. The easiest way to install them is by using Julia's inbuilt package manager and adding the url of the GitHub repository. Example code for TrackMatcher installation is shown in Script 1 245 in Sect. S2.1 of the ESM. This will install the package and all dependent registered Julia packages in the correct version as defined by the package's project file. However, dependent unregistered packages need to be installed manually prior to the actual package installation. Therefore, the order of installation for TrackMatcher must be: 1. PCHIP

250
Package developers can use the dev option from Julia's package manager. However, it is recommended to clone the Track-Matcher repository from GitHub and activate the main folder with the Project.toml file to develop and run TrackMatcher. "abs" Save directories and file names as absolute paths ("abs", default), relative paths ("rel") or as given by user without saving additional observations ("" or false). Julia can be acquired from the package's resources.

Loading input data
Currently, TrackMatcher is configured to process three types of data: aircraft track data from different sources (stored as primary data in FlightData/FlightSet)

260
cloud track data (stored as primary data in CloudData/CloudSet) -CALIPSO satellite data (stored as secondary data in SatData/SatSet) Track data are stored in structs with a data field and, for primary data, a metadata field. Metadata is another struct with information about the raw data, computation settings and times, and properties concerning the whole trajectory. Time-resolved data are stored in a DataFrame in the data field with columns time, lat, and lon, and further columns depending on the data type 265 (aircraft, cloud or satellite data).
Tracks from primary data sets are stored in individual structs, which are combined in a vector and stored in a database struct (FlightSet or CloudSet) together with metadata (see Sect 2.3 and ESM for details). Several databases, e.g. individual flight tracks saved in tsv files or complex flight inventories, are considered for aircraft data. Each database type is stored in a separate vector/FlightSet field.

270
In contrast, secondary data consists of a single long trajectory. However, for performance reasons, data are stored as SatData structs for individual granules (track segments holding data of either the day-or night-time hemisphere of Earth). Moreover, only time, latitude, and longitude are stored in SatData for an optimised performance.
Further satellite data are only loaded in the vicinity of intersections. Currently, additional satellite data can be extracted either as a height profile (CPro) or as a layer mean value (CLay). The additional data are also used to determine the meteorological 275 conditions at the intersection. Only one type of the observation data (CPro or CLay) can be used to derive intercept points. The data type is determined automatically from the keyword CPro or CLay embedded in the file names. If both types exists, the data type with a majority in the first 50 files is chosen unless the default behaviour is overwritten by user settings. It should be noted that profile data gives a more refined height resolution and, hence, a more precise representation of the height-resolved atmospheric state at the intersection at the cost of more memory usage and a longer computation time.

280
Data are loaded from HDF4, MATLAB data (.mat) files or text files with comma-separated values (csv) or tab-separated values using "tsv", "dat" or "txt" as file extension. Details on the database types, file formats, restrictions and conventions can be found in Sect. S2.2.1 of the ESM.
To load data into the respective struct a convenience constructor exists that takes any number of file strings with absolute or relative folder paths as input. These directories and all subfolders are searched recursively for any file with the correct 285 extension. These files are assumed to be valid data files or will produce a non-critical error during loading. Further keyword arguments control data reading and filtering as given in Table 1.
Aircraft data are currently the only data type with multiple database sources. For the data files, csv is commonly used as file format. Therefore, data files cannot be identified by file extension alone as this does not allow an assignment to the correct database. Therefore, directories are passed as strings to keyword arguments for the respective database type. If a user wants 290 to scan more than one directory for the same database type, a vector of strings can be passed to the keyword arguments (see Sect. S2.2 and Note 7 in the ESM for details).

Calculating intersections and model output
To calculate intersections between the trajectories of the primary and secondary data sets, the user only needs to instantiate a new Intersection struct using either FlightSet or CloudSet and SatSet as input to a modified constructor for Intersection.

295
The algorithm works only for either flight or cloud data and two intersection structs need to be instantiated, if you want to calculate intersections for both types. However, the algorithm does not differentiate between the different flight database types and calculates intersections for all flights in a FlightSet regardless of the source. Parameters exist to control the performance and accuracy of the results as indicated by Table 1

Adapting TrackMatcher
The TrackMatcher package works as is for the track data described in this article with the prior installation of a licensed  In general, structs within the type tree presented in Fig. 1 exist to load and store input data from the primary and secondary trajectories or observations near intersections. Further structs calculate and store intersection data and meta information.

320
When adapting code, developers should obey this general structure. If the file format of a database has changed, respective routines should be updated. To add new track data, new track structs should be added to an existing or newly established PrimarySet or SecondarySet. Section S3 in the ESM helps developers to understand the general structure of source files and the contained functions.
Data should be added in a way that all track data of the same data set are stored in a unified format. For example, all track 325 data of the FlightSet are stored as a FlightData struct regardless of the source of the flight data. If users want to add data from another source, these data should be stored in FlightData structs as well saving identical information. If the need arises to save additional data, structs responsible for storing the data should be altered. If the data are not available in previous databases, filler objects such as missing, nothing or NaN should be used.   Red colour marks an increase in the number of identified intercepts using TrackMatcher compared to the data set of Tesche et al. (2016).  Table 3 summarizes the statistics of applying TrackMatcher to the data set of Tesche et al. (2016). In the original study, 678

Application and evaluation of the TrackMatcher package
and 3331 intercepts were found for time differences of ±0.5 h and ±2.5 h, respectively. In addition, cirrus clouds had to be  Fig. 4. TrackMatcher identifies a total of 77635 intercepts between the CALIPSO ground track and the tracks of civil aircraft for a time delay of ±30 minutes (Fig. 4a).
Naturally, the largest number of intercepts are found where flight density and contrail coverage are highest (see, e.g., Fig. 1   375 in Duda et al., 2019). Nevertheless, there is also a considerable amount of intercepts in regions of lower air-traffic density. In particular, distinct connections, such as between Hawaii and the continents or between Australia and New Zealand, can clearly be identified in Fig. 4a. The number of cases is reduced to 5076 in Fig. 4b as this data set includes the demand for the detection of a cirrus clouds at flight level at each intercept. This decreases the number of identified intercepts by a factor of about 15.
Using a global way point database is likely to yield a data set that is large enough to introduce sub-categories into the statistical 380 analysis of the effect of embedded contrails on cirrus clouds.
The findings in Fig. 4 demonstrate that TrackMatcher can be applied to data sets of considerable size. However, parallel computation is not yet achieved in TrackMatcher with the exception of file reading with the CSV package. To achieve appre- and calculates the corresponding y 0 from the interpolated trajectories of the primary and secondary tracks. Using the haversine function (Eq. (12)), TrackMatcher calculates the distance between both computations of the intercept points using either the primary or secondary data set. The distance between both computations is the indicator for the accuracy of the calculation. This 395 indicator only recognises the accuracy of the interpolation method and the calculation of the roots in the distance function. It does not consider any measurement errors in the track data. Table 4 shows minimum, maximum, mean, and important quantiles for the accuracy indicator and the spatial and temporal distance to the nearest observed track point in the results of the TrackMatcher run for February 2012. Generally, CALIPSO satellite data of the secondary data set have a much greater density of track points, which is resembled by a median distance 400 of about 1 km to the nearest measured track point compared to the median distance of 11 km for the primary flight data. Both data sets show some large data gaps resulting in a maximum distance of 2615 km and 4648 km of the computed inter section to the nearest observed track point of the primary and secondary data set, respectively. Due to the much greater velocity of the satellite compared to an aircraft, the maximum time difference to the nearest measured track point is only about 8 minutes for the secondary data compared to 6 hours and 50 minutes for the primary data. Larger data gaps are more common in the 405 primary data, which is represented by the fact of the mean in the range of the 68th percentile. For the secondary data, the mean corresponds to the 0.986-quantile. Hence, only about 1% of the track points is above the average of 1 km and with a TrackMatcher performs remarkably well with most results being calculated with an accuracy of only a few meters. The the CM SAF CLoud property dAtAset using SEVIRI (CLAAS-2) data set (Benas et al., 2017) following Seelig et al. (2021) have been used as primary input into TrackMatcher. The tracked clouds in this data set (i) are low-level clouds that (ii) formed 420 in clear air and (iii) dissolved in clear air. All these clouds could be followed throughout the entirety of their lifetime. Clouds that originate from splitting or end as merged clouds are not tracked with the current methodology. Figure 5 shows the cloud tracks together with the identified intersections.
In contrast to aircraft trajectories, the success rate of finding intersections in cloud tracks is significantly reduced.  width results in less identified intersections (2350). In essence, handing too many and too finely resolved data points to the IntervalRootFinding package results in a performance loss and sometimes failure to identify intersections.
Using cloud layer data instead of cloud profile data decreases computation times by a factor of 2.5. The same number of intersections are found, however, meteorological conditions are not always correctly identified (282 finds under non-clear conditions compared to 295 using cloud profile data). Moreover, it is not possible to distinguish data without meteorological

Conclusions and outlook
This paper present a tool for finding intercept points between tracks of geographical coordinates, i.e. data sets consisting of at least time, latitude, and longitude. The main principles of the methodology consist of (i) interpolating the primary and secondary tracks with a piecewise cubic Hermite interpolating polynomial and (ii) finding the minimum norm between the 475 different track point coordinate pairs. The universal design of TrackMatcher allows application of the tool to a wide range of scenarios such as matching tracks from ships, aircrafts, clouds, satellites, and in fact any other moving object with known track data.
Here, the tool is applied to find intercept point between flight tracks of individual aircraft and satellite ground tracks as well as between tracks of individual clouds and satellite ground tracks. Potential application of the TrackMatcher tool in 480 atmospheric science include the identification of intercept points between the tracks of research aircraft and the tracks of clouds or trajectories of air parcels from dispersion modelling. TrackMatcher will also prove useful in research fields outside the atmospheric sciences whenever data collected along different spational pathways need to be collated with an objective and reproducible methodology.
However, current studies have also shown limitations of TrackMatcher. Identification of intercept points matching cloud data 485 with other trajectories can be improved with the development of an algorithm comparing trajectories and areas or volumes.
Further development will also focus on performance improvements, e.g., by enabling distributed runs or parallel computation. Author contributions. TrackMatcher is based on an idea by MT and PB. The code development is lead by PB. All authors contributed equally in the processing and interpretation of the data as well as in the preparation of the manuscript.