A fast catheter segmentation and tracking from echocardiographic sequences based on corresponding X-ray fluoroscopic image segmentation and hierarchical GRAPH modelling

3D soft tissue information, which X-ray images cannot provide but 3D echocardiographic imaging can, may be required in cardiac catheter-based interventions. In this paper, we propose a real-time catheter tracking strategy in echocardiographic sequences based on catheter segmentation in 2D X-ray images and registration between these two modalities. Firstly the segmentation from X-ray and the registration between X-ray and ultrasound is computed. The results from these steps reduce the search space in the ultrasound volume to a limited space surrounding a curved surface. This space is straightened and 2D SURF is calculated on the sampled cross-sections. All features are organized as a two level hierarchical graph. The longest path on the top-level graph and shortest paths on the bottom level graphs are solved. This combined path is considered as the potential catheter after B-Spline modelling and growing. The experiments on clinical data (2000 pairs of frames) show a better performance than a previous method and some dominant vesselness filters.


INTRODUCTION
X-ray fluoroscopy imaging and echocardiography imaging (ultrasound, US) are two modalities which have been widely used in real-time cardiac interventions. X-ray fluoroscopic imaging, which is the standard modality used to monitor cardiac catheter ablation interventions, has some limitations such as lack of 3D soft tissue information and harmful ionising radiation. In contrast, echocardiographic imaging, which overcomes the above limitations, can be seen as a good alternative for catheterization procedures. However, severe acoustics artifacts and small field of view make monitoring by ultrasound alone challenging and sometimes impossible.
For the application of catheter tracking in X-ray images, a great number of studies have been proposed. The methodologies include filter-based segmentation [1], electrode feature detection [2], graph-cut [3] and learning-based methods [4]. The tracking subjects include the catheter tip [1] [4], the electrodes [2] and the soft parts of the catheter or a guide wire [3] [5]. For ultrasound images, very limited studies have been proposed for catheter extraction. In [6], a fast catheter segmentation was proposed based on the segmentation from X-ray sequences. A graph model was constructed and the shortest path was solved in this work. However, the approach was tested on very little clinical data.
We are aiming to provide a fast catheter visualization based on a combination of both X-ray and ultrasound imaging for cardiac clinical interventions. This combination includes soft tissue information in the procedure and overcomes the disadvantages of ultrasound imaging. The main contributions are: (1) a new 2D SURF measurement filter in a straightened space for tubular structures; (2) a post-processing growing strategy which can increase the ratio of extracted parts of the catheter; (3) a hierarchical graph modelling strategy consisting of two layers which can preserve the integrity of catheter fragments and link them together as a more likely catheter; (4) more extensive experiments in clinical data.

METHODOLOGY
2.1. Projecting the Extracted Catheter from the X-ray Plane to the US Volume Our system starts with the catheter extraction or tracking from X-ray sequences. The method we use is from [6] which combines a blob detector, local line fitting and Kalman filter based growing. A fast Primal-Dual (Fast-PD) based tracking framework and a post-processing Kalman growing step were also proposed. The reader is referred to [6] for more details. The other information we need is the transformation matrix T mapping each US volume from the US coordinate system to the X-ray coordinate system. We use the method in [7].

Search Space Straightening and SURF Feature Extraction
Suppose T = [t 1 , t 2 , t 3 , t 4 ] T , for the kth landmark location p k = (u k , v k ) on the X-ray image, the corresponding ray T is the coordinates of the voxels in the US volume which correspond to the kth landmark in the X-ray image. Each ray is bounded by two intersections with the border of the echo volume. All of the rays corresponding to locations on the catheter in the X-ray image comprise a curved surface S. As shown in Fig. 1, a reference plane R which crosses the centre of the echo volume with a normal vector the same as the direction of the X-ray projection axis, c, is defined. The projections of the first and last X-ray landmarks of the catheter on reference plane R are used to define unified tangential direction t. The unified normal search direction n is defined as perpendicular to both c and t. We then straighten the original echo space surrounding the curved surface into a regular rectangular volume V. The direction vector v k for each ray corresponding to a pixel on the catheter in X-ray, as well as the corresponding intersection X k 0 with the reference plane R, are calculated from (1). The ray equation can be reformed as X k = X k 0 + sv k and s is bounded by [s k min , s k max ] based on the intersections with the border of the original echo volume. Given s max = max k s k max and s min = min k s k min , the value of each voxel in the straightened volume is defined as: where I(X) is the intensity value of coordinate X in the original echo volume and set to zero if X is out of the volume. i, j and k are coordinate indexes for the straightened volume. k also corresponds to the landmark index introduced above. i is bounded by [−τ normal , τ normal ] where τ normal is the limit of the search range along normal direction n. By this transform, we straighten the original space surrounding the curved surface to a rectangular volume V.
For the catheter which has been transformed into the straightened space V, it is likely that the catheter is along the z-axis of V, with small changes in the x-axis position, but some variation in the y-axis position. On this assumption, the xy and xz cross-section of V will show a blob like structure at the catheter position. We calculate the response Resp(X) by selecting the higher SURF value from its two corresponding locations in the xy and the xz cross-sections. By setting up a threshold τ surf , a set of candidate features, with both the coordinates X k i and the responses R k i above the threshold for the ith feature on the kth ray, are selected.

Two Level Hierarchical Graph Construction and Path Finding
Firstly from the 1st to the nth (last) ray, every pair of two feature locations on the same or adjacent ray(s) is checked for their distance. If the distance is below a threshold τ subgraph , then the two feature locations are linked by an edge with a cost equal to the distance. After this closest distance clustering, feature locations are organized as a set of clusters C j , j = 1, . . . , m (m is the number of clusters) and within each cluster features are linked to each other directly or via other features. For each cluster C j , the features X k i with no direct linked features with a smaller ray index than k are selected as the candidate starting nodes. Other features that have no direct linked features with a larger index than its ray index are selected as the candidate ending nodes. One starting node is selected among the starting node candidates and one ending node is selected among ending candidates. The principle is to make the distance between the starting and ending nodes as large as possible. This makes each catheter section as complete as possible. Then for every pair of two clusters, if one cluster i's start or end point A i is very close to the other cluster j's start or end point A j , and their directions are almost co-linear, then these two clusters are combined as one cluster. Suppose the opposite ends of each cluster where they are not linked are B i and B j for clusters i and j respectively, then the constraint can be denoted by: where ⟨·, ·⟩ denotes the angle between two vectors. τ colinear and τ cluster are thresholds set manually. The start and end points are re-allocated by their ray index (the start should have a smaller ray index than the end). To construct the top level graph, the start S i and end E i of each cluster i are considered as the nodes. Because a directed acyclic graph is used for the longest path problem, all edges have their own directions. For each cluster i, an edge edge i,i is created from the start to the end points. For any pair of the clusters i, j which are not combined but are likely co-linear, an edge edge i,j is created from the end point of one cluster to the start point of the other cluster. A weight W i,j is allocated to each edge through: where dis(i, i) is the length of the vector −−→ S i E i within the cluster i, and dis(i, j), i ̸ = j is the distance between the end point of cluster i and the start point of cluster j, along the vector −−→ E i S j . R i is the average value of the SURF responses among the members of the cluster. θ i is the turning angle from the vector After the top-level graph is complete, a bottom-level graph for each cluster is constructed by allocating each directly linked edge a cost of the distance between its two nodes, as shown in Fig. 2. Then the longest path is selected in the top-level graph using a dynamic programming technique and the shortest path is solved for each cluster. Each intra-cluster edge of the longest path is then replaced by the corresponding shortest path.

Post Processing
Finally, all landmarks on the longest path are transformed back to the original echo volume. Then the sorted landmarks on the path are used to model a B-Spline curve. The tangential direction d i for the i th landmark can also be calculated during B-Spline modelling. Because the overlap of the X-ray and echo images is limited due to alignment errors and this may cause the extracted result to cover only a part of the whole catheter, we use a growing technique to extend the catheter to the un-extracted parts. Starting from the two ends of the already tracked part of the catheter, the growing direction v i for each end is along −d 0 for the first landmark and d n for the last. Given two adjacent landmarks p i−1 and p i , the size of the growth step is determined by: where M and N are the maximum and minimum size of growth step set manually (M = 5, N = 3 voxels in our application). By setting the size of the steps through this equation, the growth will take smaller steps when a sharp turn occurs. After both the direction and the size of the step are determined, the next candidate position can be acquired by: The final position of the next landmark is determined by finding the voxel within a defined neighborhood of p ′ i+1 with the largest Frangi response. If all of the candidates' responses are below the threshold τ surf , the growth will be terminated.

EXPERIMENTS
Four long sequences of patient data were used in our experimental evaluation. These comprise more than 2000 frames of X-ray and nearly 1000 frames of corresponding echo data. The number of pairs of related X-ray and echo image is more than 2000 in total. All data contained a guide wire and a catheter. The X-ray data have an image size of 512×512 pixels. The echo volumes have 144×160×208 voxels of 0.60×0.60×0.29 mm 3 for the first sequence and 144×176×208 voxels of 0.51×0.50×0.29 mm 3 for the rest of the echo sequences. The synchronization of the X-ray and echo was performed manually. The ground truth catheter locations in the X-ray and echo images were also marked manually. Experiments were performed on an Ubuntu Linux system on a 3.40GHz, 8GB desktop. The methods compared were the proposed strategy with and without growing, Frangi filtering [8] and an ITK filtering method [9], as well as the method proposed in [6].
The following performance metrics were defined to evaluate each algorithm's speed, accuracy and robustness: (1) Average frame rate. This evaluates speed, which directly determines performance for real-time applications; (2) Average tracking error. For each landmark i, a shortest distance d i to the ground truth is calculated. Then a threshold ρ is used to select correctly tracked landmarks with d i ≤ ρ. The average of d i among correctly tracked landmarks is defined as the tracking error; (3) Incorrect tracking percentage (ITP). ITP is defined as the number of incorrectly tracked landmarks over the total number of landmarks. ITP indicates the reliability of the tracking results. (4) Failed tracking percentage (FTP). For each landmark on the ground truth, the minimum distance to the tracked curve is calculated. The landmark is considered as successfully tracked if the distance is below ρ. FTP is defined as the number of failed landmarks over the total number of landmarks on the ground truth. It evaluates to what extent the whole catheter can be tracked. (5) Ratio of failed tracked frames (ROF). If the average SURF response for all landmarks is below τ surf /3, we treat it as a failure. The number of frames where a tracking failure is detected over the total number of frames can indicate the need for re-initialisation.
The threshold τ surf = 0.01 is used to select feature locations. The search range τ normal along the normal direction for each voxel on the curved surface depends on how accurate the transformation between X-ray and echo images is. Normally it is set to 10 pixels for small registration errors which means the catheter is almost on or very near the curved surface. For some bad registrations or bad synchronizations, the value can be set up to 30, with the tradeoff of a larger search range and therefore reduced speed. At the stage of graph construction, the threshold for the minimum distances for subgraph construction or clustering is set to τ subgraph = 10 pixels. The threshold to decide the combination of any two clusters is set to τ colinear = 1 and τ cluster = 20. Fig. 3 shows the results of error, ROF, FTP and ITP. The average time per frame is 1.3498, 0.7428, 4.2993, 4.3489, 0.5017 seconds for the proposed method with and without growing, Frangi filtering, ITK vesselness filtering, and the method in [6]. Fig. 4 shows a comparison results in which the proposed method spotted the right position whereas [6] failed. We can conclude: (1) The growing strategy can extend the tracking to a longer length, the tradeoff is a larger computation cost and a higher ITP; (2) 2D SURF is more reliable and fast than vesselness filtering techniques in any metric. (3) The proposed strategy is more suitable for clinical application than [6]. Given ρ = 5mm, our standard method achieves a speed of 1.35 spf and an error of 1.79 mm, with FTP of 27.7%, ITP of 4.19% and a failure ratio of 11.6%.

CONCLUSION
We have proposed a fast X-ray assisted catheter segmentation and tracking strategy for ultrasound imaging in cardiac catheterization. Our method is better than [6] in any performance metric and the proposed SURF measurements outperform some dominant vesselness filtering techniques.