Page 142 - Artificial Intelligence for Computational Modeling of the Heart
tation of all these tracking methods comes from the fact that the
kernels or representations are ‘engineered’ and may not capture
enough of the deeper information contained in the images.
In recent years, significant attention has been focused on the
development of deep-learning-based tracking models. Compared
with conventional methods, deep neural network models can
extract more informative features and have shown superior
performance in various applications. Based on network topology
and methodology, we organize them into three categories.
Tracking with Convolutional Neural Networks (CNNs). To address
the aforementioned limitations of ‘engineered’ representations
of target objects, leveraging high-level features from CNNs
serves as a natural remedy. The Siamese network [286] is one of
the most commonly used architectures for similarity-based
tracking. It processes two different inputs through the same
network computations and provides a similarity score based on
the extracted features. One of the early works is due to
Bertinetto et al. [287], who propose a fully convolutional
Siamese network to find a target object in consecutive frames
with a region-wise similarity measure. Similar strategies have
been widely developed, including the GOTURN tracker with box
regression on targets [288], the DSiam tracker with online
Siamese network updating [289], variants of CFNet with add-on
correlation filters [290,291] and different variants of SiamRPN
with region proposals after feature extraction
[292–294]. Besides Siamese networks based on similarity
learning, different models that account for domain and
appearance changes have been studied. Nam et al. [295] proposed
MDNet, which learns a domain-independent representation that
encodes the moving object and uses it for detection in
subsequent frames. CREST [296] represents the discriminative
correlation filter (DCF) [297,298] as a convolution and applies
residual learning to accommodate appearance changes. Zhu et al.
[299] also took optical flow information into account and
proposed a correlation tracking model with spatial-temporal
attention. The application of such
models in cardiac imaging is still under development. Recently,
Parajuli et al. [300] applied a flow network to left ventricle
motion analysis, in which the motion is modeled as flow through
graphs and the similarities between graph nodes are learned by
a Siamese network.
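
To make the idea of similarity-based Siamese tracking concrete,
the following is a minimal sketch in plain Python, not the
implementation of [287] or of any tracker cited above. It
assumes that a single fixed 2x2 filter (the hypothetical
KERNEL) can stand in for learned CNN weights, and it scores
each region of the search image by cosine similarity, a
simplification chosen here so that an exact match scores 1. The
names embed, score_map and best_offset are illustrative only.

```python
import math

# Hypothetical stand-in for learned CNN weights: one fixed 2x2 filter.
KERNEL = [[1.0, 0.0], [0.0, -1.0]]


def embed(image):
    """Shared embedding applied to BOTH inputs (the 'Siamese' part).

    A stride-1 'valid' sliding-window filter, so the output shrinks by
    one row and one column and stays translation-equivariant.
    """
    kh, kw = len(KERNEL), len(KERNEL[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(KERNEL[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def score_map(template, search):
    """Region-wise similarity between template and search features.

    Slides the embedded template over the embedded search image and
    returns the cosine similarity at every offset (assumption: cosine
    normalisation is used here instead of a raw correlation).
    """
    z = embed(template)
    x = embed(search)
    zh, zw = len(z), len(z[0])
    z_norm = math.sqrt(sum(v * v for row in z for v in row))
    scores = []
    for r in range(len(x) - zh + 1):
        row_scores = []
        for c in range(len(x[0]) - zw + 1):
            dot = 0.0
            win_sq = 0.0
            for i in range(zh):
                for j in range(zw):
                    xv = x[r + i][c + j]
                    dot += z[i][j] * xv
                    win_sq += xv * xv
            denom = z_norm * math.sqrt(win_sq)
            # Flat regions have zero feature energy; score them as 0.
            row_scores.append(dot / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


def best_offset(scores):
    """Return the (row, col) offset with the highest similarity."""
    return max(
        ((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
        key=lambda rc: scores[rc[0]][rc[1]],
    )
```

In use, the template would be cropped around the target in one
frame and the search image taken from the next frame; the
offset returned by best_offset(score_map(template, search)) is
then the estimated top-left position of the target. A learned
tracker would replace embed with a trained CNN, which is the
part this sketch deliberately leaves out.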
Tracking with Recurrent Neural Networks. Recurrent neural
networks encode temporal state information well and are thus
effective for sequential data. Cui et al. [301] proposed a
Recurrently Target-attending Tracking (RTT) model.
It estimates a confidence map for object motion using a multi-