3D Object Localization in 3D CT Volumes

Introduction

This project was inspired by the paper by Criminisi et al. from Microsoft Research, which proposes using random regression forests to predict the location of anatomical structures in CT scans. They use a data set of 100 torso CT scans that differ widely in several respects (e.g. scanner type, resolution, organ size and pose). To generate the ground truth, the 3D bounding boxes of the organs of interest are first computed for these scans. The bounding boxes are parametrized by the positions of the front upper left corner and the back lower right corner.


Regression Forests

A random regression forest is an ensemble of decision trees. Each node in a tree splits the incoming data points by drawing random features and random split thresholds on those features, and then keeps the best of these randomly drawn split candidates. Decision trees are therefore binary trees that recursively partition the input space into two subsets. This breaks a complex problem into smaller, simpler ones. Every resulting subset holds a model that predicts the output; for regression trees this is a regression function that returns a real-valued prediction.
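To make the recursive partitioning concrete, here is a minimal sketch of a single regression tree and a forest average. It is not the code used in this project: the function names, the dictionary layout of a node, and the use of weighted child variance as the split score (a simple stand-in for information gain) are all assumptions made for illustration. Each leaf stores the mean of its targets as a constant regression model.

```python
import numpy as np

def build_tree(X, y, rng, depth=0, max_depth=3, n_trials=10, min_samples=2):
    """Recursively partition (X, y); each leaf stores the mean of y as its model."""
    leaf = {"value": float(np.mean(y))}
    if depth == max_depth or len(y) < 2 * min_samples:
        return leaf
    best = None
    for _ in range(n_trials):
        feat = int(rng.integers(X.shape[1]))                          # random feature
        xi = float(rng.uniform(X[:, feat].min(), X[:, feat].max()))   # random threshold
        go_left = xi > X[:, feat]
        n_left = int(go_left.sum())
        n_right = len(y) - n_left
        if min(n_left, n_right) < min_samples:
            continue
        # Assumed split score: weighted variance of the children (lower is better).
        score = n_left * y[go_left].var() + n_right * y[~go_left].var()
        if best is None or score < best[0]:
            best = (score, feat, xi, go_left)
    if best is None:
        return leaf
    _, feat, xi, go_left = best
    kwargs = dict(depth=depth + 1, max_depth=max_depth,
                  n_trials=n_trials, min_samples=min_samples)
    return {"feat": feat, "xi": xi,
            "left": build_tree(X[go_left], y[go_left], rng, **kwargs),
            "right": build_tree(X[~go_left], y[~go_left], rng, **kwargs)}

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's constant prediction."""
    while "value" not in node:
        node = node["left"] if node["xi"] > x[node["feat"]] else node["right"]
    return node["value"]

def predict_forest(trees, x):
    """Average the real-valued predictions of all trees."""
    return float(np.mean([predict_tree(t, x) for t in trees]))
```

Because every tree draws its own random features and thresholds from `rng`, the trees differ from one another even when they are grown on the same data, and averaging them smooths out the individual partitions.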


Preprocessing

Since the goal is to find the axis-aligned bounding box of the femur, a parametrization is chosen that fully describes the bounding box. To do so, a vector is defined for each bounding box.
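As an illustration of one possible parametrization, the sketch below builds a 6-element vector from the two opposite corners of the box, in millimetres, starting from a binary segmentation mask. The ordering (x_min, y_min, z_min, x_max, y_max, z_max), the `spacing` argument, and both function names are assumptions for this example. The second function computes, for each voxel, the displacement from the voxel to the box corners, which is the kind of target a regression forest can learn; whether the project uses exactly this target is also an assumption.

```python
import numpy as np

def bounding_box_vector(mask, spacing=(1.0, 1.0, 1.0)):
    """Return (x_min, y_min, z_min, x_max, y_max, z_max) in mm for a binary mask."""
    idx = np.argwhere(mask)                  # (N, 3) indices of foreground voxels
    if idx.size == 0:
        raise ValueError("mask contains no foreground voxels")
    spacing = np.asarray(spacing, dtype=float)
    lo = idx.min(axis=0) * spacing
    hi = (idx.max(axis=0) + 1) * spacing     # +1 so the box encloses the last voxel
    return np.concatenate([lo, hi])

def voxel_offsets(box, positions_mm):
    """Displacement from each voxel position (N, 3) to the two box corners, shape (N, 6)."""
    positions_mm = np.asarray(positions_mm, dtype=float)
    # np.tile repeats (x, y, z) to (x, y, z, x, y, z) so it lines up with the box vector.
    return box[None, :] - np.tile(positions_mm, 2)
```

Working in millimetres rather than voxel indices keeps the vector comparable across scans with different resolutions.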


Training

The training phase of the regression forest starts at the root node of each tree. The nodes of the regression trees are divided recursively by splitting the voxels contained in a node into two disjoint sets. The voxels at a given node are separated by the test function ξ > f(v; θj): a voxel is passed to the left child if the test evaluates to true and to the right child if it evaluates to false. The function f gives the feature response of a particular voxel v for a feature θj, and ξ is the threshold. The quality of a split is measured by the information gain, which is defined in terms of the entropy H(S) of the set of voxels S at the node.
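The split selection can be sketched as follows. This is a hedged illustration, not the project's code. It assumes the usual form of information gain, IG = H(S) - sum over children i of (|S_i| / |S|) H(S_i), and it assumes that H is the differential entropy of a multivariate Gaussian fitted to the regression targets in a node, which is a common choice for regression forests. The function names, the `eps` regularizer, and the uniform sampling of trial thresholds are all hypothetical.

```python
import numpy as np

def gaussian_entropy(targets, eps=1e-6):
    """Differential entropy of a Gaussian fitted to the rows of targets, shape (N, d)."""
    d = targets.shape[1]
    # eps keeps the covariance invertible when a node is (nearly) pure.
    cov = np.atleast_2d(np.cov(targets, rowvar=False)) + eps * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def information_gain(targets, go_left):
    """H(S) minus the size-weighted entropies of the left and right children."""
    n = len(targets)
    gain = gaussian_entropy(targets)
    for part in (targets[go_left], targets[~go_left]):
        gain -= len(part) / n * gaussian_entropy(part)
    return gain

def best_threshold(responses, targets, n_trials=10, rng=None):
    """Try n_trials random thresholds xi and keep the one with maximum gain."""
    rng = np.random.default_rng() if rng is None else rng
    best_xi, best_gain = None, -np.inf
    for xi in rng.uniform(responses.min(), responses.max(), size=n_trials):
        go_left = xi > responses             # test function: xi > f(v; theta_j)
        if go_left.sum() < 2 or (~go_left).sum() < 2:
            continue                         # need at least 2 voxels for a covariance
        gain = information_gain(targets, go_left)
        if gain > best_gain:
            best_xi, best_gain = float(xi), float(gain)
    return best_xi, best_gain
```

In a full training loop, `best_threshold` would be called once per candidate feature at each node, and the feature and threshold with the highest gain would be stored at that node.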


Testing

After the regression forest has been trained, it is tested on a new, unseen scan. All voxels of the new scan, along with their feature responses and voxel locations, are pushed through each tree of the forest. At each node, a voxel is passed to the left or right child by comparing its feature response with the threshold stored at that node during training. The voxels are sent down the trees recursively until each one reaches a leaf.
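The traversal described above can be sketched like this. The node layout (a dictionary with `feature`, `xi`, `left`, `right`, and a `leaf_offset` stored at each leaf) is hypothetical. The final aggregation step, where every voxel adds its position to the offset stored in its leaf and all votes are averaged, is an assumption for illustration; a real implementation might weight or filter the leaves differently.

```python
import numpy as np

def leaf_for(node, response_row):
    """Send one voxel down a tree until it reaches a leaf."""
    while "leaf_offset" not in node:
        # Same test as in training: go left if xi > f(v; theta_j).
        go_left = node["xi"] > response_row[node["feature"]]
        node = node["left"] if go_left else node["right"]
    return node

def predict_box(forest, responses, positions_mm):
    """Average the bounding-box votes of all voxels over all trees.

    responses:    (N, F) feature responses, one row per voxel
    positions_mm: (N, 3) voxel locations
    returns:      (6,) estimated box (x_min, y_min, z_min, x_max, y_max, z_max)
    """
    votes = []
    for tree in forest:
        for resp, pos in zip(responses, positions_mm):
            leaf = leaf_for(tree, resp)
            # Assumed vote: voxel position (repeated for both corners) plus leaf offset.
            votes.append(np.tile(pos, 2) + leaf["leaf_offset"])
    return np.mean(votes, axis=0)
```

For a full CT volume the per-voxel Python loop would be slow, so in practice one would subsample voxels or vectorize the traversal.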

Conclusion

The number of trees in the forest, the number of feature responses, and the number of trial splits evaluated per feature when searching for the split with maximum information gain are hyperparameters that can be tuned for a given set of test cases. I could only find about 15 CT volumes of the femur by visiting hospitals in my city, and I tested on only 1 of them. I used about 12 trees, 10 feature responses, and 10 trial splits per feature response as my hyperparameters. The results were astonishingly good given so few training cases.

Femur detected as per the algorithm

Machine Vision Researcher
