Oct 25, 2024
KoSAIM
Abstract
OSTEO
Positional Embedding-Enhanced Model for Vertebrae and Rib Segmentation and T12 Identification in Chest Radiographs
Minjee Kim
Objective
This study aims to segment the vertebrae and ribs in posteroanterior chest radiographs (CXRs) and to identify the T12 vertebra. T12, the last thoracic vertebra attached to a rib, serves as a reference for distinguishing lumbar from thoracic vertebrae and for indexing vertebral bodies.
Methods
We developed a model by integrating positional embeddings into the UNet++[1] baseline architecture to improve segmentation of the vertebrae and ribs and identification of the T12 vertebra. A layer was added to the decoder to incorporate positional information into the feature maps, enhancing spatial awareness. The modified UNet++ model was compared with the baseline to evaluate the impact of positional embeddings. A total of 1,158 synthetic CXRs[2] (835 for training, 93 for validation, 230 for testing), along with vertebral spatial positions labeled by a radiologist, were used for model development and internal testing. In addition, 100 CXRs from the NIH public dataset[3] and 559 real CXRs from a major hospital in South Korea were used as external datasets. Labels included thoracic vertebrae (T1–T11), the T12 vertebra, lumbar vertebrae (L1–L4), the 12th rib, and the other ribs. Segmentation performance was evaluated using the Dice score on the test set. T12 identification accuracy was assessed with a Dice threshold of 0.5 on the internal and external datasets.
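The abstract does not specify the form of the positional embedding layer, so the following is only an illustrative sketch under stated assumptions: a fixed sinusoidal code of the row (vertical) index, added channel-wise to a decoder feature map. The function names, the nested-list feature map, and the choice of a sinusoidal rather than learned embedding are all hypothetical and are not taken from the study.

```python
import math

def row_positional_encoding(channels, height):
    """Return a channels x height table of sinusoidal codes for the row index.

    Assumption: a fixed sinusoidal code; the study's actual layer may differ.
    """
    table = []
    for c in range(channels):
        # Channel pairs (2i, 2i+1) share one frequency: sine on even, cosine on odd.
        freq = 1.0 / (10000 ** ((c - c % 2) / channels))
        fn = math.sin if c % 2 == 0 else math.cos
        table.append([fn(y * freq) for y in range(height)])
    return table

def add_positional_embedding(feature_map):
    """Add the row code to every pixel of a C x H x W feature map (nested lists)."""
    channels = len(feature_map)
    height = len(feature_map[0])
    code = row_positional_encoding(channels, height)
    return [
        [[value + code[c][y] for value in row]
         for y, row in enumerate(feature_map[c])]
        for c in range(channels)
    ]

# Toy decoder feature map: 4 channels, 3 rows, 2 columns, all zeros.
features = [[[0.0, 0.0] for _ in range(3)] for _ in range(4)]
encoded = add_positional_embedding(features)
```

Encoding the vertical coordinate is one plausible choice here because vertebral levels are ordered from top to bottom in a CXR; a deep learning implementation would apply the same idea to tensors inside the decoder.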
Results
Positional embeddings improved Dice scores for every label: thoracic vertebrae (T1–T11) from 0.859 to 0.865, T12 from 0.911 to 0.924, lumbar vertebrae (L1–L4) from 0.854 to 0.861, the 12th rib from 0.642 to 0.697, and the other ribs from 0.872 to 0.884. T12 identification accuracy was 0.877 on the synthetic data, 0.921 on the real dataset from South Korea, and 0.860 on the NIH dataset.
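The two metrics reported here can be sketched as follows. This is a minimal illustration, not the study's evaluation code: masks are represented as flat lists of 0/1 values, and it is assumed that a case counts as a correct T12 identification when its Dice score is at least 0.5 (the abstract gives the threshold but not whether the comparison is inclusive).

```python
def dice_score(pred, target):
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

def t12_identification_accuracy(cases, threshold=0.5):
    """Fraction of images whose T12 Dice score meets the threshold (assumed inclusive)."""
    hits = sum(1 for pred, target in cases if dice_score(pred, target) >= threshold)
    return hits / len(cases)

# Two toy cases: one good overlap (Dice 0.8) and one complete miss (Dice 0.0).
cases = [
    ([1, 1, 0, 0], [1, 1, 1, 0]),
    ([0, 0, 1, 1], [1, 1, 0, 0]),
]
accuracy = t12_identification_accuracy(cases)  # 0.5
```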

Figure 1. An example of a typical synthetic CXR image and segmentation outputs on a real CXR image
Model | Thoracic vertebrae (T1–T11) | T12 vertebra | Lumbar vertebrae (L1–L4) | Last rib (12th rib) | Other ribs | Total |
Baseline | 0.859 | 0.911 | 0.854 | 0.642 | 0.872 | 0.825 |
+ Positional embedding | 0.865 | 0.924 | 0.861 | 0.697 | 0.884 | 0.846 |
Table 1. Comparison of Dice scores for segmentation performance with and without positional embeddings
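As a worked reading of Table 1, the per-label Dice gains can be computed directly from the reported values. The dictionary keys below are shorthand labels chosen for this sketch; the numbers are copied from the table.

```python
# Dice scores copied from Table 1.
baseline = {"T1-T11": 0.859, "T12": 0.911, "L1-L4": 0.854,
            "12th rib": 0.642, "Other ribs": 0.872, "Total": 0.825}
with_pe = {"T1-T11": 0.865, "T12": 0.924, "L1-L4": 0.861,
           "12th rib": 0.697, "Other ribs": 0.884, "Total": 0.846}

# Absolute gain per label, rounded to the table's three decimals.
gains = {label: round(with_pe[label] - baseline[label], 3) for label in baseline}

# Label with the largest absolute gain.
largest = max(gains, key=gains.get)

for label, gain in gains.items():
    print(f"{label}: +{gain:.3f}")
```

The largest gain is for the 12th rib (+0.055), which is also the label with the lowest baseline score; the gains for the vertebral labels range from +0.006 to +0.013.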
Conclusions
The positional embedding-enhanced UNet++ successfully segmented the vertebrae and ribs and accurately identified the T12 vertebra in CXR images. The model's performance suggests that models trained on synthetic data can be applied effectively to real datasets, providing a foundation for developing spinal disorder diagnosis models.
References
[1] Zhou Z, Siddiquee MMR, Tajbakhsh N, Liang J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. Deep Learn Med Image Anal Multimodal Learn Clin Decis Support. 2018;11045:3-11. doi:10.1007/978-3-030-00889-5_1
[2] National Information Society Agency (NIA). 주요질환 이미지 합성데이터(X-ray) ChestPA Normal [Synthetic image data for major diseases (X-ray), ChestPA Normal]. AI Hub. https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=71521. Accessed August 2024.
[3] Wang X, et al. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017:3462-3471.
Keywords
Vertebrae Segmentation, Rib Segmentation, T12 Identification