Pixelwise Positional Embeddings for Medical Images in Vision Transformers

Publication: EP4531001A1
Published: 2025-04-02
Family Size: 3
Granted: No

Simple Summary

Content extracted from patent full text and abstract with AI.

This invention relates to an improved method for analyzing medical images using vision transformer networks. Specifically, it introduces pixelwise positional embeddings, which encode detailed positional information at the pixel level for every image. By integrating precise location and scale metadata from medical imaging devices, this approach produces features that vision transformers can use for various medical analysis tasks (like classification, detection, or segmentation), overcoming limitations of conventional patch-wise embedding techniques that are better suited for natural images.
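One way to picture the "location and scale metadata" mentioned above is as a per-pixel physical coordinate grid. The sketch below is illustrative only and is not taken from the patent text: the function name `physical_coordinates`, the use of an `origin` and a per-axis `spacing` in millimetres, and the omission of orientation (direction cosines) are all simplifying assumptions.

```python
import numpy as np

def physical_coordinates(shape, spacing, origin):
    """Return an array of shape (*shape, ndim) holding each pixel's position.

    shape   -- image size in pixels per axis, e.g. (rows, cols)
    spacing -- assumed physical size of one pixel per axis (e.g. mm)
    origin  -- assumed physical position of pixel index (0, 0, ...)
    Orientation is ignored here; axes are treated as aligned with the scanner.
    """
    axes = [o + s * np.arange(n) for n, s, o in zip(shape, spacing, origin)]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1)

# Two hypothetical acquisitions of the same region at different resolutions.
coords_a = physical_coordinates((4, 4), spacing=(1.0, 1.0), origin=(0.0, 0.0))
coords_b = physical_coordinates((8, 8), spacing=(0.5, 0.5), origin=(0.0, 0.0))

# Pixel (1, 2) in the coarse image and pixel (2, 4) in the fine image
# refer to the same physical location.
print(coords_a[1, 2], coords_b[2, 4])
```

Because both grids are expressed in physical units, pixels from images with different resolutions can be related without resampling either image.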

Use Cases

  • Enhanced automated diagnosis or detection of diseases in medical images such as CT, MRI, X-ray, and ultrasound.
  • Accurate segmentation of anatomical structures or abnormalities, including tumor or organ boundaries, in 2D/3D medical imaging.
  • Longitudinal comparison of patient images from different imaging sessions or devices, increasing consistency and accuracy.
  • Multimodal image analysis where images from different modalities (e.g., MRI and CT) with varying spatial relationships can be robustly processed.
  • Development and training of advanced AI models for clinical decision support, radiology, and telemedicine applications.

Benefits

  • Enables the use of precise spatial information of each pixel or voxel, leading to improved model performance over traditional methods using only patch-wise positional embeddings.
  • Allows flexible processing of medical images acquired from different devices or imaging protocols without the need for resampling, reducing information loss.
  • Facilitates better alignment and comparison between images from varying modalities and sessions, thus enhancing accuracy in analysis and diagnosis.
  • Improves the ability of machine learning models to learn detailed spatial relationships in complex medical data.
  • Demonstrated superior performance in medical image analysis tasks (e.g., brain infarction segmentation) compared to established models like UNETR and Swin UNETR.

Technical Classifications (CPCs)

Main Classifications

Physics & Measurement

Sub Classifications

Computing & Calculating

Information and Communication Technology for Specific Applications

CPC Codes

G06T7/74, G06V10/82, G16H30/40

Inventors & Applicants

Applicants

Siemens Healthineers AG

Univ Friedrich Alexander Er

Patent Abstract

Systems and methods for performing a medical imaging analysis task based on pixelwise positionally encoded features are provided. One or more input medical images are received. One or more pixelwise positional embedding images are generated for the one or more input medical images using a spatially varying function. Patches are extracted from the one or more input medical images and the one or more pixelwise positional embedding images. The patches extracted from the one or more input medical images are encoded with corresponding ones of the patches extracted from the one or more pixelwise positional embedding images into pixelwise positionally encoded features. A medical imaging analysis task is performed using a machine learning based network based on the pixelwise positionally encoded features. Results of the medical imaging analysis task are output.
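The steps in the abstract (generate a positional embedding image, extract patches from both images, encode them together) can be sketched roughly as follows. This is a minimal 2D illustration under stated assumptions, not the patented implementation: the abstract only says "a spatially varying function", so the sinusoidal function used here is an assumed choice, and combining image and embedding patches by concatenation is likewise an assumption. All function names and the parameters `num_freqs`, `max_extent`, `spacing`, and `origin` are hypothetical.

```python
import numpy as np

def positional_embedding_image(coords, num_freqs=2, max_extent=256.0):
    """Map per-pixel coordinates (H, W, 2) to an embedding image (H, W, 4 * num_freqs).

    Assumed spatially varying function: sine and cosine of each coordinate
    at num_freqs octave-spaced frequencies.
    """
    channels = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * 2.0 * np.pi / max_extent
        channels.append(np.sin(freq * coords))
        channels.append(np.cos(freq * coords))
    return np.concatenate(channels, axis=-1)

def extract_patches(image, patch):
    """Split (H, W, C) into non-overlapping patches, flattened to (N, patch * patch * C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two patch-grid axes first
    return x.reshape(-1, patch * patch * c)

def encode(image, spacing, origin, patch=4):
    """Pair each image patch with the matching patch of the embedding image."""
    h, w = image.shape
    rows = origin[0] + spacing[0] * np.arange(h)
    cols = origin[1] + spacing[1] * np.arange(w)
    coords = np.stack(np.meshgrid(rows, cols, indexing="ij"), axis=-1)
    pe = positional_embedding_image(coords)
    img_patches = extract_patches(image[..., None], patch)
    pe_patches = extract_patches(pe, patch)
    # Assumed combination step: concatenate pixel values with their embeddings.
    return np.concatenate([img_patches, pe_patches], axis=1)

# Hypothetical 8x8 image with 0.5-unit pixels.
tokens = encode(np.random.rand(8, 8), spacing=(0.5, 0.5), origin=(10.0, -5.0))
print(tokens.shape)
```

Each row of `tokens` would then be projected and passed to a transformer-style network for the downstream task; that network is outside the scope of this sketch.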

Key Information

Publication No.: EP4531001A1
Family ID: 92925695
Publication Date: 2025-04-02
Application No.: EP24202841A
Application Date: 2024-09-26
Priority Date: 2023-09-26
Granted: No

Possible Cooperation

For further information please contact the transfer office.