Methods for Action Segmentation of a Video Sequence
Simple Summary
(The Simple Summary, Use Cases and Benefits sections were extracted from the patent full text and abstract with AI.)
This invention discloses a computer-implemented method for action segmentation in video sequences. It uses a two-level, transformer-based attention mechanism. First, it processes local segments of consecutive frames to learn relationships among nearby frames. Second, it selects global segments of non-consecutive frames from the resulting features to capture long-term dependencies across the video. Because each attention step operates on one segment at a time, the method can classify actions frame by frame while keeping computation and memory manageable, which makes it practical for long or complex videos.
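The two ways of grouping frames described above can be sketched in Python. This is a minimal illustration under stated assumptions: local segments are taken as non-overlapping windows of `window` consecutive frames, and global segments as frames spaced `stride` apart (strided sampling). The function names and parameters are hypothetical, and the patent may select segments differently, for example with overlap.

```python
from typing import List


def local_segments(num_frames: int, window: int) -> List[List[int]]:
    """Hypothetical helper: split frame indices 0..num_frames-1 into runs of
    consecutive frames. The last run is shorter if num_frames % window != 0."""
    return [list(range(start, min(start + window, num_frames)))
            for start in range(0, num_frames, window)]


def global_segments(num_frames: int, stride: int) -> List[List[int]]:
    """Hypothetical helper: group frames that lie `stride` apart, so segment k
    holds frames k, k+stride, k+2*stride, ... and spans the whole video."""
    return [list(range(offset, num_frames, stride))
            for offset in range(min(stride, num_frames))]


print(local_segments(8, 3))   # [[0, 1, 2], [3, 4, 5], [6, 7]]
print(global_segments(8, 3))  # [[0, 3, 6], [1, 4, 7], [2, 5]]
```

Every frame lands in exactly one local and one global segment, so attention within the local segments sees short-range context while attention within the global segments links frames far apart in time.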
Use Cases
- Automated analysis of surveillance or security footage to detect and classify actions or behaviors frame by frame.
- Industrial production line monitoring for real-time detection of operator actions, faults, or anomalies.
- Sports video analytics to segment and identify specific plays or actions for coaching or broadcasting.
- Video content indexing and retrieval for large media libraries based on temporal action segments.
- Behavioral studies, such as analyzing animal or human activity in research videos.
- Healthcare applications, including patient activity recognition for remote monitoring or rehabilitation.
- Robotics, where robots need to interpret human actions in videos for collaboration or assistance.
Benefits
- Combines local and global attention for more accurate and robust action segmentation, outperforming previous methods.
- Efficiently captures both short-term and long-term temporal dependencies without the cost of full self-attention over all frames, which grows quadratically with video length.
- Scalable to long-duration videos due to reduced memory and processing demands, enabling its use in real-world, high-volume video analytics.
- Configurable attention window sizes let the method be tuned to the available computational resources.
- Can achieve high accuracy while using fewer model parameters, leading to faster training and inference.
- Supports overlapping frame analysis, improving segmentation quality and reducing boundary errors.
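The computational-cost benefits above can be made concrete by counting attention scores. This is a rough sketch under assumptions not stated in the patent: attention is computed only within each segment, local segments are non-overlapping windows of W frames, and global segments are built from frames spaced s apart. Full attention over T frames then scores T^2 query-key pairs, while the two-level scheme scores T*W + T^2/s. The function name and the example sizes are illustrative, not figures from the patent.

```python
def attention_pairs(num_frames: int, window: int, stride: int) -> dict:
    """Count query-key pairs scored by self-attention (a proxy for compute and memory).

    Hypothetical model: num_frames must be divisible by window and stride, and
    attention runs only inside each local or global segment.
    """
    if num_frames % window or num_frames % stride:
        raise ValueError("num_frames must be divisible by window and stride")
    full = num_frames ** 2                              # every frame attends to every frame
    local = (num_frames // window) * window ** 2        # equals num_frames * window
    global_ = stride * (num_frames // stride) ** 2      # equals num_frames**2 / stride
    return {"full": full, "local": local, "global": global_, "two_level": local + global_}


counts = attention_pairs(num_frames=8192, window=64, stride=64)
print(counts)  # {'full': 67108864, 'local': 524288, 'global': 1048576, 'two_level': 1572864}
print(round(counts["full"] / counts["two_level"], 1))  # 42.7
```

Under these assumptions, larger windows and smaller strides give each frame more context at a higher cost, which is the trade-off that configurable window sizes expose.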
Technical Classifications (CPCs)
Main Classifications: Physics & Measurement
Sub Classifications: Computing & Calculating
CPC Codes
Inventors & Applicants
Applicants
Toyota Motor Co Ltd
Univ Bonn Rheinische Friedrich Wilhelms
Patent Abstract
A computer-implemented method for action segmentation of a video sequence, the method comprising: selecting a plurality of local segments (113) from the video sequence, each local segment comprising a plurality of successive frames of the video sequence; (S12) processing each local segment with a first attention transformer (114) in order to generate first enhanced feature maps (116) modelling relations between local segments (113); (S20) selecting a plurality of global segments (123) from the first enhanced feature maps (116), each global segment comprising a plurality of non-successive frames of the first enhanced feature maps (116); (S22) processing each global segment with a second attention transformer (124) in order to generate second enhanced feature maps (126) modelling relations between global segments (123); and (S30) assigning an action class to each frame of the video sequence based on the second enhanced feature maps (126).
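The claimed steps can be traced in a toy NumPy sketch. It is a hedged illustration only: each "attention transformer" is reduced to a parameter-free, single-head self-attention with a residual connection; local segments are non-overlapping windows of `window` frames; global segments take every `stride`-th frame of the first enhanced features; and the final classifier is a hypothetical linear layer followed by argmax. None of these choices, nor the names `self_attention`, `segment_actions`, `window` and `stride`, come from the patent.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Simplified stand-in for an attention transformer: parameter-free,
    single-head scaled dot-product self-attention over the rows of x (frames x dims).

    A real attention transformer would add learned query/key/value projections,
    multiple heads, a feed-forward layer and normalization.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over the frames in the segment
    return x + weights @ x                         # residual connection


def segment_actions(features: np.ndarray, classifier: np.ndarray,
                    window: int, stride: int) -> np.ndarray:
    """Return one action-class index per frame.

    features: (T, D) per-frame features.
    classifier: (D, C) weights of a hypothetical linear classification layer.
    window: length of each local segment (consecutive frames).
    stride: spacing between the frames of a global segment (non-successive frames).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]

    # Local segments (113): runs of consecutive frames, each processed by
    # attention (S12) to give the first enhanced feature maps (116).
    first = np.empty_like(features)
    for start in range(0, num_frames, window):
        seg = slice(start, start + window)
        first[seg] = self_attention(features[seg])

    # S20: global segments (123) take every `stride`-th frame of the first
    # enhanced features, so each one spans the whole video; S22: attention on each.
    second = np.empty_like(first)
    for offset in range(min(stride, num_frames)):
        seg = slice(offset, None, stride)
        second[seg] = self_attention(first[seg])

    # S30: assign an action class to every frame from the second enhanced features (126).
    return np.argmax(second @ classifier, axis=1)


rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 16))    # 100 frames, 16-dim random stand-in features
weights = rng.normal(size=(16, 5))    # 5 hypothetical action classes
labels = segment_actions(feats, weights, window=10, stride=10)
print(labels.shape)  # (100,)
```

With `window` equal to the video length and `stride=1`, both stages collapse into ordinary full self-attention, which shows how the segment sizes control the trade-off between context and cost.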
Key Information
Publication No.: EP4481696A1
Family ID: 87047920
Publication Date: 2024-12-25
Application No.: EP23181135A
Application Date: 2023-06-23
Priority Date: 2023-06-23
Granted: No