Methods for Action Segmentation of a Video Sequence

Publication: EP4481696A1

Published: 2024-12-25

Family Size: 1

Granted: No

Simple Summary

Content extracted from patent full text and abstract with AI.

This invention discloses a computer-implemented method for action segmentation of video sequences. It uses a two-level transformer-based attention mechanism: first, it processes local segments of consecutive frames to model relations among nearby frames; second, it samples global segments of non-consecutive frames to capture long-term dependencies across the video. Restricting attention to these segments allows frame-level action classification at a lower computational cost than attention over all frames, which makes the method practical for long or complex videos.
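The difference between the two kinds of segment can be illustrated with frame indices alone. The sketch below is a minimal illustration, not the patented selection scheme: the helper names `local_segments` and `global_segments` are hypothetical, and taking every `stride`-th frame is only one assumed way of sampling non-consecutive frames.

```python
def local_segments(num_frames, window):
    """Split frame indices into windows of consecutive frames."""
    return [list(range(start, min(start + window, num_frames)))
            for start in range(0, num_frames, window)]


def global_segments(num_frames, stride):
    """Group every `stride`-th frame (assumed sampling rule), so that each
    segment holds non-consecutive frames spread over the whole video."""
    return [list(range(offset, num_frames, stride))
            for offset in range(min(stride, num_frames))]


local = local_segments(12, 4)
glob = global_segments(12, 4)
print(local)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(glob)   # [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```

In this toy case each local segment covers a short stretch of the video, while each global segment contains one frame from every stretch.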

Use Cases

Automated analysis of surveillance or security footage to detect and classify actions or behaviors frame by frame.
Industrial production line monitoring for real-time detection of operator actions, faults, or anomalies.
Sports video analytics to segment and identify specific plays or actions for coaching or broadcasting.
Video content indexing and retrieval for large media libraries based on temporal action segments.
Behavioral studies, such as analyzing animal or human activity in research videos.
Healthcare applications, including patient activity recognition for remote monitoring or rehabilitation.
Robotics, where robots need to interpret human actions in videos for collaboration or assistance.

Benefits

Combines local and global attention for more accurate and robust action segmentation, outperforming previous methods.
Efficiently captures both short-term and long-term temporal dependencies without the excessive computational cost typical of full-transformer models.
Scalable to long-duration videos due to reduced memory and processing demands, enabling its use in real-world, high-volume video analytics.
Adjustable attention window sizes allow the method to be tuned to the available computational resources.
Can achieve high accuracy while using fewer model parameters, leading to faster training and inference.
Supports overlapping frame analysis, improving segmentation quality and reducing boundary errors.
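The cost argument behind these benefits can be made concrete by counting query-key pairs, since self-attention over n frames scores n × n pairs. The sketch below is a rough estimate under stated assumptions: the function names, the 6000-frame video, the window and stride of 100, and the 50-frame overlap are all illustrative values chosen here, not figures from the patent.

```python
def local_window_lengths(num_frames, window, overlap=0):
    """Lengths of consecutive-frame windows, optionally overlapping."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    lengths, start = [], 0
    while start < num_frames:
        end = min(start + window, num_frames)
        lengths.append(end - start)
        if end == num_frames:
            break
        start += step
    return lengths


def global_segment_lengths(num_frames, stride):
    """Lengths of strided segments (every `stride`-th frame, assumed rule)."""
    return [len(range(offset, num_frames, stride))
            for offset in range(min(stride, num_frames))]


def attention_pairs(lengths):
    """Self-attention inside a segment of n frames scores n * n pairs."""
    return sum(n * n for n in lengths)


T = 6000  # illustrative video length in frames
full = attention_pairs([T])
two_level = (attention_pairs(local_window_lengths(T, 100))
             + attention_pairs(global_segment_lengths(T, 100)))
overlapped = (attention_pairs(local_window_lengths(T, 100, overlap=50))
              + attention_pairs(global_segment_lengths(T, 100)))
print(full, two_level, overlapped)  # 36000000 960000 1550000
```

Under these assumptions the two-level scheme scores about 37 times fewer pairs than attention over all frames, and overlapping the local windows by half raises the count while keeping it well below the full-attention figure.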

Technical Classifications (CPCs)

Main Classifications

Physics & Measurement

Sub Classifications

Computing & Calculating

CPC Codes

G06V10/82, G06V20/49, G06V20/52, G06V20/70

Inventors & Applicants

Inventors

Applicants

Toyota Motor Co Ltd

Univ Bonn Rheinische Friedrich Wilhelms

Patent Abstract

A computer-implemented method for action segmentation of a video sequence, the method comprising: selecting a plurality of local segments (113) from the video sequence, each local segment comprising a plurality of successive frames of the video sequence; (S12) processing each local segment with a first attention transformer (114) in order to generate first enhanced feature maps (116) modelling relations between local segments (113); (S20) selecting a plurality of global segments (123) from the first enhanced feature maps (116), each global segment comprising a plurality of non-successive frames of the first enhanced feature maps (116); (S22) processing each global segment with a second attention transformer (124) in order to generate second enhanced feature maps (126) modelling relations between global segments (123); and (S30) assigning an action class to each frame of the video sequence based on the second enhanced feature maps (126).
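The claimed sequence of steps can be sketched end to end on toy data. This is a minimal sketch under loud assumptions, not the patented implementation: the attention here has no learned projections (a real attention transformer would have them), global segments are formed by simple striding, the final classifier is a fixed linear layer, and all function and parameter names (`self_attention`, `attend_within`, `segment_actions`, `class_weights`) are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Scaled dot-product self-attention with a residual connection.
    Assumption: no learned query/key/value projections, for brevity."""
    dim = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in feats]
        weights = softmax(scores)
        mixed = [sum(w * v[d] for w, v in zip(weights, feats))
                 for d in range(dim)]
        out.append([x + y for x, y in zip(q, mixed)])
    return out


def attend_within(feats, segments):
    """Apply self-attention independently inside each segment of indices."""
    out = [None] * len(feats)
    for idx in segments:
        attended = self_attention([feats[i] for i in idx])
        for i, row in zip(idx, attended):
            out[i] = row
    return out


def segment_actions(feats, window, stride, class_weights):
    T = len(feats)
    # Select local segments of successive frames.
    local = [list(range(s, min(s + window, T))) for s in range(0, T, window)]
    first = attend_within(feats, local)             # S12
    # S20: select global segments of non-successive frames (assumed striding).
    glob = [list(range(o, T, stride)) for o in range(min(stride, T))]
    second = attend_within(first, glob)             # S22
    labels = []
    for row in second:                              # S30
        logits = [sum(w * x for w, x in zip(ws, row)) for ws in class_weights]
        labels.append(logits.index(max(logits)))
    return labels


# Toy input: 8 frames with 2-dimensional features, two actions of 4 frames.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
labels = segment_actions(frames, window=4, stride=4,
                         class_weights=[[1.0, 0.0], [0.0, 1.0]])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The output has one action class per input frame, matching the frame-level assignment in the final step of the abstract.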

Key Information

Publication No.

EP4481696A1

Family ID

87047920

Publication Date

2024-12-25

Application No.

EP23181135A

Application Date

2023-06-23

Priority Date

2023-06-23

Granted

No

Possible Cooperation

For further information please contact the transfer office.
