HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding. (arXiv:2205.09753v1 [cs.AI])

One essential task in autonomous driving is to encode the information of a
driving scene into vector representations so that downstream tasks such as
trajectory prediction can perform well. The driving scene is complicated, and
its elements are heterogeneous: they carry diverse types of information, e.g.,
agent dynamics, map routing, and road lines. Meanwhile, the elements are also
related to one another spatially; such relations should be represented
canonically in terms of relative measurements, since the absolute values of
coordinates are meaningless. Taking these two observations into
consideration, we propose a
novel backbone, namely Heterogeneous Driving Graph Transformer (HDGT), which
models the driving scene as a heterogeneous graph with different types of nodes
and edges. For graph construction, each node represents either an agent or a
road element, and each edge represents a semantic relation between them, such
as Pedestrian-To-Crosswalk or Lane-To-Left-Lane. For spatial relation encoding,
instead of setting a fixed global reference, the coordinate information of the
node as well as its in-edges is transformed to the local node-centric
coordinate system. For the aggregation module in the graph neural network
(GNN), we adopt the transformer structure in a hierarchical way to fit the
heterogeneous nature of inputs. Experimental results show that the proposed
method achieves new state-of-the-art performance on the INTERACTION Prediction
Challenge and the Waymo Open Motion Challenge, ranking 1st and 2nd,
respectively, on the minADE/minFDE metrics.
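
The node-centric spatial encoding described above can be illustrated with a
minimal sketch. It is not the authors' implementation: the function names
(to_local_frame, relative_pose), the 2D pose layout, and the convention that
each node's local x-axis points along its heading are all assumptions made
for illustration.

```python
import math


def to_local_frame(points, origin, heading):
    """Express global (x, y) points in a frame centred at `origin`.

    Assumption for this sketch: the local x-axis points along `heading`
    (radians, measured counter-clockwise from the global x-axis).
    """
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    ox, oy = origin
    local = []
    for x, y in points:
        dx, dy = x - ox, y - oy  # translate so the node sits at the origin
        # Rotate by -heading so the node's heading becomes the local x-axis.
        local.append((cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy))
    return local


def relative_pose(src_origin, src_heading, dst_origin, dst_heading):
    """Hypothetical in-edge feature: pose of the source node as seen from
    the destination node's local frame, as (x, y, heading difference)."""
    (x, y), = to_local_frame([src_origin], dst_origin, dst_heading)
    # Wrap the heading difference into [-pi, pi).
    dtheta = (src_heading - dst_heading + math.pi) % (2 * math.pi) - math.pi
    return x, y, dtheta


# An agent at (10, 5) heading "north" sees a lane point 2 m ahead of it.
agent_origin, agent_heading = (10.0, 5.0), math.pi / 2
lane_point = (10.0, 7.0)
local_point = to_local_frame([lane_point], agent_origin, agent_heading)[0]
edge_feature = relative_pose(lane_point, math.pi / 2,
                             agent_origin, agent_heading)
```

Because only offsets and heading differences enter the result, shifting the
whole scene by a constant offset leaves the local coordinates unchanged,
which is the sense in which absolute coordinate values carry no meaning.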

Source: https://arxiv.org/abs/2205.09753

