Readit News
omernivro commented on σ-GPTs: A new approach to autoregressive models (arxiv.org/abs/2404.09562) · Posted by u/mehulashah
omernivro · 2 years ago
This is an interesting study. A similar permutation approach already appears in the Taylorformer paper (https://arxiv.org/pdf/2305.19141v1). The authors use a Transformer decoder for continuous processes, such as time series. During training, each sequence is shuffled randomly, and each element carries a positional encoding for its original position. The model is then trained with log-likelihood on the shuffled sequence. There, the permutation helps with predictions for interpolation, extrapolation, and irregularly sampled data. They also show it improves 'consistency', i.e., roughly, the MSE is the same regardless of the generation order.
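The shuffling step described above can be sketched in a few lines. This is a minimal toy illustration (the function name and data are hypothetical, not from either paper): each element of the permuted sequence stays paired with its original position index, from which a positional encoding would then be built, so the model can condition on absolute position even though it sees the values in shuffled order.

```python
import numpy as np

def shuffle_with_positions(seq, rng):
    """Randomly permute a sequence while keeping each element's
    original position index. In permutation-based training, the
    model consumes (value, original_position) pairs in shuffled
    order and is trained with log-likelihood on that order."""
    perm = rng.permutation(len(seq))
    values = [seq[i] for i in perm]
    positions = perm.tolist()  # positional encodings are built from these
    return values, positions

rng = np.random.default_rng(0)
seq = [10.0, 20.0, 30.0, 40.0]
values, positions = shuffle_with_positions(seq, rng)

# The pairing is preserved: looking up each original position
# recovers exactly the shuffled value it travels with.
assert sorted(values) == sorted(seq)
assert [seq[p] for p in positions] == values
```

Because the positional encoding reflects the original index rather than the shuffled one, any generation order yields a valid factorization of the joint likelihood, which is what makes order-agnostic prediction (and the 'consistency' property) possible.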

What might this paper add to our understanding or application of these ideas?

The idea of permuting the sequence order also appears in the Transformer Neural Process paper: https://arxiv.org/pdf/2207.04179.
