Breakthroughs in Time Series Forecasting at NeurIPS 2020

Author: amitkumar
Publish Date: 2021-01-09 17:06:54




As the creator and maintainer of an open-source framework, my core contributors and I constantly have to weigh the time needed to add new models and methods against the benefits for our end users. Flow forecast, as a framework that serves both businesses and researchers, has a seemingly contradictory mission: we want to rapidly add the latest state-of-the-art research in deep learning for time series forecasting and classification while simultaneously providing stability, ease of use, interpretability, robustness, and reliability to our end users (who are often unfamiliar with the latest research or with how to leverage it effectively for their business problems). In other words, we want to constantly incorporate the most complex techniques while still keeping our framework easy to use.
Therefore, deciding which papers to port to flow forecast while balancing other priorities is difficult. Here I break down the papers from NeurIPS that we are considering integrating. Of course, if any of you readers have time and want to contribute by porting one of these papers, it would be greatly appreciated. Additionally, please note that these reviews are not an overall indication of how good a paper is. Rather, they are an evaluation of how well each paper would fit into our framework based on its performance (including which datasets it was tested on), complexity to port, relevance to our users' use cases, and speed.
Benchmarking Deep Learning Interpretability in Time Series Predictions
Video Link (unfortunately, I believe you can only see the video links if you registered for the conference)
Summary: This is an interesting paper that discusses common flaws in interpretability methods for deep learning on time series. The authors describe how most saliency methods suffer from two major problems: they often break down when importance must be attributed across multiple time steps, and model architecture plays a big role in their quality. To address these problems the authors propose a framework called Temporal Saliency Rescaling (TSR). TSR operates as follows:
(a) we first calculate the time-relevance score for each time by computing the total change in saliency values if that time step is masked; then (b) in each time-step whose time-relevance score is above a certain threshold, we calculate the feature-relevance score for each feature by computing the total change in saliency values if that feature is masked. The final (time, feature) importance score is the product of associated time and feature relevance scores.
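To make the two-stage procedure concrete, here is a minimal PyTorch-style sketch of that description. It assumes a `baseline_saliency(model, x)` callable that returns a (time, feature) saliency map (e.g. any Captum attribution method wrapped to that signature); the function names, masking scheme, and thresholding are illustrative, not the authors' exact implementation.

import torch

def temporal_saliency_rescaling(model, x, baseline_saliency,
                                threshold=0.5, mask_value=0.0):
    """Two-stage TSR sketch following the description quoted above.

    `baseline_saliency(model, x)` is assumed to return a (T, F) saliency
    map; all names here are illustrative, not the authors' exact code.
    """
    base = baseline_saliency(model, x)        # (T, F) saliency map
    T, F = x.shape
    scores = torch.zeros(T, F)

    # (a) time-relevance: total change in saliency when a time step is masked
    time_rel = torch.zeros(T)
    for t in range(T):
        x_m = x.clone()
        x_m[t, :] = mask_value
        time_rel[t] = (base - baseline_saliency(model, x_m)).abs().sum()

    # (b) feature-relevance, computed only at time steps above the threshold
    cutoff = threshold * time_rel.max()
    for t in range(T):
        if time_rel[t] <= cutoff:
            continue
        for f in range(F):
            x_m = x.clone()
            x_m[t, f] = mask_value
            feat_rel = (base - baseline_saliency(model, x_m)).abs().sum()
            # final (time, feature) score = product of the two relevances
            scores[t, f] = time_rel[t] * feat_rel
    return scores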
Code/Quality: The authors do provide an implementation of their code with the paper. The code for the most part is built around Captum, a PyTorch-based framework for interpreting DL models, which is good. However, it will definitely require a fair amount of refactoring and stylistic changes; I think it could be done in around two weeks of focused work and testing. Our framework already includes SHAP, but incorporating Captum along with the paper's methods could also provide a great extension.
Relevance for our users: Interpretability, or the lack thereof, remains probably the most common criticism I hear of DL models for time series compared to more classical models. I cannot tell you how many times I've heard the (IMO ignorant) line, "we would use deep learning but we need to be able to explain our decisions to stakeholders. We can't have a black box…" Therefore, any method that increases interpretability is great. Similarly, on the research side of things, I think finding better ways to visualize models is a budding area.
Performance on Datasets: The authors use synthetic datasets, where the important features are known in advance, to evaluate the quality of the saliency methods. In addition, they try TSR on several real-world datasets, such as fMRI classification data (a sequence of fMRI images). They find TSR performs better than the vanilla methods at providing interpretable and accurate saliency maps.
Final Verdict: Including better interpretability methods is a major focus area for our project. Making it easier to explain model decisions to third parties, and even allowing ML engineers themselves to debug their models, is a pressing problem, and I think this paper is a step in the right direction. I do wish it weren't limited in scope to classification, as we are primarily a forecasting repository, so I'm not sure how well the techniques will generalize to forecasting problems. That said, I think porting this over will help our framework, so at the end of the day I would say this is a high priority.
Adversarial Sparse Transformer for Time Series Forecasting
Video Link (none)
Summary: The paper addresses the error-accumulation problem in multi-step forecasting (i.e., when we append the model's own output to the real values and use it to forecast subsequent time steps). The paper also addresses creating more diverse forecasts covering multiple ranges of values. To address these problems the authors propose using GANs. It is one of the first papers I have seen that describes using GANs for forecasting: here the GAN serves as a method of regularizing multi-step time series predictions. It works in conjunction with a sparse attention mechanism that uses an entmax activation function instead of softmax, which allows the network to better learn long-range dependencies across time steps and, in particular, which steps aren't important at all.
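For intuition on the sparse attention component: entmax is a family of activations that interpolates between softmax and sparsemax and, unlike softmax, can assign exactly zero probability to some positions. As a rough illustration (not the paper's code), here is sparsemax, the α = 2 member of that family, in NumPy:

import numpy as np

def sparsemax(z):
    """Sparsemax (Martins & Astudillo, 2016): a softmax-like mapping
    that can assign exactly zero weight to low-scoring positions."""
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # largest k such that 1 + k * z_(k) > cumulative sum of top-k scores
    support = k[1 + k * z_sorted > cumsum]
    k_z = support[-1]
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold
    return np.maximum(z - tau, 0.0)        # exact zeros outside the support

scores = np.array([2.0, 1.0, 0.1, -1.0])
print(sparsemax(scores))  # [1. 0. 0. 0.] -- unimportant steps get zero weight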
Code/Style: There is an implementation of the paper located here; I'm not sure whether it is the official implementation. The code quality appears decent at first glance. That said, I still think there would be a lot of difficulty porting the model fully to our framework at the moment. For one thing, we would have to modify the main training module in flow forecast to work with GAN-like architectures: currently, our training module assumes we have a single model and a single loss function. This would require either adding if/else blocks or creating a function that maps model type to a training loop (if we envision many models consisting of multiple losses); a sketch of the latter option follows.
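Purely as a hypothetical sketch (none of these names exist in flow forecast today), the dispatch idea might look like this:

# Map a model type to its training loop instead of scattering if/else
# blocks through one monolithic trainer. All names here are illustrative.

def train_single(model, loader, loss_fn, opt):
    """Current assumption: one model, one loss function."""
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

def train_gan(gen, disc, loader, adv_loss, g_opt, d_opt):
    """GAN-style loop: alternate discriminator and generator updates."""
    for x, y in loader:
        d_opt.zero_grad()
        d_loss = adv_loss(disc(y), disc(gen(x).detach()))
        d_loss.backward()
        d_opt.step()
        g_opt.zero_grad()
        g_loss = -adv_loss(disc(y), disc(gen(x)))  # schematic objective only
        g_loss.backward()
        g_opt.step()

TRAINING_LOOPS = {"single": train_single, "gan": train_gan}

def train(model_type, *args, **kwargs):
    TRAINING_LOOPS[model_type](*args, **kwargs)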
Relevance for our users: Most of our users are interested in increasing the performance of their models; at the same time, even within the deep learning space, simpler models with fewer hyper-parameters are preferred. Given the difficulty of getting GANs to converge, and the fact that many people in corporate data scientist positions likely have very limited knowledge of GANs, I don't see many users adopting this model. Even within the ML time series research community I don't see much work directly building on the GAN architecture, except perhaps the sparse attention mechanism (though I have been wrong before).
Performance on Datasets: The model performs better than the Convolutional Transformer (which, by the way, we have implemented in flow forecast) as well as several other models on the traffic and electricity datasets. My major gripe here is that the authors don't directly compare performance with the Temporal Fusion Transformer (TFT). When you compare their model with TFT, the improvements are in many cases negligible: on traffic, for instance, TFT achieves 0.095 versus their 0.093 ± 0.01. Additionally, I don't like the broader trend of time series forecasting papers that evaluate only on these simple univariate datasets. At a minimum, call your paper "AST for Univariate Time Series Forecasting," because there are no results that demonstrate broader applicability.
Final Verdict: Adding the training code, I think, would be very difficult, as it would require modifying several modules in flow forecast. Additionally, GANs are notoriously difficult to train, and the performance gains seem marginal. That said, I really do like the idea of utilizing sparse attention and of finding ways to address the compounding error of multi-step predictions. I might implement sparse attention on its own and allow people to utilize it as a swappable parameter. Altogether, though, I'd say adding the full model is relatively low priority for our team. That doesn't mean I wouldn't like it added to the repository, however; if anyone is interested in porting it, let me know!
Probabilistic Time Series Forecasting with Structured Shape and Temporal Diversity
Video Link
Summary: This paper addresses a dilemma with common time series loss functions (MSE, RMSE, etc.): they can produce accurate point predictions, but they fail to gauge uncertainty and cause the model to fall apart when the distribution shifts. On the other hand, loss functions such as the Gaussian or quantile loss often fail to provide narrow enough predictions. To rectify this problem the authors propose STRIPE. STRIPE is able to produce a diverse set of potential forecasts, each of which has the added benefit of being sharper (i.e., you don't just get a huge confidence interval).
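For reference, the quantile (pinball) loss mentioned above penalizes under- and over-prediction asymmetrically, which tends to yield calibrated but often wide intervals; a minimal PyTorch version (standard formula, not STRIPE's code):

import torch

def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile level q in (0, 1): under-predictions
    are weighted by q, over-predictions by (1 - q)."""
    err = y_true - y_pred
    return torch.mean(torch.maximum(q * err, (q - 1) * err))

y, yhat = torch.tensor([10.0]), torch.tensor([8.0])
print(quantile_loss(y, yhat, 0.9))  # tensor(1.8000): 0.9 * (10 - 8)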


