Bálint Plangár – EMD and wavelet decomposition based denoising and forecasting of crude oil prices

Datasheet

Year, page count: 2019, 63 pages

Language: English


Institution:
[BCE] Corvinus University of Budapest
[ELTE] Eötvös Loránd University


Eötvös Loránd University
Corvinus University of Budapest

EMD and wavelet decomposition based denoising and forecasting of crude oil prices

MSc thesis

Author: Bálint Plangár
Supervisor: Milán Csaba Badics

May 10, 2019

ACKNOWLEDGEMENT

Firstly, I would like to express my sincere gratitude to my advisor, Milán Csaba Badics, for the continuous support of my research and for his patience, motivation, immense knowledge and critical mindset. His guidance helped me throughout the research and the writing of this thesis. The door to Milán's office was always open whenever I ran into a trouble spot or had a question about my research or writing. He consistently allowed this paper to be my own work, but steered me in the right direction whenever he thought I needed it. I could not have imagined having a better advisor and mentor for my research project.

DECLARATION

Name: Bálint Plangár
ELTE Faculty of Science, programme: Insurance and Financial Mathematics
NEPTUN ID: JL3QFB
Thesis title: EMD and wavelet decomposition based denoising and forecasting of crude oil prices

As the author of this thesis, I declare, aware of my disciplinary responsibility, that the thesis is the result of my own independent work and my own intellectual product, that I have consistently applied the standard rules of referencing and citation, and that I have not used parts written by others without proper citation.

Budapest, May 10, 2019
signature of the student

Table of Contents

1. Introduction 1
2. Literature review 4
3. Critical review 10
4. Research framework for financial time series forecasting 12
5. Possible research questions 21
6. Decomposition methods 22
   6.1 Empirical mode decomposition 22
   6.2 Discrete wavelet based decomposition 26
7. Data 30
8. Empirical analysis and results 36
   8.1 Prediction strategy 36
   8.2 Prediction model 40
   8.3 Results 40
9. Robustness check 48
10. Conclusion 51
References 54

List of Figures

1. Figure: Simplified representation of the four broad research designs, Source: own figure, 14
2. Figure: Research framework of financial time series forecasting, Source: own figure, 19
3. Figure: Plotting the envelopes and their mean, Source: MetaTrader, 2012, 25
4. Figure: Comparison of transformations, Source: Uliha, 2016, 512p, 26
5. Figure: Process of wavelet decomposition, Source: Mirzaei et al., 2010, 303p, 29
6. Figure: Brent crude oil prices and returns for the entire sample, Source: own figure, 31
7. Figure: Number of IMFs during the estimation period using expanding window, Source: own figure, 32
8. Figure: Components of Brent crude oil generated by EMD on the entire sample, Source: own figure, 33
9. Figure: In-sample components of Brent crude oil generated by EMD, Source: own figure, 34
10. Figure: Components of Brent crude oil generated by EMD during the recession, 2006.09–2010.09, Source: own figure, 35
11. Figure: Prediction process, Source: own figure, 36
12. Figure: Selected research design for empirical mode decomposition, Source: own figure, 37
13. Figure: Ratio of significant lags in the first three IMFs, Source: own figure, 41
14. Figure: Typical values of permutation entropy estimated from denoised signals, Source: own figure, 42
15. Figure: Number of dropped IMFs based on sample entropy and number of generated IMFs using expanding window, Source: own figure, 43
16. Figure: Number of dropped IMFs based on Shannon entropy and number of generated IMFs using expanding window, Source: own figure, 43
17. Figure: Number of dropped detail components based on Shannon and sample entropy using expanding window, Source: own figure, 45
18. Figure: Denoised signals and their permutation entropy using wavelet decomposition, Source: own figure, 46
19. Figure: Cumulative RSE of the two best performing models throughout the out-of-sample period, Source: own figure, 48
20. Figure: Histograms of the number of IMFs using rolling window, Source: own figure, 49
21. Figure: Permutation entropy based noise selection in case of EMD, Source: own figure, 50
22. Figure: Number of dropped detail components based on Shannon and sample entropy using weekly data and rolling window, Source: own figure, 50

1. Introduction

Signal processing is a long-known technique for analyzing and detecting hidden components in a measured signal. It has been applied mainly in the field of electrical engineering; however, signal processing has several other application fields, for example processing or interpreting spoken words (Smith et al., 2017) and processing pictures or videos (Baimbetov, 2015). It can also be used for image or video compression (Berres et al., 2017) and noise reduction (Boukhayma et al., 2016).

Signal decomposition is a useful technique for applying noise reduction or for analyzing the original time series in a less complicated representation. The most commonly used strategy is the 'divide and conquer' strategy, which is a decomposition-ensemble learning paradigm. The strategy divides the original time series into meaningful components and then predicts the components instead of the original time series. Decomposition-ensemble models show better performance than conventional single models. Signal decomposition is also useful for noise reduction, which helps focusing on the most important components of the time series. Noise reduction means dropping a component or components after decomposition. Studies have shown that noise reduction can substantially improve data fitting, resulting in better prediction performance (Jammazi & Aloui, 2012; Guo et al., 2012; Harris & Yilmaz, 2009). We can think of signal processing (decomposition, noise reduction) as the preprocessing stage of model building.

Financial time series have the characteristics of complex nonlinearity, dynamic variation, high irregularity and non-stationarity (Watkins & Plourde, 1994; Krichene, 2007; Zhang et al., 2015). That is why conventional financial econometric tools (ARIMA, GARCH, VAR etc.) are not efficient methods for describing financial time series, and even machine learning models often fail to fit the data and produce satisfying prediction results. Due to the benefits of signal processing, several studies have applied signal decomposition in the field of economic/financial time series prediction. The traditional forecasting strategy can be generally described as (Tang et al., 2014):

$$x_{t+h} = f(X_t) + \varepsilon_t \quad (1)$$

where $x_t$ denotes the value of the time series at time $t$, $h$ is the prediction horizon, $X_t = \{x_{t-1}, \dots, x_{t-l}\}$ are the past values of the original time series and $\varepsilon_t$ is the prediction error, following an independent and identical distribution. Based on the design of the function $f$ and the parameter estimation method, the existing models for crude oil price forecasting fall into three main types (Yu et al., 2015): (1) traditional econometric models with relatively simple fixed functional forms and strict data assumptions, for example the auto-regressive integrated moving average (ARIMA) (Xiang & Zhuang, 2013), generalized autoregressive conditional heteroscedasticity (GARCH) (Nomikos & Andriosopoulos, 2012), vector auto-regression (VAR) (Mirmirani & Li, 2005) or error correction models (ECM) (Lanza et al., 2005); (2) machine learning techniques with flexible functions and self-learning capability, such as artificial neural networks (ANN) (Guo et al., 2012), support vector machines (SVM) (Kim, 2003) or support vector regressions (SVR) (Lin et al., 2012); (3) hybrid models combining several single models (Yu et al., 2015).
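To make equation (1) concrete, the sketch below builds the lagged feature set $X_t$ and fits a linear $f$, an AR-type baseline of the kind the econometric models in group (1) generalize. The simulated series, lag order and train/test split are illustrative assumptions, not choices taken from the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def make_lagged_features(x, n_lags, horizon):
    """Build (X_t, x_{t+h}) pairs from a univariate series, as in equation (1)."""
    X, y = [], []
    for t in range(n_lags, len(x) - horizon + 1):
        X.append(x[t - n_lags:t])      # past values x_{t-l}, ..., x_{t-1}
        y.append(x[t + horizon - 1])   # target h steps ahead
    return np.array(X), np.array(y)

# toy example: estimate f with a linear model (an AR(l) baseline)
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))                  # stand-in for a (log-)price series
X, y = make_lagged_features(x, n_lags=5, horizon=1)
model = LinearRegression().fit(X[:-100], y[:-100])   # fit on the training part
pred = model.predict(X[-100:])                       # out-of-sample one-step forecasts
```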

Signal processing techniques gradually infiltrated the field of financial time series analysis because they are able to represent both smooth and volatile functions in a way that captures the time and frequency information of a time series (Yousefi et al., 2005; Guo et al., 2012; Bekiros & Marcellino, 2013).

The prediction of oil prices is even more challenging than that of other financial time series, since the price of oil is strongly influenced by many factors that can cause large-scale price movements, for example political events, investors' expectations about the future, the weather, or the economic reports of top oil producing countries. Oil price forecasting receives great attention, since the oil price plays an important role in the world economy (Guan et al., 2016; Zhang et al., 2015; Juvenal & Petrella, 2014). Crude oil is among the most important energy resources, since it is the world's most dominant fuel, making up just over a third of all energy consumed (BP, 2018). Furthermore, crude oil is also the world's largest and most actively traded commodity: Brent crude oil and West Texas Intermediate are among the top three most traded commodities in the world (FIA, 2018).

Although the current literature of traditional financial econometric forecasting has promising results, the tools of signal processing are not widespread in financial econometric research. Moreover, one can find several mistakes in the literature which make the reproduction and comparison of articles difficult. There is no general research framework which could help categorize the articles. Most of the papers cannot be recreated because they lack the necessary parameters, data or program code. It is a common mistake that papers ignore look-ahead bias, since their models use future information.

Some papers do not specify the window size or type (rolling or expanding) or the hyperparameter optimization method; furthermore, researchers rarely emphasize the sensitivity of the applied method to window type and size. The selection of benchmark models is often not designed properly: frequently a flexible model is compared to a relatively simple one. It is a frequent mistake in the literature that the differences between EMD and wavelets are analysed with different prediction models; consequently, the partial effects (decomposition, noise reduction etc.) are not described thoroughly. Some papers overcomplicate their prediction models, using for example one neural network for the decomposed data and another neural network for the predictions of the previous one. There are papers which reconstruct the components into low-medium-high frequency components or low-medium-high-trend components etc.; however, often there is no analysis of the optimality of the reconstruction method. The model comparison often lacks statistical hypothesis testing. In most of the papers an economic evaluation based on the prediction model (portfolio selection, Sharpe ratio etc.) or a robustness check (different frequencies, volatile vs calm periods) is missing. Some papers compare their prediction models based on only one time series and choose a relatively short out-of-sample period.

Given the available literature, the paper's contribution is threefold: (1) the paper provides a thorough literature review based on the most important articles, (2) the paper introduces a general research framework which describes the possible research designs in decomposition based economic/financial time series forecasting and classifies the articles introduced in the literature review with the help of the general research framework, (3) the paper compares PACF, entropy and expert judgement based noise selection methods in terms of their contribution to prediction accuracy.

The remainder of this paper is organized as follows. Section 2 summarizes the most important studies that have been carried out in decomposition based financial time series forecasting. Section 3 provides a critical review of the literature, focusing on the factors that restrain papers' reproducibility and comparability. Section 4 describes the general research framework for the current and future literature of financial time series forecasting. Section 5 provides some of the possible research questions and designs that can be formulated based on the framework. The applied methodologies, including advanced techniques and the research framework of the paper, are described in section 6. After that, section 7 introduces the research design and the time series data, which is followed by the description of the empirical analysis and results in section 8. Finally, section 9 provides a robustness check for the prediction models and section 10 concludes the paper.

2. Literature review

This section summarizes the most important studies that have been carried out in decomposition based economic/financial time series forecasting. All the papers apply a decomposition method from the wavelet or empirical mode decomposition family. The main purpose of this section is to describe the trends in financial time series forecasting, focusing particularly on the differences between signal processing methods.

Yousefi et al. (2005) illustrated an application of wavelets as a possible technique for investigating the issue of market efficiency in futures markets for crude oil. They introduced a wavelet-based prediction procedure to provide forecasts for the spot price over horizons of one, two, three and four months. The results of their models were compared with data from the actual futures markets for oil, and the relative performance of the procedure was used to investigate whether futures markets are efficiently priced.

They used average monthly WTI spot prices and NYMEX futures prices; the data cover the period 1986–2003. The Daubechies wavelet of order seven and a five-level wavelet decomposition were applied as the prediction model. The predictions were calculated as an extension of the decomposed data on each level, then the authors reconstructed the data with the help of the inverse wavelet transform. For the approximation level a spline fit, while for the lower detail levels a trigonometric fit was applied. The researchers came to the conclusion that the futures market might not be efficiently priced, since the wavelet-based predictions of spot prices were closer to the real spot prices than the actual futures prices.

Jammazi and Aloui (2012) exploited the dynamic properties of a multilayer back propagation neural network (MBPNN) together with the Haar à trous wavelet with a six-level decomposition to achieve prominent prediction of crude oil prices. They used monthly WTI crude oil spot prices to generate out-of-sample forecasts; the data cover the period from 1988 to 2010. They chose the prediction horizon to be 19 months and the conventional MBPNN as a benchmark model. To ameliorate the fitting ability of the MBPNN, the high frequency components (D1–D6) were dropped and only the smoothed signal was used for model building. The inverse wavelet transform was applied to the smoothed component to reconstruct smoothed WTI prices. They came to the conclusion that reducing excess noise from the WTI price can ameliorate the fitting ability of the MBPNN, since the hybrid model outperformed the standard MBPNN model.
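Jammazi and Aloui's denoising step (decompose, discard the detail levels, invert) maps onto a few lines of code. The sketch below reproduces the idea with the PyWavelets package, using the plain decimated Haar DWT as a stand-in for their à trous variant; the input file name is hypothetical.

```python
import numpy as np
import pywt

def wavelet_smooth(x, wavelet="haar", level=6):
    """Decompose x, zero out all detail levels D1..D6, reconstruct.

    Keeps only the approximation (the 'smoothed signal'), mirroring the
    denoising idea of Jammazi & Aloui (2012); they used the Haar a trous
    transform, here the ordinary decimated DWT serves as illustration.
    """
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]  # drop details
    return pywt.waverec(coeffs, wavelet)[: len(x)]

prices = np.loadtxt("wti_monthly.csv")   # hypothetical input file
smoothed = wavelet_smooth(np.log(prices))
```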

Bekiros and Marcellino (2013) used a shift-invariant wavelet transform to analyze the dependence structure and predictability of currency markets across different timescales. Their study attempts to probe into the micro-foundations of across-scale causal heterogeneity on the basis of trader behavior with different time horizons. They used three time series of daily closing currency rates, namely EUR/USD, JPY/USD and GBP/USD. They calculated foreign exchange returns and realized volatility series for model building purposes, and they chose the random walk as a benchmark model. The data span the period from 1999.01.05 to 2010.05.10 (2960 observations). The researchers determined the optimal level of the multiscale decomposition with respect to the minimization of a Shannon entropy related criterion. They used different models for the approximation and the details: for the approximation level (A4) a cubic spline fit, while for the details (D1–D4) ARIMA was applied to extend the decomposed signal. The prediction procedure includes the following steps: invariant transformation with the SIDWT, boundary extension with spline and ARIMA, reconstruction of the wavelet series with the inverse SIDWT; finally, the out-of-sample forecasts for one to five days ahead are obtained and compared to predictions calculated from a neural network. The authors showed that the application of wavelet decomposition and artificial neural networks provided enhanced predictability.

Yu et al. (2008) proposed an empirical mode decomposition (EMD) based neural network ensemble learning paradigm for forecasting crude oil spot prices. They used daily WTI and Brent crude oil prices from the period 1986–2006. After the original crude oil spot series were decomposed, a three-layer feedforward neural network (FNN) model was used to model each of the extracted IMFs. After that, an adaptive linear neural network (ALNN) was applied to formulate an ensemble output for the original crude oil price series. The following models were used as benchmarks: EMD-FNN-Averaging, EMD-ARIMA-ALNN, EMD-ARIMA-Averaging, single FNN and single ARIMA. The authors' results show that the decomposition-and-ensemble strategy can effectively improve the prediction performance based on RMSE and deviation statistics. They also show that EMD is a meaningful tool for prediction performance improvement.

Lin et al. (2012) proposed a hybrid forecasting model using EMD and least squares support vector regression (LSSVR) for foreign exchange rate forecasting. An LSSVR is constructed to forecast each IMF and the residual individually, and then all these forecasted values are aggregated to produce the final forecast for the foreign exchange rates. This is a typical application of the 'divide and conquer' strategy. Daily USD/NTD, JPY/NTD and RMB/NTD exchange rates are used and the data cover the period 2005.07.01–2009.12.31. The researchers use the following benchmark models: EMD-ARIMA, single LSSVR and single ARIMA without time series decomposition.

Their results show that the proposed EMD-LSSVR model outperforms the benchmark models based on various statistical performance measures.

Xiong et al. (2013) propose a hybrid model built on EMD and the feedforward neural network (FNN) modeling framework, incorporating the slope based method (SBM). The slope based method is proposed to restrain the end effect that occurs during the sifting process of EMD. The authors examine the iterated, direct and multiple-input multiple-output (MIMO) forecasting strategies. After the original crude oil spot series were decomposed, a three-layer feedforward neural network (FNN) model was used to model each of the extracted IMFs. This was followed by the application of another FNN to formulate an ensemble output for the original crude oil price series. Weekly WTI crude oil spot price data are used between 2000.01.07 and 2011.12.30. They examine several prediction horizons, including 4, 8, 12, 16, 20 and 24 weeks. The researchers use the following models as benchmarks: single FNN without EMD, naïve random walk without EMD and EMD-FNN without SBM. The results indicate that the proposed EMD-SBM-FNN model using the MIMO strategy is the best in terms of prediction accuracy.

Shu-ping et al.'s (2014) study incorporates the idea of decomposition-reconstruction-ensemble. The new insight of their paper is to use the run length judgement method to reconstruct the component sequences based on the characteristics of the components. They built a multiscale combined forecasting model based on EMD, applying ANN and SVM as prediction models. The monthly spot price of WTI crude oil from January 1986 to November 2013 is selected. The oil price series was decomposed and reconstructed into high-, medium-, low-frequency and trend sequences. They use an ANN model for the high frequency, SVM for the medium and low frequency individually and ARIMA for the trend component. The authors apply another SVM to formulate an ensemble output for the original time series. In their analysis the researchers apply the run length judgement method, which is a potential tool for noise selection; however, they do not drop any components. Their model generated out-of-sample predictions for 12 and 23 periods ahead. They came to the conclusion that the multiscale combined model obtained the best forecasting result compared with single ARIMA, Elman, SVM and GARCH models and with combined models including the ARIMA-SVM and EMD-SVM-SVM methods.

Yu et al. (2015) proposed a decomposition-ensemble methodology with data-characteristic driven reconstruction for crude oil price forecasting to enhance prediction accuracy and reduce computational complexity. Four main steps are involved in the study: data decomposition for simplifying the complex data, component reconstruction based on data-characteristic driven modeling, individual prediction for each reconstructed component and ensemble prediction for the final output.

The weekly crude oil prices of the WTI and Brent markets are used; the data cover the period from January 1986 to July 2014. They analyze multiple reconstruction methods including run length judgement, fine-to-coarse and sample entropy reconstruction. Besides, numerous benchmark models are applied to test the proposed method, including typical decomposition-ensemble models without reconstruction and similar decomposition-ensemble models with existing reconstruction strategies. The authors tested the proposed method with several prediction horizons, including 1, 2, 3 and 4 weeks. The results indicate that the data-characteristic driven reconstruction approach improves on the existing decomposition-ensemble techniques based on statistical performance measures and computational time.

Zhu et al. (2016) developed an adaptive multiscale ensemble learning paradigm incorporating ensemble empirical mode decomposition (EEMD), particle swarm optimization and LSSVM with a kernel function prototype. Three main steps are involved in the study: the original oil price series is decomposed with the help of extrema symmetry expansion EEMD (ESE-EEMD); after that, the authors apply the fine-to-coarse reconstruction algorithm in order to identify the high frequency, low frequency and trend components. Different prediction models are used for each of the components: ARIMA is used to predict the high frequency components, LSSVM is used to predict the low frequency and trend components; finally, the prediction results of all components are aggregated. The article analyzes three energy price series, including daily WTI crude oil. The study applies the fine-to-coarse method, which can be used for noise selection. Numerous benchmark models are applied, including typical decomposition-ensemble models without reconstruction and similar decomposition-ensemble models with existing reconstruction strategies. The results indicate that the proposed method can significantly improve the level and directional prediction accuracy.

Lahmiri (2016) presents a new time series forecasting model which integrates variational mode decomposition (VMD) and a general regression neural network (GRNN). Three benchmark models are applied: EMD-GRNN, FFNN and ARIMA. Daily data of WTI, CANUS and the volatility index from 2008.01.02 to 2013.12.16 are used to conduct the experiments. Two main steps are involved: EMD or VMD is applied to the original data to obtain the components, which are then fed to the GRNN for forecasting. The author demonstrated the superiority of the VMD-based method over the three competing prediction approaches; consequently, VMD is an effective technique for the analysis and prediction of economic and financial time series. VMD is able to separate tones of similar frequencies and is more robust to noisy data than EMD.

Table 1 summarizes the main articles mentioned in this section and gives extra details of the research papers. The last three rows of the table contain articles that are not mentioned in the literature review; however, the decomposition methods they use can be useful for financial time series forecasting.

This section introduced the most important research papers that readers most frequently encounter. The above review helps the reader become familiar with the current trends in financial time series forecasting, particularly with the decomposition based prediction strategies. In spite of the fact that one can find promising results in the literature, signal processing methods are not widespread in economic/financial time series forecasting because articles are not reproducible and comparable. A general research framework is missing from the literature, which could help categorize the articles, determine the necessary parameters for reproduction and foster the comparability of studies.

| Author | Data | Frequency | Window | Decomposition method | Stopping criterion | Noise selection / noise reduction | Prediction horizon | Aggregation | Main prediction model |
|---|---|---|---|---|---|---|---|---|---|
| Yousefi et al. (2005) | WTI spot price, NYMEX futures | Monthly | 100 random samples | Daubechies wavelet | Expert judgement | Not used | 1, 2, 3, 4 | Signal processing inverse | Spline, trigonometric fit |
| Yu et al. (2008) | WTI spot price, Brent spot price | Daily | NaN | EMD | Residual based | Not used | 1, 30 | Learning | FNN (ensemble: ALNN) |
| Jammazi & Aloui (2012) | WTI spot price | Monthly | NaN | Haar à trous wavelet | Expert judgement | Expert judgement / drop D1–D6 | 19 | Signal processing inverse | MBPNN |
| Lin et al. (2012) | FX rates | Daily | NaN | EMD | Residual based | Not used | NaN | Sum of components | LSSVR |
| Guo et al. (2012) | Wind speed | Monthly/Daily | NaN | Modified EMD | Residual based | Expert judgement / drop higher freq. | 1, 18 | Sum of components | FNN |
| Bekiros & Marcellino (2013) | FX rates, volatility, return | Daily | Rolling | Shift invariant DWT | Expert judgement | Not used | 1, 2, 3, 4, 5 | Signal processing inverse | Spline, ARIMA |
| Xiong et al. (2013) | WTI spot price | Weekly | Multiple window types | SBM-EMD | NaN | Not used | 1, 4, 8, 12, 16, 20, 24 | Learning | FNN (ensemble: FNN) |
| Shu-ping et al. (2014) | WTI spot price | Monthly | NaN | EMD | NaN | Not used | 12, 23 | Learning | SVM, NN, ARIMA (ensemble: SVM) |
| Xiong et al. (2014) | NN3 competition | Monthly | NaN | EMD, Daubechies wavelet | Expert judgement | Not used | 1 | Learning | SVR (ensemble: SVR) |
| Yu et al. (2015) | WTI spot price, Brent spot price | Weekly | NaN | EEMD | NaN | Not used | 1, 2, 3, 4 | Sum of components | LSSVR, ANN |
| Zhu et al. (2016) | WTI spot price, CO2 EUA | Daily | Rolling | ESE-EEMD | Expert judgement | Not used | 1 | Sum of components | ARIMA, LSSVM |
| Lahmiri (2016) | WTI spot price, FX rates, VIX | Monthly/Daily | NaN | VMD | Residual based | Not used | 1 | Learning | GRNN |
| Afanasyev & Fedorova (2016) | Power exchange | Daily | Rolling | CEEMDAN | Expert judgement | Not used | 1 | NaN | NaN |

1. Table: Details of the research papers described in the literature review, Source: own table

3. Critical review

The main purpose of this section is to provide a critical review of the literature, focusing primarily on mistakes and criticisms. The criticisms are formulated based on the research papers mentioned in the literature review. The section can help researchers form an opinion on the results of studies, and it can foster the application of signal processing in economic/financial time series forecasting.

First and foremost, there is no general research framework or review paper in the decomposition based forecasting literature which summarizes the main results and conclusions of studies, the relevant research questions, the possible research designs and the relevant research directions.

Consequently, a general research framework can solve the aforementioned problems. This study intends to provide a research framework for the current and future literature of economic/financial time series forecasting in section 4, and it also provides possible research questions and designs in section 5.

It is not possible to reproduce most of the papers because they lack the necessary model parameters, data or code. In spite of the fact that decomposition based economic/financial time series forecasting studies have promising results, their external and internal validity is low. The most frequently missing elements of research descriptions are the window size and type used for decomposition and prediction. The data, the packages/toolboxes used for the analysis and a model description with parameter selection should be provided. If studies were easily reproducible, great progress could be made on their application in economic/financial time series analysis.

The stopping criteria of decomposition methods are not analyzed thoroughly, and their instability is frequently ignored. Analyzing the connection between the number of components and the characteristics of a time series is a prerequisite to the spread of signal processing methods in economic/financial time series analysis. A well-designed robustness check can address this problem: changing the data type (return or price), the window size and type or the frequency of the data provides a solution.

It is a common mistake that papers ignore look-ahead bias because their models use future information. This inflates prediction accuracy and produces seemingly accurate results which are actually unreliable. The decomposition result is highly sensitive to the window; therefore, decomposing the entire time series and then using those components for an earlier prediction is a mistake. Consequently, such a model cannot be fairly compared to traditional econometric tools.
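The safe pattern is to re-run the decomposition inside the evaluation loop, so that each forecast sees only past data. A minimal sketch, assuming the PyEMD package (pip name EMD-signal) and a user-supplied `fit_predict` placeholder that is not part of the thesis:

```python
import numpy as np
from PyEMD import EMD  # assumes the PyEMD package (pip install EMD-signal)

def expanding_window_forecasts(x, start, fit_predict):
    """Decompose ONLY the data observed up to time t at every step.

    Decomposing the full series once and reusing early slices of the IMFs
    would leak future information into past forecasts (look-ahead bias).
    fit_predict is any function mapping the current IMFs to a one-step
    forecast; it is a placeholder for the prediction model of choice.
    """
    preds = []
    for t in range(start, len(x)):
        window = x[:t]                 # information set available at time t
        imfs = EMD().emd(window)       # the IMF count may change as t grows
        preds.append(fit_predict(imfs))
    return np.array(preds)
```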

Based on the descriptions and figures provided in studies, researchers do not analyze the number of components during the prediction process, which is a crucial element of an analysis. A thorough analysis should be made of the number of components, since it changes as the window used for decomposition rolls or expands (e.g. in the case of EMD). Since a wavelet decomposition has a predetermined number of components, the information content of the components should be analyzed throughout the prediction horizon.

It is rarely explained which model should be fit on which components (low-, medium-, high-frequency etc.). The statistical analysis of components is often missing from studies. Reconstructing the components into low-, medium- and high-frequency components is a frequently used method; however, it is difficult to explain why this method should work in general. Researchers should pay more attention to the analysis of components (complexity, nonlinearity, structural breaks etc.) and choose the reconstruction method and the forecasting models accordingly. This can foster the comparison of different decomposition methods.

It is a frequent mistake in the literature that the differences between decomposition methods are analyzed with different prediction models; consequently, the partial effects (decomposition, reconstruction, noise reduction) are not described properly. In this case it is impossible to decide whether the decomposition or the noise reduction improved the prediction accuracy. A properly designed study selects benchmark models in a way that can separate the positive effects of decomposition and noise reduction. Consequently, the choice of benchmark models is crucial to the separation of partial effects.

It is also a mistake that the out-of-sample period is often too short. Out-of-sample evaluation shows how good the applied method is: the longer the out-of-sample period, the more reliable the results become. Consequently, it is worth choosing a long out-of-sample period and repeating the analysis with both rolling and expanding windows.

Statistical comparison of models is rarely done; therefore, the significance of the difference between two prediction models is not checked. The Diebold-Mariano test is the most frequently used, although the model confidence set is a better approach to testing the difference between models. Moreover, in most of the papers an economic evaluation based on the prediction models is missing (for example analyzing the differences between Sharpe ratios based on a portfolio selection). Statistical evaluation, per se, does not provide information about the economic efficiency and applicability of models.
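For reference, the Diebold-Mariano statistic is simple to compute from two out-of-sample error series. The sketch below uses squared-error loss and the plain one-step variant without an autocorrelation (HAC) correction, so it is illustrative rather than a complete implementation:

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2):
    """One-step DM test with squared-error loss and no HAC correction.

    e1, e2: forecast errors of the two competing models over the same
    out-of-sample period. Returns the DM statistic and a two-sided p-value.
    """
    d = e1**2 - e2**2                               # loss differential series
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))
```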

Studies usually miss a robustness check. It can easily be done by changing the frequency of the data (intraday, daily, weekly etc.), by using price series instead of return series, or by changing the window size and type. Applying both linear and nonlinear prediction models is also a good strategy for checking robustness. Articles often ignore the analysis of decomposition and noise reduction in periods with different characteristics (volatile, smooth, noisy periods). A robustness check can strengthen the reliability of results and provides more information about the decomposition strategy.

There are no articles in the decomposition based forecasting literature that apply simulation. A well-designed study is missing which analyses how noise reduction performs in the case of time series with different characteristics. In a simulation the data generating process can be controlled, so a comprehensive analysis can be performed on decomposition and noise reduction.

The definition of noise and its representation in a time series are not described thoroughly, and the implementation of noise reduction can differ across prediction approaches. These characteristics make the comparability of research papers more difficult: it is hard to measure the partial effect stemming from noise reduction if the concept of noise is not described properly.

The criticisms expressed in this section are the author's own opinion and are not part of any review paper. Nevertheless, avoiding the aforementioned mistakes has several benefits: it can foster the application of signal processing methods in economic/financial time series forecasting and strengthen the reliability and validity of results.

4. Research framework for financial time series forecasting

The main purpose of this section is to provide a research framework for the current and future literature of financial time series forecasting. The section emphasizes how difficult the interpretation and reproduction of an article is without a proper general framework.

Such a framework is a currently missing element of the literature, in spite of the fact that it has many advantages: it paves the way for comparing papers in the field of financial time series forecasting and gives a road map for future research.

The framework provides the following advantages: (1) it facilitates the aggregation of the results of the current literature, consequently helping us better understand the efficiency of signal processing techniques in financial time series analysis, and it paves the way for a meta-study in which the current results can be combined; (2) it helps with the formulation of the research design, since it is easier to design a research project when the general framework of the field is known; (3) it makes great progress in comparing and classifying research papers, since it provides the necessary groupings for classification; (4) it helps researchers specify all the necessary details and parameters of their research design, thereby facilitating the paper's reproduction; (5) it helps determine the reliability of the results presented in a research paper. All in all, the framework fills a gap in the current literature, which opens up opportunities for further research.

Based on the papers described in the literature review there are four broad research designs. The designs are depicted in figure 1; in spite of the fact that figure 1 simplifies the research approaches, its perspicuity makes them easy to understand. As a first step, all of the approaches involve the decomposition of the original signal into components. The first method applies noise selection and noise reduction in the second stage in order to enhance the prediction performance, then applies the signal processing inverse in order to obtain the denoised signal. If the denoising method is well designed, the resulting signal should be less complex and hopefully easier to predict.

The second approach also involves decomposition in the first stage; after that, it predicts the future value of each component in the second stage, then applies the signal processing inverse and obtains the predicted values. This research design gives us the possibility to predict different components with different prediction methods (e.g. an ANN for highly irregular components and a linear regression for a smooth component). Reconstructing the components into low-medium-high frequency components is a frequently applied technique in the literature. The third research design involves decomposition in the first stage and prediction of the components in the second stage (reconstruction of the components can be used here as well); however, instead of using the inverse method of signal processing, it builds a new prediction model which uses the predictions of the second stage as input variables. This research design can put different weights on the predicted values of the second stage; however, it is a complex prediction approach and its application should be well-founded. The fourth research design first applies a decomposition method, then the components are used as input variables for a prediction model. Of course, the reconstruction of components into low-medium-high frequency components can be used here as well. Nevertheless, the fourth research design is the least frequently used design in the literature.

1. Figure: Simplified representation of the four broad research designs, Source: own figure

The detailed representation of the research designs is described in figure 2. This paper proposes the structure introduced in the figure as the general research framework for decomposition based financial time series forecasting. The dark green boxes represent the main stages of a prediction process.

The first stage involves the data selection; the second stage is the decomposition, which is followed by the frequency selection. After the frequency selection is done, we arrive at the fourth stage, which is the reconstruction. After that, researchers should choose the number of models in the fifth stage, design the prediction thoroughly in the sixth stage and finally select the aggregation method. This framework is sufficiently detailed to categorize research papers; moreover, it defines all the necessary parameters which should be given in any research paper in order to ensure comparability and replication. Furthermore, with the help of the framework, it is easier to determine what the research question of an article is.

The first stage involves the selection of the data type, the data frequency and the window. Returns and price levels should be separated due to the different characteristics of the same data expressed in returns and in price levels. Another important issue with the data selection is its frequency, because a model that fits best on weekly data is not necessarily the best on intraday data. The window size and type (rolling or expanding) should also be given, because some methods are highly sensitive to these parameters; the size and type of the window are the most frequently missing elements of a research design. Here bootstrap means selecting random samples of consecutive observations of equal length. The elements of the first stage (data type, frequency and window) can be used for robustness checks, which are rarely done in research papers.
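The rolling/expanding distinction is easy to pin down in code. A minimal walk-forward index generator, written here purely for illustration:

```python
def windows(n_obs, train_size, kind="rolling"):
    """Yield (train_indices, test_index) pairs for walk-forward evaluation.

    kind="rolling" keeps a fixed-length window; "expanding" grows it from
    the start of the sample. Either choice should be reported explicitly.
    """
    for end in range(train_size, n_obs):
        start = end - train_size if kind == "rolling" else 0
        yield range(start, end), end
```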

The second stage is the decomposition. In this stage a broad method family, the exact decomposition method and the stopping criterion should be selected. In the current literature there are two frequently used approaches: empirical mode decomposition (EMD) and wavelet based decomposition. Both methods have improved modifications which have, in theory, better characteristics; however, there are few papers in the literature which analyze the partial effect of choosing an improved modification instead of the simplest version. These are listed in column II. b). The variational mode decomposition (VMD) method in column II. a) has been proposed as an alternative to EMD that can easily separate tones of similar frequencies in data where EMD fails. This paper lists VMD and EMD separately because VMD is based on a different algorithm. EMD is the simplest version of the decomposition family; it will be described later in this paper. The EMD modified with the slope based method intends to handle the end effect problem of simple EMD, while the ensemble empirical mode decomposition (EEMD) intends to handle the potential mode mixing problem of EMD. The extrema symmetry expansion EEMD is a modified version of EEMD which gives a solution to both the mode mixing and the end effect problem. Nevertheless, EEMD introduces additional noise into the results of the decomposition and does not produce a stable number of IMFs when applied repeatedly to the same time series. The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) was introduced in the literature to solve this problem.

The third decomposition method described in this paper is the discrete wavelet transform. This method will be described later in detail. The most frequently applied wavelets in the literature are the Daubechies and the Haar wavelets. However, the classical decimated DWT involves subsampling the filter output to half the original length, which leads to a serious drawback, namely that the transform is not shift invariant: the DWT of a shifted signal is not the shifted version of the DWT of the signal. Nevertheless, an undecimated DWT can be implemented without the subsampling technique; moreover, it is invariant to circularly shifting the time series. A new variation of the undecimated DWT, namely the shift invariant DWT (SIDWT), has been proposed in the literature. Besides being shift invariant, the SIDWT employs a specialized periodic extension pattern to deal with boundary effects. However, the SIDWT is not an orthogonal basis, since it produces an over-determined representation of the series. The SIDWT method will be described later in this paper.

After the decomposition method has been chosen, a stopping criterion should be selected. Research papers which apply certain thresholds or $\log_2 N$, determine the maximum number of sifting iterations or use a predetermined order as a stopping criterion are classified in the expert judgement group. The residual based stopping criterion is applicable only in the case of the EMD family; this will be described later in this paper. Some papers pursue an optimal decomposition with respect to the minimization of an entropy related criterion, which describes the information-relevant properties of the representation of a signal.
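The entropy related criterion can be read in several ways; as one hypothetical implementation, the sketch below scores candidate wavelet decomposition levels by the Shannon entropy of their normalized coefficient energies. The db7 wavelet and the scoring rule are illustrative assumptions, not the exact criterion used in the cited papers.

```python
import numpy as np
import pywt

def coefficient_entropy(coeffs):
    """Shannon entropy of the normalized coefficient energies."""
    c = np.concatenate(coeffs)
    p = c**2 / np.sum(c**2)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def best_level(x, wavelet="db7", max_level=6):
    """Pick the decomposition level with the lowest coefficient entropy."""
    return min(
        range(1, max_level + 1),
        key=lambda lv: coefficient_entropy(pywt.wavedec(x, wavelet, level=lv)),
    )
```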

The third stage is the frequency selection. This stage starts with noise selection. Here expert judgement contains all the papers that selected a certain component or components as noise without analysis (e.g. selecting the highest frequency component). A noise component can also be selected with the help of the partial autocorrelation function (PACF), which measures the linear relation of a time series with its own lagged values when the intermediate effects are filtered out. The run length judgement method is a tool for measuring the irregularity of a given signal: it assigns a run number to a signal, and the larger the number, the higher the volatility. Another way of selecting noise is to use an entropy related approach: permutation entropy, sample entropy and Shannon entropy are possible tools for noise selection. There could be other entropy definitions as well; however, column III. a) lists all that are mentioned in the papers of Table 1. Many papers do not apply any noise selection method; they are classified into the 'skip noise selection' category. After the noise component or components are selected, we can drop one or more components. Here a 'no drop' box is introduced in order to classify papers which skipped the noise selection procedure.
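Of the entropy based tools, permutation entropy is the simplest to state. The sketch below follows Bandt and Pompe's definition; the 0.9 threshold in the usage comment is purely illustrative, not a value taken from the thesis.

```python
import math
from collections import Counter
import numpy as np

def permutation_entropy(x, m=4, normalize=True):
    """Permutation entropy of a 1-D series (Bandt & Pompe, 2002).

    Low values indicate a regular signal, high values a noise-like one,
    so a component whose normalized entropy is close to 1 is a noise
    candidate.
    """
    patterns = Counter(
        tuple(np.argsort(x[i:i + m])) for i in range(len(x) - m + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(m)) if normalize else h

# e.g. drop every IMF whose normalized entropy exceeds a chosen threshold:
# imfs_kept = [c for c in imfs if permutation_entropy(c) < 0.9]
```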

The fourth stage is the reconstruction. In this stage one should select the reconstruction type and rule. Total aggregation means aggregating the components back to the original level; it is a box for those articles which drop a noise component, then aggregate the components and analyze the denoised time series later on. Several papers reconstruct the components into low, medium and high frequency components in order to analyze different features of the signal separately and improve prediction performance. The 'no reconstruction' box is for those papers which do not use reconstruction.

There are several reconstruction rules that can be applied. Expert judgement incorporates all the papers that use reconstruction without analysis. The run length judgement method is the same as in the case of noise selection; it can also be used as a reconstruction rule. In the case of the data characteristic driven reconstruction rule, the decomposed modes are thoroughly analyzed to explore the hidden data characteristics (complexity, cyclicity, mutability, tendency) and are reconstructed accordingly. The fine-to-coarse reconstruction rule can be described as follows: high-pass filtering by adding up fast oscillations (IMFs with smaller index) up to slow ones (IMFs with larger index). First we sum some components, then we calculate a t-test to identify how many components can be summed without the partial sum departing significantly from zero. These components are reconstructed into a high frequency component, and the rest of the IMFs are reconstructed into a low frequency component.
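A minimal sketch of the fine-to-coarse rule just described, assuming the IMFs are stacked from fastest to slowest; the significance level is an illustrative choice:

```python
import numpy as np
from scipy import stats

def fine_to_coarse_split(imfs, alpha=0.05):
    """Split IMFs into high- and low-frequency groups (fine-to-coarse rule).

    imfs: array of shape (n_imfs, T), ordered from fastest to slowest.
    Partial sums of the fastest IMFs are tested against a zero mean; the
    first index where the t-test rejects marks the high/low boundary.
    """
    for k in range(1, len(imfs) + 1):
        partial_sum = imfs[:k].sum(axis=0)
        _, p_value = stats.ttest_1samp(partial_sum, 0.0)
        if p_value < alpha:                 # mean departs significantly from zero
            break
    high = imfs[:k - 1].sum(axis=0)         # noise-like fast oscillations
    low = imfs[k - 1:].sum(axis=0)          # slower cycles and the trend
    return high, low
```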

A clustering method can be used as well, applied to statistics calculated from each component.

In the fifth stage one should choose the number of models used for prediction. The 'one model' box typically contains those papers which decompose the original data, drop a noise component, then aggregate the components back to the original level. In the case of the 'same models' and 'different models' boxes, researchers build multiple prediction models. For example, a paper that is classified into the 'same models' box decomposes the original data and builds an ANN for each of the components, while a paper from the 'different models' box builds an ANN for one component, an SVM for another, etc.

The sixth stage is the prediction. Here the window, the prediction horizon, the prediction model, the feature selection method and the hyperparameter optimization should be selected. Researchers can select the 'same' window if they want to use the same type as in the case of column I. c), or it is possible to choose a different one. The prediction horizon can be set to one or multiple periods. Column VI. c) lists all the prediction models that were used in the papers introduced in Table 1. Feature selection lists the methods that can be used for selecting input variables; here expert judgement contains all the papers that selected input variables without analysis. Using an ANN typically involves input selection through optimization on a validation set. It is important to point out that most of the papers do not use hyperparameter selection through optimization on a validation set; instead, they use a predetermined model architecture.

The seventh stage is the aggregation. Here the 'no aggregation' box is created for those papers that aggregated the decomposed signal in an earlier stage (first method) or use the components as input variables to predict the future value of the original signal directly (fourth method). The 'prediction' box incorporates papers where a new model is fit on the predicted values obtained from each of the components. Some papers apply the inverse of the decomposition method at the end to obtain the predicted values (second method); these are classified into the 'signal processing inverse' box, which means the wavelet inverse, or the summation in the case of the EMD family.

Table 2 classifies the papers described in the literature review based on the general research framework introduced in this section. This paper assigns seventeen numbers to each study based on its main prediction model. Every number represents a column from figure 2: the first number shows the data type, the second number the data frequency, etc., and the last number represents the aggregation method. From each of the columns a number should be selected; a zero value is given in a column if the researchers do not specify it.

The first three numbers would be 1-2-0 for a paper which analyzes daily return data but gives no information about the window. Some papers apply multiple prediction models or window types; in this case several box numbers are given for the same column.

This section introduced a general research framework which describes the possible research designs in decomposition based economic/financial time series forecasting. The framework can help compare papers in the field of economic/financial time series forecasting and gives a road map for future research. Furthermore, it defines all the necessary parameters that should be given for replication. Besides, this section also classified the research papers that were introduced in the literature review. Based on Table 1, the window size and type and the stopping criterion are the most frequently missing parameters of a research design.

2. Figure: Research framework of financial time series forecasting, Source: own figure

Yousefi et al. (2005), "Wavelet-based prediction of oil prices": 2-4-3| 3-6-1| 7-1| 3-7| 1| 1-12-12-1-1| 3
Yu et al. (2008), "Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm": 2-2-0| 2-1-2| 7-1| 3-7| 2| 0-12-6-1-2| 2
Lin et al. (2012), "Empirical mode decomposition based least squares support vector regression for foreign exchange rate forecasting": 2-2-0| 2-1-2| 7-2| 3-7| 2| 0-0-7-1-2| 3
Jammazi & Aloui (2012), "Crude oil price forecasting: Experimental evidence from wavelet decomposition and neural network modeling": 2-4-0| 3-7-1| 1-2| 3-7| 1| 0-2-6-1-2| 3
Bekiros & Marcellino (2013), "The multiscale causal dynamics of foreign exchange markets": 1-2-1| 3-8-1| 7-1| 3-7| 1| 1-12-24-3-1| 3
Xiong et al. (2013), "Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices": 2-3-12| 2-2-0| 7-2| 3-7| 2| 1-12-6-3-2| 2
Shu-ping et al. (2014), "Multiscale Combined Model Based on Run-Length-Judgment Method and Its Application in Oil Price Forecasting": 2-4-0| 2-1-0| 7-2| 2-2| 3| 0-2-468-3-2| 2
Yu et al. (2015), "A decomposition-ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting": 2-3-0| 2-3-0| 7-2| 2-2345| 2| 0-12-67-1-12| 3
Zhu et al. (2016), "An Adaptive Multiscale Ensemble Learning Paradigm for Nonstationary and Nonlinear Energy Price Time Series Forecasting": 2-2-2| 2-4-1| 7-2| 2-4| 3| 1-1-47-1-2| 3
Lahmiri (2016), "A variational mode decomposition approach for analysis and forecasting of economic and financial time series": 2-23-0| 12-1-2| 7-2| 3-7| 1| 0-1-6-1-1| 2

2. Table: Classification of researches introduced in the literature review, Source: own table

5. Possible research questions

This section provides some of the possible research questions and designs that can be formulated based on the framework.

The main purpose of this section is to briefly introduce research questions which could solve one of the problems mentioned in the critical review. These are potential articles which could make significant progress in the economic/financial time series forecasting literature.

Analyzing the stopping criterion of the EMD model family. This problem is important because the stopping criterion can influence the number of components. There are cases where the spline fit cannot be performed or the threshold should be changed in order to ensure convergence. During the analysis the window size and type, the data frequency, and volatile and smooth periods should be taken into account.

Investigating the sensitivity of the decomposition of economic/financial time series to the selected window. In this case one should test how sensitive the EMD and wavelet resolution is: how the number of components changes when the window rolls or expands in the case of EMD, and how the information content of the components changes in the case of wavelets. One should select the number of components in advance when applying wavelet decomposition, which is why the information content of these components should be considered.

Comparing noise selection methods and analyzing their effect on prediction accuracy. In this research project one should summarize the possible noise definitions in the case of the four prediction methods (introduced in section 4) and find which noise selection method is appropriate in each of the cases.

Comparing different reconstruction methods. In this research project one should test which reconstruction method is the most efficient, how many components (low, medium, high etc.) should be made, and which characteristics should be used to reconstruct the components.

Analyzing the partial effects of decomposition, reconstruction and noise reduction separately. All of these methods can enhance prediction performance; however, their individual contributions to prediction accuracy are rarely analyzed.

Testing the efficiency of prediction models in the case where the values of components are predicted separately. Some papers apply different prediction models to the components, some choose only one model. Nevertheless, it should be investigated whether it is worth choosing models based on the statistical properties of a component.

Performing an analysis based on simulated data. Define several data generating processes which produce time series with different frequencies, then use them to create an aggregated time series. Due to the controlled nature of the analysis, the results of the signal processing methods will be more reliable.

Investigating whether there is a relationship between the amplitude of noise and liquidity, volatility or the volatility of volatility in the case of time series from different asset classes.

which are rarely analyzed by signal processing methods (for example volatility index, inflation, GDP). Apply multi-dimensional decomposition, exploiting the relations between time series. One should investigate whether the result of simultaneous decomposition can be used to predict one or the other time series more accurately. The potential research questions mentioned above do not claim to be exhaustive, however they illustrate well that several studies are missing from the literature. Nevertheless the author claims that the results of these researches could make a great progress in decomposition based economic/financial time series forecasting literature. 6. Deco mpos itio n methods This section provides a detailed introduction to the decomposition methods, namely, empirical mode decomposition (EMD) and wavelet based decomposition. These are the two methods which are used to analyse the original signal in a new representation. 6.1 Empirical mode decomposition Empirical mode

Empirical mode decomposition (EMD) first appeared in the article of Huang et al. (1998). They introduced a new method to deal with both non-stationary and nonlinear data by decomposing the signal first and analysing the physical meaning of the decomposition later. EMD has the characteristics of being intuitive, direct, a posteriori and adaptive, with the basis of the decomposition derived from the data. The basic principle of EMD is to decompose the signal into a sum of oscillatory functions, namely intrinsic mode functions (IMF). The decomposition is based on three assumptions: (1) the signal has at least two extrema, one maximum and one minimum, (2) the characteristic time scale is defined by the time lapse between the extrema, (3) if the data has no extrema but contains only inflection points, then it can be differentiated once or more times to get the extrema (Huang et al., 1998).

Huang et al. (1999) introduced two requirements in order to get meaningful IMFs: (1) in the whole data series, the number of extrema (the sum of maxima and minima) and the number of zero crossings must be equal or differ at most by one, (2) the mean value of the envelopes defined by the local maxima and minima must be zero at all points. Nevertheless, the components' orthogonality is not guaranteed theoretically. For some data, neighboring components could certainly have sections of data carrying the same frequency at different time durations. The amount of leakage usually depends on the length of the data as well as on the decomposition results. However, Huang et al. (1998) argue that orthogonality is a requirement only for linear decomposition systems; it would not make physical sense for a nonlinear decomposition such as EMD. The different scales can be identified directly in two ways: first, by the time lapse between the successive alternations of local maxima and minima; secondly, by the time lapse between the successive zero crossings.

Huang et al. (1998) adopted the time lapse between successive extrema as the definition of the time scale for the intrinsic oscillatory mode. This choice is beneficial because it gives a fine resolution of the oscillatory modes. One can extract the scales by the sifting process. Any data series $x(t)$ $(t = 1, 2, \dots, T)$ can be decomposed according to the following sifting procedure (Yu et al., 2008):
1) Identify all the local extrema, including the local maxima and local minima, of the time series $x(t)$.
2) Connect all local extrema by a cubic spline line to generate the upper and lower envelopes $e_{max}(t)$ and $e_{min}(t)$. In this step we should fit a cubic spline separately to the time series of the local minimum and local maximum points.
3) Compute the point-by-point envelope mean $m(t)$ from the upper and lower envelopes: $m(t) = \frac{e_{max}(t) + e_{min}(t)}{2}$.
4) Extract the details: $d(t) = x(t) - m(t)$. Steps 1) – 4) are plotted on Figure 3.
5) Check the properties of $d(t)$: if $d(t)$ meets the two requirements of Huang et al. (1999), an IMF is derived and $x(t)$ should be replaced with the residual $r(t) = x(t) - d(t)$. In case $d(t)$ is not an IMF, then $x(t)$ should be replaced with $d(t)$.
One has to repeat steps 1) – 5) until a stopping criterion is satisfied.
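A minimal sketch of one sifting iteration (steps 1) – 4) above) is given below. The thesis performed these computations in MATLAB; this Python/SciPy version is only an illustrative analogue under those assumptions, and every name in it is hypothetical.

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sifting_step(x):
    # One sifting iteration: subtract the mean of the cubic-spline envelopes.
    t = np.arange(len(x))
    max_idx = argrelextrema(x, np.greater)[0]  # local maxima (step 1)
    min_idx = argrelextrema(x, np.less)[0]     # local minima (step 1)
    if len(max_idx) < 4 or len(min_idx) < 4:
        return None                            # too few knots for a cubic spline
    e_max = CubicSpline(max_idx, x[max_idx])(t)  # upper envelope (step 2)
    e_min = CubicSpline(min_idx, x[min_idx])(t)  # lower envelope (step 2)
    m = (e_max + e_min) / 2.0                    # envelope mean (step 3)
    return x - m                                 # candidate detail d(t) (step 4)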

The typical stopping criteria can be classified into three groups: (1) residual based, (2) expert judgement, (3) entropy based. According to the residual based criteria, the above algorithm should be stopped if we reach a final time series $r(t)$ as a residual component that becomes a monotonic function or has at most one local extremum. This criterion is suggested by Huang et al. (1999). The following studies also used this method as a stopping criterion: Lin et al. (2012), Yu et al. (2008) and Guo et al. (2012).

Rilling et al. (2003) introduce the mode amplitude $a(t) := \frac{e_{max}(t) - e_{min}(t)}{2}$ and the evaluation function $\sigma(t) := \left| \frac{m(t)}{a(t)} \right|$. The sifting is iterated until $\sigma(t) < \theta_1$ for some prescribed fraction $(1 - \alpha)$ of the total duration, while $\sigma(t) < \theta_2$ for the remaining fraction, where $\theta_1$ and $\theta_2$ aim to guarantee globally small fluctuations in the mean while taking into account locally large excursions. One can typically set $\theta_1 = 0.05$, $\theta_2 = 0.5$ and $\alpha = 0.05$. There are several approaches in the literature where the stopping criterion includes a certain threshold that is determined by the researcher. Lahmiri (2016) computed the standard deviation (SD) from two consecutive sifting results. According to this approach the sifting process should be stopped if the standard deviation is less than an arbitrary small number¹.

¹ $SD = \frac{\sum_{t=0}^{T} (h_{k-1}(t) - h_k(t))^2}{\sum_{t=0}^{T} h_{k-1}^2(t)} < \delta$
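The SD rule is simple to state in code. The sketch below is a hypothetical Python rendering of the footnoted formula, assuming h_prev and h_curr are two consecutive sifting results stored as numpy arrays:

import numpy as np

def sd_stop(h_prev, h_curr, delta=0.2):
    # Ratio-of-sums SD between consecutive sifting results; stop when SD < delta.
    sd = np.sum((h_prev - h_curr) ** 2) / np.sum(h_prev ** 2)
    return sd < delta  # True: accept the current component as an IMF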

Huang et al. (1998) emphasize that carrying the sifting process to an extreme could make the resulting IMF a pure frequency modulated signal of constant amplitude. To guarantee that the IMF components have enough physical sense, one should set the SD value between 0.2 and 0.3. Another stopping criterion can be defined by the following three conditions: (1) at each point, (mean amplitude) < (threshold * envelope amplitude), (2) mean of the Boolean array ((mean amplitude)/(envelope amplitude) > threshold) < tolerance, and (3) the number of zero crossings and the number of extrema differ by at most one (Lahmiri, 2016). In this case the threshold, threshold2 and tolerance values are set by the researcher; Lahmiri (2016) applied the values 0.5, 0.5 and 0.5. Zhu et al. (2016) terminated the sifting process when it reached the maximum sifting count of 10. In the paper of Xiong et al. (2014) the whole sifting process stops after $\log_2 N$ IMFs have been extracted, where N is the length of the data series. Tseng and Lee (2010) applied an entropic analysis strategy.

They analyzed to what extent information relevant to the underlying functions of $x(t)$ is carried in the IMFs. They defined a normalized information scale to measure the information extent. Their numerical studies showed that the scale correctly quantifies the extent of information that is codified in an IMF. Based on this scale, the IMFs that are information-free components can be identified. After the stopping criterion is satisfied, the original data series can be expressed as $x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$, where $n$ is the number of IMFs, $r_n(t)$ is the final residual, which is the main trend of $x(t)$, and the $c_i(t)$ $(i = 1, \dots, n)$ are the IMFs. Thus, one can achieve a decomposition of the data series into n empirical mode functions and one residual.

The IMF components have different frequency bands and they change with the variation of the time series $x(t)$, while $r_n(t)$ represents the central tendency of the data (Yu et al., 2008).

3. Figure: Plotting the envelopes and their mean, Source: Metatrader, 2012

Empirical mode decomposition has several distinct advantages; however, it also has some serious disadvantages. On the one hand, it is relatively easy to understand and implement, the fluctuations within a time series are automatically and adaptively selected from the time series, it is robust for nonlinear and nonstationary time series decomposition, and EMD can adaptively decompose a time series into several IMF components and one residual component. Unlike wavelet decomposition, EMD does not require determining a filter base function before decomposition (Yu et al., 2008). On the other hand, the decomposition results can exhibit mode mixing, which means that a single IMF contains sparsely distributed timescales, or similar timescales are broken down into different IMFs (i.e. the orthogonality condition is not satisfied) (Zhu et al., 2016).

Furthermore, EMD suffers from the end effect. The end effect refers to the situation in which, when calculating the upper and lower envelopes with the cubic spline function in the sifting process of EMD, divergence appears on both ends of the data series and gradually influences the inside of the data series, greatly distorting the results (Deng et al., 2001).

6.2 Discrete wavelet based decomposition

Wavelet methodology, a refinement of Fourier analysis, is an alternative for analyzing nonstationary data with high irregularities and cyclical patterns. The wavelet multiscale decomposition allows for simultaneous analysis in the time and frequency domain. It converts a signal into a series of wavelets and provides a way of analyzing waveforms bounded in both frequency and duration. That is why wavelet decomposition could be a valuable means of exploring the complex dynamics of financial time series (Bekiros & Marcellino, 2013).

Figure 4 depicts the benefits of the wavelet transform in comparison with the time domain representation, the Fourier transform and the short-time Fourier transform.

4. Figure: Comparison of transformations, Source: Uliha, 2016, 512. p

Figure 4 highlights that in case of a time domain representation we have no frequency information; however, we have information about the amplitude of a signal. The Fourier transform uses a basis of sines and cosines of different frequencies to determine how much of each frequency the signal contains. The Fourier transform does not allow the frequency content of the signal to change over time, therefore it can tell us how much of each frequency exists in the signal, but it does not tell us when in time these frequency components exist.

To overcome this limitation, the short-time Fourier transform has been suggested. It consists of applying a short-time window to the signal and performing the Fourier transform within this window as it slides across all the data. However, any time-frequency analysis is limited by the Heisenberg uncertainty principle, which states that it is impossible to know simultaneously the exact frequency and the exact time of occurrence of this frequency in a signal (i.e. there is a trade-off between time and frequency resolution). The problem with the short-time Fourier transform is that it uses constant length windows. In contrast, the wavelet transform uses local base functions that can be stretched and translated with a flexible resolution in both frequency and time. In case of the wavelet transform, the time resolution is intrinsically adjusted to the frequency, with the window width narrowing when focusing on high frequencies and widening when assessing low frequencies.

Allowing for windows of different size makes it possible to improve the frequency resolution of the low frequencies and the time resolution of the high frequencies. This means that a certain high or low frequency component can be located better in time. Wavelets enable a more flexible approach in time series analysis; wavelet analysis is seen as a refinement of Fourier analysis (Rua, 2012), (Uliha, 2016).
The signal $x[n]$ is a discrete time function, i.e. a sequence, where $n$ is an integer. The procedure starts with passing the sequence through a half band digital lowpass filter with impulse response $h[n]$. Signal filtering corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter. The convolution is defined as follows:
$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k] \cdot h[n-k]$ (2)
A half band lowpass filter removes all frequencies² that are above half of the highest frequency in the signal.

² In discrete signals frequency is expressed in terms of radians.

After passing the signal through a half band lowpass filter, half of the samples can be eliminated. Discarding every other sample subsamples the signal by two, and the signal will then have half the number of points; the scale of the signal is now doubled. The lowpass filtering removes the high frequency information but leaves the scale unchanged; only the subsampling process changes the scale. Resolution, however, is related to the amount of information in the signal, and therefore it is affected by the filtering operations. Nevertheless, the subsampling operation after filtering does not affect the resolution: half the samples can be discarded without any loss of information. In summary, the lowpass filtering halves the resolution but leaves the scale unchanged, and the signal is then subsampled by 2 since half of the samples are redundant, which doubles the scale.

This procedure can be expressed as:
$y[n] = \sum_{k=-\infty}^{\infty} h[k] \cdot x[2n-k]$ (3)
The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information. The DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with lowpass and highpass filters, respectively. The decomposition of the signal into different frequency bands is simply obtained by successive highpass and lowpass filtering of the time domain signal. In summary, the original signal is first passed through a half band highpass filter $g[n]$ and a lowpass filter $h[n]$; after the filtering, half of the samples can be eliminated, so the signal can be subsampled by 2, simply by discarding every other sample. This constitutes one level of decomposition and can be expressed as follows:
$y_{high}[k] = \sum_{n} x[n] \cdot g[2k-n]$ (4)

$y_{low}[k] = \sum_{n} x[n] \cdot h[2k-n]$ (5)
where $y_{high}[k]$ and $y_{low}[k]$ are the outputs of the highpass and lowpass filters after subsampling by 2. The decomposition halves the time resolution, since only half the number of samples now characterize the entire signal. However, this operation doubles the frequency resolution, since the frequency band of the signal now spans only half the previous frequency band, reducing the uncertainty in the frequency by half. This procedure can be repeated for further decomposition. Figure 5 illustrates this procedure.

5. Figure: Process of wavelet decomposition, Source: Mirzaei et al, 2010, 303. p

The highpass and lowpass filters are not independent of each other; they are related by the following equation, where L is the filter length (in number of points):
$g[L-1-n] = (-1)^n \cdot h[n]$ (6)
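Equations (4) – (6), together with the inverse step given in equation (7) below, correspond to what wavelet libraries implement as a single DWT level. The sketch below is a hypothetical Python illustration with PyWavelets (the thesis itself used MATLAB), using the Daubechies-7 wavelet adopted later in this study:

import numpy as np
import pywt

x = np.random.randn(1024)              # placeholder signal
cA, cD = pywt.dwt(x, 'db7')            # lowpass/approximation and highpass/detail, eqs. (4)-(5)
x_rec = pywt.idwt(cA, cD, 'db7')       # inverse transform, eq. (7)
print(len(cA), len(cD))                # roughly half the input length each
print(np.allclose(x, x_rec[:len(x)]))  # perfect reconstruction with these filters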

The frequency bands that contain little information from the original signal will have very low amplitudes, so that part of the signal can be discarded without loss of information, allowing data reduction. The reconstruction of the original signal is easy if we use half band filters, since they form an orthonormal basis. The reconstruction formula can be expressed as:
$x[n] = \sum_{k=-\infty}^{\infty} \left( y_{high}[k] \cdot g[2k-n] + y_{low}[k] \cdot h[2k-n] \right)$ (7)
However, if the filters are not ideal half band filters, then perfect reconstruction cannot be achieved (Daubechies, 1992). The most famous wavelets are the Daubechies wavelets; however, the Coiflet, Haar and Symlet wavelets are also frequently used types. One of the most important benefits of wavelet decomposition is its strong theoretical background and the possibility of applying a wavelet that produces orthogonal components.

In comparison with the Fourier transform, the wavelet transform uses local base functions that can be stretched and translated with a flexible resolution in both frequency and time, resulting in more frequency and time domain information. Due to the filtering, one can easily identify the noise components. On the other hand, one should choose a base function before the analysis, which can highly affect the results, and there is no recipe book for choosing the type of wavelet for a specific time series. Not only should the base function be chosen by the researcher in advance, but also the order of the wavelet. The classical, decimated discrete wavelet transform involves subsampling the output of the high- and lowpass filters to half their original length. This leads to a serious drawback, namely that the transform is not shift-invariant along the time axis. Specifically, the DWT of a shifted signal is not the shifted version of the DWT of the signal (Bekiros & Marcellino, 2013).

Furthermore, the wavelet transform, just like EMD, suffers from the boundary effect (Su et al., 2012).

7. Data

This section introduces the data that is used for the research. First, the main properties of the data will be presented, as well as some of its most important descriptive statistics. Then the effect of empirical mode decomposition will be described on an example. In this study the daily Brent crude oil spot price is chosen as the experimental sample. The data is available and can be downloaded from the website of the Energy Information Administration. The data span the period from 2000.01.04 to 2019.03.14 (4867 observations). The given sample length is chosen because it encompasses the most relevant extreme events in the history of the oil price, for example the terrorist attacks of 2001, the invasion of Iraq in 2003, the subprime crisis in 2008 and the OPEC decision in 2014.

The original time series is non-stationary based on the KPSS and ADF tests, which is why this study uses log returns for prediction purposes. Only the log returns are decomposed and later predicted with the selected models. The data set is divided into two parts: the in-sample period starts on 2000.01.04 and lasts until 2006.01.03 (1542 observations), while the out-of-sample period covers 2006.01.04 to 2019.03.14 (3325 observations). The original Brent crude oil time series and the log returns are shown on figure 6; the dashed line separates the in-sample from the out-of-sample period. Table 3 reports some of the most important descriptive statistics of the log returns.

|                          | In-sample          | Out-of-sample      |
| Mean                     | 0.06089 %          | 0.00153 %          |
| Standard deviation       | 0.02445            | 0.02137            |
| Median                   | 0.13609 %          | 0.02151 %          |
| Min – max                | -0.19891 – 0.12853 | -0.16832 – 0.18129 |
| ADF statistic (p-value)  | -38.91 (0.001)     | -56.79 (0.001)     |
| KPSS statistic (p-value) | 0.023 (0.1)        | 0.067 (0.1)        |
| Auto(1) – Auto(2)        | 0.0079 – 0.0259    | 0.0149 – 0.0138    |

3. Table: Descriptive statistics of log returns calculated on observations from the in-sample and out-of-sample periods, Source: Own table
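The return transformation and the stationarity checks are straightforward to reproduce. The following Python sketch is a hypothetical analogue of the tests behind table 3 (the thesis worked in MATLAB); 'brent.csv' is an assumed local file holding the EIA price series:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller, kpss

prices = pd.read_csv('brent.csv', index_col=0, parse_dates=True).squeeze()
returns = np.log(prices).diff().dropna()             # daily log returns

adf_stat, adf_p = adfuller(returns)[:2]              # H0: unit root
kpss_stat, kpss_p = kpss(returns, nlags='auto')[:2]  # H0: stationarity
print(f'ADF {adf_stat:.2f} (p={adf_p:.3f}), KPSS {kpss_stat:.3f} (p={kpss_p:.2f})')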

Based on the statistics in table 3, the two samples have relatively similar characteristics; their mean returns can be regarded as equal based on a two-sample t-test. They are both stationary and have no first and second order autocorrelation. Nevertheless, both contain volatile and calm periods.

Figure 6: Brent crude oil prices and returns for the entire sample, Source: Own figure

Figures 8-10 show the empirical mode decomposition of the log returns in different periods. Figure 8 shows a decomposition procedure which uses the entire sample. The red signal is the original log return series, the green signal is the residual and the blue signals are the IMFs. This figure describes how EMD decomposes the original signal into meaningful components.

The original signal can be reconstructed by simply summing up all the components. The components contain high-, mid- and low-frequency information and capture the complex characteristics of the returns. The same decomposition for the in-sample period is described on figure 9. The figure shows us that the decomposition result is not independent of the window size: in case of using only the in-sample information 15 components are generated, while 19 components are obtained from the entire sample. I also calculated the components using data from two years before and after the collapse of Lehman Brothers; the results are shown on figure 10. In this case 13 components are generated. The instability of the components can be explained by the continuously changing environment and by the occurrence of extreme events which can alter the data generating process. Nevertheless, the instability of the components makes the result of the noise selection more difficult to interpret, since the number of selected components will also vary during the analyzed period.

Histograms of the number of EMD components can be seen on figure 7. An expanding window is applied for the decomposition: the first window covers the in-sample period, then the window expands as new data is added to the sample on a daily basis. Figure 7 shows that the number of IMFs gradually increases as the window expands. It is important to emphasize that the decomposition process in case of EMD can be time consuming if we apply an expanding window. In this paper I used the MATLAB R2016b software for my calculations; for wavelet decomposition I used the 'wavedec' function, while the 'emd' function was applied for empirical mode decomposition. Signal reconstruction can be done by summation in case of EMD, while the 'waverec' function can be used for reconstructing wavelet coefficients.

7. Figure: Number of IMFs during the estimation period using expanding window, Source: Own figure
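The expanding-window loop itself is short. Below is a hypothetical Python analogue of the MATLAB workflow just described, using PyEMD and PyWavelets in place of 'emd' and 'wavedec'; `returns` is assumed to be the numpy array of daily log returns and 1542 the in-sample length:

import pywt
from PyEMD import EMD

emd = EMD()
n_imfs = []
for end in range(1542, len(returns) + 1):           # window expands one day at a time
    window = returns[:end]
    imfs = emd.emd(window)                          # rows: extracted IMFs (and residual)
    n_imfs.append(imfs.shape[0])
    coeffs = pywt.wavedec(window, 'db7', level=10)  # [A10, D10, ..., D1], level 10 as in the thesis
# Note: rerunning EMD for every day is what made the thesis runs take hours.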

8. Figure: Components of Brent crude oil generated by EMD on the entire sample, Source: Own figure

9. Figure: In-sample components of Brent crude oil generated by EMD, Source: Own figure

10. Figure: Components of Brent crude oil generated by EMD during the recession, 2006.09 – 2010.09, Source: Own figure

8. Empirical analysis and results

This section introduces the prediction strategy, briefly describes the ARIMA model and presents the results. The prediction strategy will be described with the help of the general research framework which was introduced in section 4. This paper applies ARIMA as the prediction model because it is a simple model and its parameters can be estimated relatively quickly. It is important to emphasize that the focus of this paper is on the denoising ability of EMD and wavelet decomposition.

8.1 Prediction strategy

This study applies the first research design from the four broad designs which were introduced in section 4 on figure 1.

This involves the decomposition of the original data; the approach then applies different noise selection methods to choose and drop the noise components. The rest of the components are aggregated with the help of the signal processing inverse, which is summation in case of EMD and the wavelet inverse in case of wavelet decomposition. The prediction process is shown on figure 11. This study defines noise as follows: the component or components of an observable signal which, if dropped, improve prediction accuracy.

11. Figure: Prediction process, Source: Own figure

The detailed version of the research design is depicted on figure 12.

12. Figure: Selected research design for empirical mode decomposition, Source: Own figure

Figure 12 helps to better understand the selected research design for empirical mode decomposition.

This study analyzes daily log return data using an expanding window. The first window is from 2000.01.04 to 2006.01.03 and it expands on a daily basis. Empirical mode decomposition is selected as the decomposition method and a residual based stopping criterion is chosen for terminating the algorithm. This paper applies the same stopping criterion as Lahmiri (2016), who computed the standard deviation (SD) from two consecutive sifting results: the sifting process should be stopped if the standard deviation is less than an arbitrary small number³. Huang et al. (1998) emphasize that carrying the sifting process to an extreme could make the resulting IMF a pure frequency modulated signal of constant amplitude. To guarantee that the IMF components have enough physical sense, one should set the SD value between 0.2 and 0.3. This paper selected $\delta = 0.2$ as the stopping criterion; however, several times the stopping criterion had to be set to $\delta = 0.3$ because the algorithm failed to converge.

³ $SD = \frac{\sum_{t=0}^{T} (h_{k-1}(t) - h_k(t))^2}{\sum_{t=0}^{T} h_{k-1}^2(t)} < \delta$

This paper applies expert judgement, PACF, permutation entropy, sample entropy and Shannon entropy as noise selection methods. In case of expert judgement the first, the first two, the first three, then the first four components are dropped. The PACF approach is based on the consideration that an uncorrelated, identically distributed random sequence with zero expected value can be regarded as white noise. The entropies are used as a tool for optimal decomposition with respect to the minimization of an entropy, which describes the information-relevant properties of the representation of a signal. The entropy of each denoised signal is estimated step-wise and compared with the one from the previous level. The procedure is the following: after decomposition the first component is dropped, the rest of the signal is aggregated, then the entropy of the denoised signal is estimated. After that the first two components are dropped, the rest of the signal is aggregated, then the entropy of the denoised signal is estimated, etc. The optimal level of decomposition is determined at the minimum value of the entropy.
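This step-wise rule is easy to express in code. The sketch below is a hypothetical Python version for the EMD case, with a hand-rolled permutation entropy (order 3, delay 1, the settings used later in this paper) standing in for the generic entropy; `imfs` is assumed to be the component matrix from one decomposition:

import itertools
import numpy as np

def permutation_entropy(x, order=3):
    # Count the ordinal patterns of consecutive triplets and take their Shannon entropy.
    patterns = list(itertools.permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - order + 1):
        counts[patterns.index(tuple(np.argsort(x[i:i + order]).tolist()))] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def optimal_drop(imfs):
    # Drop the first k components, reconstruct by summation, pick k minimizing the entropy.
    ent = [permutation_entropy(imfs[k:].sum(axis=0)) for k in range(1, imfs.shape[0])]
    return int(np.argmin(ent)) + 1  # optimal number of components to drop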

After the noise selection is done, a component or some components are dropped and the rest of the components are aggregated. This paper applies only one prediction model and uses the same window for prediction as for the decomposition (i.e. an expanding window). An ARIMA(p, 0, q) model is selected for a one-period prediction; the lag parameters are p = 1, 2, 3, 4 and q = 0, 1, 2, 3, 4, and the optimal parameters are chosen based on the Bayesian information criterion. Due to the fact that the components were aggregated in an earlier stage, there is no need for aggregation at the end of the process.
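A minimal sketch of this selection step, assuming `denoised` is the reconstructed return series and using statsmodels' SARIMAX as the estimator (the thesis does not state its estimation routine, so this is only one possible rendering):

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

best_bic, best_fit = np.inf, None
for p in range(1, 5):                  # p = 1..4 as above
    for q in range(0, 5):              # q = 0..4
        try:
            fit = SARIMAX(denoised, order=(p, 0, q)).fit(disp=False)
        except Exception:
            continue                   # skip non-convergent specifications
        if fit.bic < best_bic:
            best_bic, best_fit = fit.bic, fit
forecast = best_fit.forecast(steps=1)  # one-period-ahead prediction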

The research design is the same in case of wavelet decomposition, except that a 10-level discrete wavelet decomposition is applied with the help of an order 7 Daubechies wavelet. The level is selected based on Bekiros & Marcellino (2013), and the order 7 Daubechies wavelet is applied in this study because it is one of the most popular selections in the literature. Wavelet decomposition generates one approximation component and ten detail components. The detail components contain the high frequency information, therefore these are the components which are potentially selected as noise. The noise selection procedure is the same as in the case of EMD. The expert judgement approach involves dropping the first, the first two, the first three, then the first four detail components. After decomposition, the PACF based noise selection reconstructs each of the D1-D10 components individually with the wavelet inverse; components that have no autocorrelation are dropped. In case of the entropy statistics, after decomposition the D1 component is dropped, then the rest of the components (A10, D2-D10) are aggregated using the inverse wavelet transform and the three entropies are estimated. After that the first two detail components are dropped, the rest of the signal is aggregated (A10, D3-D10) using the wavelet inverse, then the entropy of the denoised signal is estimated, etc. The optimal level of decomposition is determined at the minimum value of the entropy.
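For the wavelet branch, dropping D1-Dk and aggregating the rest amounts to zeroing the finest detail coefficients before the inverse transform. A hypothetical PyWavelets sketch (the thesis used MATLAB's 'wavedec'/'waverec'), with `window` an assumed log return array:

import numpy as np
import pywt

def wavelet_denoise(window, k, wavelet='db7', level=10):
    coeffs = pywt.wavedec(window, wavelet, level=level)  # [A10, D10, ..., D1]
    for j in range(1, k + 1):
        coeffs[-j] = np.zeros_like(coeffs[-j])  # zero out D1..Dk, the finest scales
    return pywt.waverec(coeffs, wavelet)        # aggregated denoised signal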

To measure the forecasting performance, two main criteria are used, evaluating level prediction and directional forecasting, respectively. The root mean squared error (RMSE) is selected for the evaluation of level prediction. RMSE can be defined as
$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (\hat{x}(t) - x(t))^2}$ (8)
where N is the number of predictions, $\hat{x}(t)$ is the predicted value and $x(t)$ is the observed signal. Accuracy is one of the most important criteria for forecasting models, the other being the decision improvements generated from directional predictions. From the business point of view the latter is more important than the former. The ability to predict movement direction can be measured by a directional statistic ($D_{stat}$) (Yu et al., 2008). The statistic can be expressed as
$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t$ (9)
where $a_t = 1$ if $(x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \geq 0$ and $a_t = 0$ otherwise. Here $\hat{x}_{t+1}$ represents the predicted value given by a model and $x_t$, $x_{t+1}$ are observed values.
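Both metrics are a few lines of code. A minimal sketch, assuming y_true and y_pred are aligned numpy arrays of observed and predicted values:

import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))  # eq. (8)

def d_stat(y_true, y_pred):
    # a_t = 1 when the predicted move has the same sign as the realised move, eq. (9)
    moves = (y_true[1:] - y_true[:-1]) * (y_pred[1:] - y_true[:-1])
    return np.mean(moves >= 0)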

8.2 Prediction model

This section briefly describes the prediction model, ARIMA. In this research ARIMA models are trained on the denoised signal in order to generate one-period out-of-sample predictions. The literature is rich in descriptions of ARIMA, which is why only the most important characteristics of the model are highlighted in this section. In an ARIMA model (Box & Jenkins, 1970) the future value of a variable is assumed to be a linear function of several past observations and random errors.

The underlying process that generates the time series takes the following form:
$\Phi(B) y_t = \Theta(B) \varepsilon_t$ (10)
where $y_t$ and $\varepsilon_t$ are the actual value and the random error at time t, respectively. $B$ denotes the backward shift operator, $B y_t = y_{t-1}$, $B^2 y_t = y_{t-2}$, etc., and $\Phi(B)$, $\Theta(B)$ denote the following:
$\Phi(B) = 1 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p$ (11)
$\Theta(B) = 1 - \Theta_1 B - \Theta_2 B^2 - \dots - \Theta_q B^q$ (12)
where p and q are parameters, often referred to as the lag orders of the model. The random errors are assumed to be independently and identically distributed with a mean of zero and a constant variance. If the dth difference of $\{y_t\}$ is an ARMA process of order p and q, then $y_t$ is called an ARIMA(p, d, q) process.

8.3 Results

This section summarizes the most important empirical results of the study.

For the sake of simplicity, the section first introduces the results of the PACF noise selection, followed by the results of the entropy based noise selection in cases when EMD was used as the decomposition method. After that, the results of the wavelet based decomposition are summarized in the same order, followed by the noise selection made with an expert judgement approach. At the end of this section the results of EMD and wavelet decomposition are compared.

13. Figure: Ratio of significant lags in the first three IMFs, Source: Own figure

The prediction strategy with PACF noise selection proved to be a weak approach. In case we want to drop components that are uncorrelated with their own lags (i.e. the PACF can be considered zero for all lags), then none of the components are selected as noise. Even the highest frequency component has significant first, second and third order autocorrelation. The first bar chart of figure 13⁴ shows the ratio of the IMF1 series that have statistically significant lags; based on the figure, all the IMF1 series have statistically significant first, second and third order autocorrelation (i.e. the PACF lags are not zero).

The results are the same in case of the IMF2 and IMF3 sequences. That is why this approach suggests that all the IMFs have information content, thereby using the observed signal (the return) is beneficial. Permutation entropy, a natural complexity measure for time series (Bandt & Pompe, 2002), also proved to be a weak approach in this study. The time delay was set to one and the order of the ordinal patterns was set to three; this means that three consecutive observations were grouped into embedded vectors⁵. The noise selection was not successful with permutation entropy, since its value decreased as more and more IMFs were dropped.

⁴ The signal was decomposed on a daily basis, thereby the number of decompositions was 3325. I collected all the IMF1, IMF2 and IMF3 sequences because these are the highest frequency components. The figure shows the ratio of IMFs that have statistically significant lags 1-6 based on the PACF.
⁵ For more details see Riedl et al. (2013).

Consequently, the minimum value of permutation entropy was reached at the trend component in 91% of all decompositions.

14. Figure: Typical values of permutation entropy estimated from denoised signals, Source: Own figure

Figure 14 shows the effect of denoising on permutation entropy. The values on figure 14 were calculated using the entire sample and show how permutation entropy decreases as more and more IMFs are dropped⁶. In spite of the fact that figure 14 shows the result of one decomposition, the same pattern appeared in most of the cases. Therefore permutation entropy suggests in 91% of the decompositions that all of the IMF components should be dropped except the trend.

⁶ The first point on the figure was calculated after denoising the signal from IMF1. The second point on the figure shows the value of permutation entropy when IMF1 and IMF2 are dropped, etc.

Noise selection with sample and Shannon entropy was more successful. Figures 15-16 show the results of noise selection based on sample and Shannon entropy. The left histogram shows the number of dropped components based on the entropy, while the right one shows the number of components generated with the expanding window.

15. Figure: Number of dropped IMFs based on sample entropy and number of generated IMFs using expanding window, Source: Own figure

16. Figure: Number of dropped IMFs based on Shannon entropy and number of generated IMFs using expanding window, Source: Own figure

Noise selection with sample entropy led to a similar result, though not as radical as permutation entropy. Sample entropy suggests dropping several components and using the last two to three components for reconstruction.

In case of sample entropy the embedding dimension was set to 200 and the tolerance value to 0.3⁷. Nevertheless, sample entropy can be used for the analysis because most of the time it suggests keeping some of the components, which we can use for reconstructing a signal. Shannon entropy based noise selection has the most promising result: it suggests keeping seven to eight components on average. Later in this section the prediction enhancing performance of sample and Shannon entropy based noise selection is described. Using the level ten wavelet decomposition and PACF for noise selection gives the same result as in case of EMD: none of the components are selected as noise because all of them are autocorrelated based on the PACF.

⁷ For determining the parameters I used the study of Richman and Moorman (2000); however, the parameter selection involved several trials and errors.

Consequently, in this study the PACF could not be used as a noise selection tool. Wavelet decomposition also led to the same conclusion in case of the three entropies. The more detail components we drop, the lower the value of permutation entropy becomes. In general, reconstructing only the approximation coefficient is suggested based on permutation entropy. Figure 18 shows how permutation entropy decreases as more and more detail components are dropped. The top chart on the left side shows a signal that is denoised from D1, the chart under it shows the signal denoised from D1 and D2, and the last chart is the reconstructed approximation component. Their permutation entropy values are presented on the right side of the figure. Figure 18 was created using the entire sample, thereby it depicts one decomposition; however, the pattern on the figure is similar for the majority of the decompositions. Sample and Shannon entropy led to a similar result as in case of empirical mode decomposition.

Figure 17 shows the dropped detail components based on Shannon and sample entropy. A level ten wavelet decomposition was applied, therefore whenever an entropy suggests that ten detail components should be dropped, this is equivalent to reconstructing only the approximation coefficients. As in the case of EMD, Shannon entropy based noise selection has the most promising result: it suggests keeping two to three detail components.

17. Figure: Number of dropped detail components based on Shannon and sample entropy using expanding window, Source: Own figure

18. Figure: Denoised signals and their permutation entropy using wavelet decomposition, Source: Own figure

Due to the fact that sample and Shannon entropy suggest dropping several components both in case of EMD and wavelet decomposition, an expert judgement approach is also applied for noise selection.

This strategy involves dropping the first, the first two, the first three and the first four components. These components are selected arbitrarily; nevertheless, these are the 'high frequency' components and it stands to reason that dropping them is beneficial. In the following part of this section the prediction performances of the above methods are summarized, using the prediction strategy described in section 8.1. Table 4 shows the RMSE values for predicting the original log returns, the prediction performance of the expert judgement approach and the results of the entropy based denoising. It also shows the Diebold-Mariano test statistics. Table 5 shows the values of the directional statistic.

|         | Original | Expert judgement |          |          |          | Entropy |        |
| Method  | -        | Drop 1           | Drop 1-2 | Drop 1-3 | Drop 1-4 | Shannon | Sample |
| EMD     | 2.115    | 2.812            | 3.445    | 3.550    | 3.402    | 3.191   | 3.040  |
| Wavelet |          | 2.589            | 2.625    | 2.464    | 2.370    | 2.30    | 2.412  |
| DM test |          | 5.84             | 11.29    | 14.54    | 14.96    | 14.28   | 11.23  |

4. Table: Prediction performance of the denoising methods based on RMSE (multiplied by 100), Source: Own table

|         | Original | Expert judgement |          |          |          | Entropy |        |
| Method  | -        | Drop 1           | Drop 1-2 | Drop 1-3 | Drop 1-4 | Shannon | Sample |
| EMD     | 75.42%   | 64.18%           | 45.65%   | 30.86%   | 29.41%   | 37.62%  | 36.92% |
| Wavelet |          | 67.88%           | 65.92%   | 70%      | 71.22%   | 74.11%  | 62.33% |

5. Table: Prediction performance of the denoising methods based on $D_{stat}$, Source: Own table

Based on tables 4-5, dropping IMF1 (i.e. the highest frequency component) is the best prediction strategy for empirical mode decomposition, while selecting noise with Shannon entropy gives the most accurate level prediction in case of wavelet decomposition. The directional statistic leads to the same conclusion. The Diebold-Mariano test analyses the equivalence of two forecasts based on squared prediction errors. Every EMD prediction strategy is compared to its wavelet counterpart. Based on the DM statistics, the equivalence of the forecasts can be rejected.
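A minimal sketch of a one-step-ahead DM comparison on squared errors; e1 and e2 are assumed to be the two models' forecast-error arrays, and the simple lag-0 variance below is a common simplification for one-step-ahead forecasts:

import numpy as np
from scipy import stats

def dm_test(e1, e2):
    d = e1 ** 2 - e2 ** 2                 # loss differential (squared errors)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p = 2 * (1 - stats.norm.cdf(abs(dm)))  # asymptotically standard normal
    return dm, p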

Based on tables 4-5, wavelet based decomposition led to more accurate predictions relative to EMD. However, forecasting the original signal is the most accurate prediction based on both the RMSE and the directional statistic. If we compare the prediction accuracy of the original signal to the best performing EMD and wavelet strategies, the DM statistic rejects their equivalence. Figure 19 shows the evolution of the cumulative RSE throughout the out-of-sample period in case of the best performing EMD and wavelet models.

19. Figure: Cumulative RSE of the two best performing models throughout the out-of-sample period, Source: Own figure

9. Robustness check

In the last section of this study, I perform robustness checks. I test the validity of my results by recalculating the models with slightly different settings; this way I can check how sensitive my results are.

The general research framework (figure 2) provides a tool for the robustness check by selecting different settings in the Data column. This study uses weekly log returns for the same period and applies a rolling window for decomposition and prediction. Each window contains 250 observations and the number of decompositions was 751. The threshold selection for empirical mode decomposition was highly sensitive. This study used $\delta = 0.2$ as the threshold for terminating the sifting process; however, the threshold had to be changed several times between the values 0.2 and 0.4 in order to ensure convergence. Seemingly there is no connection between the threshold changes and volatile periods; the root cause of the parameter changes is unknown. I had the same problem in case of the expanding window with daily data and the rolling window with weekly data. I also had difficulties with the spline fit on the local minima and maxima time series: in case of using rolling windows (500, 1000, 1500, 1600, 2000 and 2500 observations) and daily log returns, the spline fit could not be applied, which is an important part of the sifting process.

The number of IMFs gradually increased in case of the expanding window (figure 7), while it remained stable for weekly data and the rolling window. This can be explained by a change in the data generating process: in case we use an expanding window, all the past information is used, even observations from before a potential regime shift. A rolling window, which incorporates 250 observations every time, has less chance of using information from multiple regimes. Another explanation for the stable IMF number stems from the fact that weekly data are smoother than daily data; therefore, high frequency components are removed as we change from daily to weekly data.

20. Figure: Histograms of the number of IMFs using rolling window, Source: Own figure

The result of the PACF noise selection remained the same both in case of EMD and wavelet decomposition.

The components are autocorrelated based on the PACF, consequently they cannot be considered white noise. Noise selection with permutation entropy also remained the same both in case of EMD and wavelet decomposition.

21. Figure: Permutation entropy based noise selection in case of EMD, Source: Own figure

Figure 21 shows the histogram of the number of IMFs on the left, using the rolling window, weekly data and the results of all 751 decompositions. The histogram on the right shows the number of IMFs that permutation entropy suggests dropping. The figure shows that almost all of the IMFs should be dropped based on permutation entropy. Wavelet decomposition led to the same conclusion: the more detail components we drop, the lower the value of permutation entropy becomes. In general, the reconstruction of the approximation coefficients is suggested based on permutation entropy.

Sample entropy drops all the detail components, while Shannon entropy gives similar results as in case of the expanding window and daily data.

22. Figure: Number of dropped detail components based on Shannon and sample entropy using weekly data and rolling window, Source: Own figure

Table 6 shows the RMSE values for predicting the original log returns, the prediction performance of the expert judgement approach and the results of the entropy based denoising. The best performing models changed: sample entropy based denoising is the most accurate in case of EMD, while the strategy of dropping IMF1 and IMF2 is the most accurate in case of wavelet based decomposition. Using the rolling window and weekly data does not change the fact that the wavelet based denoising strategies are more accurate than the empirical mode decomposition based strategies. All the wavelet strategies are better than predicting the original log returns.

|         | Original | Expert judgement |          |          |          | Entropy |        |
| Method  | -        | Drop 1           | Drop 1-2 | Drop 1-3 | Drop 1-4 | Shannon | Sample |
| EMD     | 4.569    | 7.468            | 7.049    | 6.986    | 6.133    | 5.365   | 5.217  |
| Wavelet |          | 4.056            | 4.048    | 4.045    | 4.054    | 4.052   | 4.102  |
| DM test |          | 22.23            | 19.38    | 18.73    | 14.72    | 12.68   | 12.74  |

6. Table: Prediction performance of the denoising methods based on RMSE (multiplied by 100) using rolling window and weekly data, Source: Own table

|         | Original | Expert judgement |          |          |          | Entropy |        |
| Method  | -        | Drop 1           | Drop 1-2 | Drop 1-3 | Drop 1-4 | Shannon | Sample |
| EMD     | 69.6%    | 58.0%            | 55.2%    | 57.07%   | 57.87%   | 57.5%   | 60.0%  |
| Wavelet |          | 63.47%           | 59.73%   | 64.93%   | 67.2%    | 68.4%   | 64.1%  |

7. Table: Prediction performance of the denoising methods based on $D_{stat}$ using rolling window and weekly data, Source: Own table

10. Conclusion

Given the available literature, the paper's contribution is threefold: (1) the paper introduced a general research framework which describes the possible research designs in decomposition based financial time series forecasting,

(2) the paper provided a thorough literature review based on the most important articles and classified them with the help of the general research framework, and (3) the paper compared PACF, entropy and expert judgement based noise selection methods in terms of their contribution to prediction accuracy. The framework introduced in this paper has several advantages: it helps with the formulation of the research design; it helps researchers specify all the necessary details and parameters of their research design, thereby facilitating the paper's reproduction; and the framework is a significant step forward in comparing and classifying research papers, since it provides the necessary groupings for classification. The proposed framework facilitates the aggregation of the results of the current literature, consequently it helps us better understand the efficiency of signal processing techniques in financial time series analysis.

Moreover, it paves the way for a meta-study in which the current results can be combined, and it helps determine the reliability of the results presented in a research paper. The framework fills a gap in the current literature, which opens up opportunities for further research. This paper provided a thorough literature review on financial time series forecasting. All the reviewed papers apply one of the decomposition methods from the wavelet or empirical mode decomposition family and analyze oil price or foreign exchange data. The main purpose of the literature review was to describe the trends in financial time series forecasting, focusing particularly on characteristics such as the decomposition method, data, noise selection, reconstruction method, prediction models and the results of the analyses. Finally, the paper compared PACF, entropy and expert judgement based noise selection methods. The noise selection was based on the decomposition results of empirical mode decomposition and wavelet decomposition.

Dropping the highest frequency component was the best strategy in case of EMD, while Shannon entropy based noise selection resulted in the most accurate prediction in case of wavelet decomposition. However, none of the strategies produced better performance than predicting the original time series. This paper also performed a robustness check in which the decomposition strategies were recalculated with slightly different settings. Using weekly data and a rolling window, the wavelet based denoising strategies produced more accurate forecasts than predicting the original signal. This result emphasizes the sensitivity of the denoising methods to the input data and the parameter settings. The analysis gives reason for concern. First of all, the threshold selection for empirical mode decomposition was highly sensitive. The convergence of the EMD algorithm is not stable; it frequently stopped because it failed to fit a spline. Moreover, the number of IMFs gradually increased using an expanding window, which makes the interpretation of the components more difficult.

Using EMD for decomposition is time consuming: the daily decomposition with an expanding window took about 4-5 hours. The decomposition based forecasting strategies have promising results, as was shown in the literature review; however, the decomposition strategies presented in this paper failed to beat the strategy where decomposition was not involved in case of the expanding window. The situation was different when the rolling window was applied. Finding a proper noise selection method can enhance prediction performance, since it can help models train on the fundamental part of a signal and capture the most important factors.

References

A. Boukhayma, A. Peizerat, C. Enz, 2016, Noise Reduction Techniques and Scaling Effects towards Photon Counting CMOS Image Sensors, Sensors, 2016 Apr 09.
A. Chen, M.T. Leung, D. Hazem, 2003, Application of neural networks to an emerging financial market: Forecasting and trading the Taiwan Stock Index, Computers & Operations Research, Vol. 30, pp. 901-923.
A. Mirzaei, A. Ayatollahi, P. Gifani, L. Salehi, 2010, Spectral Entropy for Epileptic Seizures Detection, 2010 Second International Conference on Computational Intelligence.
A. Lanza, M. Manera, M. Giovannini, 2005, Modeling and forecasting cointegrated relationships among heavy oil and product prices, Energy Economics, Vol. 27, pp. 831-848.
A. Rua, 2012, Wavelets in economics, Banco de Portugal Economic Bulletin, Vol. 18, No. 2, pp. 71-79.
A.C. Smith, P. Monaghan, F. Huettig, 2017, The multimodal nature of spoken word processing in the visual world: Testing the predictions of alternative models of multimodal integration, Journal of Memory and Language, Vol. 93, 2017 April, pp. 276-303.
A.S. Berres, T.L. Turton, M. Petersen, D.H. Rogers, J.P. Ahrens, 2017, Video Compression for Ocean Simulation Image Databases, Workshop on Visualisation in Environmental Sciences.
BP, 2018, Statistical Review of World Energy, 2018 June, 67th edition. [Link: https://www.bp.com/content/dam/bp/businesssites/en/global/corporate/pdfs/energy-economics/statistical-review/bp-stats-review2018-full-report.pdf]
B. Zhu, X. Shi, J. Chevallier, P. Wang, Y-M. Wei, 2016, An Adaptive Multiscale Ensemble Learning Paradigm for Nonstationary and Nonlinear Energy Price Time Series Forecasting, Journal of Forecasting.
C. Bandt, B. Pompe, 2002, Permutation Entropy: A Natural Complexity Measure for Time Series, Physical Review Letters, Vol. 88, No. 17.
C-S. Lin, S-H. Chiu, T-Y. Lin, 2012, Empirical mode decomposition-based least squares support vector regression for foreign exchange rate forecasting, Economic Modelling, Vol. 29, pp. 2583-2590.
C-Y. Tseng, H.C. Lee, 2010, Entropic interpretation of empirical mode decomposition and its applications in signal decomposition, Advances in Adaptive Data Analysis, Vol. 2, No. 4, pp. 429-449.
FIA, 2018, Total 2017 volume 25.2 billion contracts, down 0.1% from 2016, 2018 Jan 24. [Link: https://fia.org/articles/total-2017-volume-252-billion-contracts-down-01-2016]
J.L. Zhang, Y.J. Zhang, L. Zhang, 2015, A novel hybrid model for crude oil price forecasting, Energy Economics, Vol. 49, 2015 May, pp. 649-659.
N. Krichene, 2007, Recent Dynamics of Crude Oil Prices, International Monetary Fund, Working Paper, December 2006.
N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, 1998, The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis, Proceedings of the Royal Society A: Mathematical, Physical & Engineering Sciences, Vol. 454, pp. 903-995.
L. Juvenal, I. Petrella, 2014, Speculation in the oil market, Journal of Applied Econometrics, Vol. 30, 2015 June/July, pp. 621-649.
L. Yu, Z. Wang, L. Tang, 2015, A decomposition-ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting, Journal of Applied Energy, Vol. 156, pp. 251-267.
L. Yu, S. Wang, K.K. Lai, 2008, Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm, Energy Economics, Vol. 30, pp. 2623-2635.
L. Yu, W. Dai, L. Tang, 2016, A novel decomposition ensemble model with extended extreme learning machine for crude oil price forecasting, Engineering Applications of Artificial Intelligence, Article in Press.
M. Khashei, M. Bijari, 2010, An artificial neural network (p, d, q) model for time-series forecasting, Expert Systems with Applications, Vol. 37, pp. 479-489.
G. Uliha, 2016, Az olajár és a makrogazdaság kapcsolatának elemzése folytonos wavelet transzformáció segítségével, Statisztikai Szemle, Vol. 94, No. 5, pp. 505-534.
G.C. Watkins, A. Plourde, 1994, How volatile are crude oil prices?, OPEC Review, Vol. 18, pp. 220-245.
G.E.P. Box, G. Jenkins, 1970, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA.
G.P. Zhang, B.E. Patuwo, M.Y. Hu, 2001, A simulation study of artificial neural networks for nonlinear time-series forecasting, Computers & Operations Research, Vol. 28, pp. 381-396.
H. Su, Q. Liu, J. Li, 2012, Boundary Effects Reduction in Wavelet Transform for Time-frequency Analysis, WSEAS Transactions on Signal Processing, Vol. 8, Issue 4, pp. 169-179.
H-Y. Zhang, Q. Ji, Y. Fan, 2015, What drives the formation of global oil trade patterns, Energy Economics, Vol. 49, 2015 March.
I. Daubechies, 1992, Ten Lectures on Wavelets, Regional Conference Series in Applied Mathematics (SIAM), Vol. 61, Society for Industrial and Applied Mathematics, Philadelphia, USA.
J.S. Richman, J.R. Moorman, 2000, Physiological time-series analysis using approximate entropy and sample entropy, American Journal of Physiology, Vol. 278, pp. 2039-2049.
K-J. Kim, 2003, Financial time series forecasting using support vector machines, Neurocomputing, Vol. 55, Issues 1-2, pp. 307-319.
M. Riedl, A. Müller, N. Wessel, 2013, Practical considerations of permutation entropy, The European Physical Journal Special Topics, Vol. 222, June 2013, pp. 249-262.
N. Nomikos, K. Andriosopoulos, 2012, Modelling energy spot prices: empirical evidence from NYMEX, Energy Economics, Vol. 34, pp. 1153-1169.
Q. Guan, H. An, X. Gao, S. Huang, H. Li, 2016, Estimating potential trade links in the international crude oil trade: A link prediction approach, Energy, Vol. 102, pp. 406-415.
R. Jammazi, C. Aloui, 2012, Crude oil price forecasting: Experimental evidence from wavelet decomposition and neural network modeling, Energy Economics, Vol. 34, pp. 828-841.
R.D.F. Harris, F. Yilmaz, 2009, A momentum trading strategy based on the low frequency component of the exchange rate, Journal of Banking and Finance, Vol. 33, pp. 1575-1585.
S. Bekiros, M. Marcellino, 2013, The multiscale causal dynamics of foreign exchange markets, Journal of International Money and Finance, Vol. 33, pp. 282-305.
S. Lahmiri, 2016, A variational mode decomposition approach for analysis and forecasting of economic and financial time series, Expert Systems with Applications, Vol. 55, pp. 268-273.
S. Mirmirani, H.C. Li, 2005, A comparison of VAR and neural networks with genetic algorithm in forecasting price of oil, Advances in Econometrics, Vol. 19, pp. 203-223.
S. Yousefi, I. Weinreich, D. Reinarz, 2005, Wavelet based prediction of oil prices, Chaos, Solitons and Fractals, pp. 265-275.
L. Tang, L. Yu, K.J. He, 2014, A novel data-characteristic-driven modeling methodology for nuclear energy consumption forecasting, Applied Energy, Vol. 128, pp. 1-14.
T. Xiong, Y. Bao, Z. Hu, 2013, Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices, Energy Economics, Vol. 40, pp. 405-415.
T. Xiong, Y. Bao, Z. Hu, 2014, Does restraining end effect matter in EMD-based modeling framework for time series prediction? Some experimental evidences, Neurocomputing, Vol. 123, pp. 174-184.
W. Shu-ping, H. Ai-mei, W. Zhen-xin, L. Ya-qing, B. Xiao-wei, 2014, Multiscale Combined Model Based on Run-Length-Judgment Method and Its Application in Oil Price Forecasting, Hindawi Publishing Corporation.
Y. Baimbetov, I. Khalil, M. Steinbauer, G. Anderst-Kotsis, 2015, Using Big Data for Emotionally Intelligent Mobile Services through Multi-Modal Emotion Recognition, In: Geissbühler A., Demongeot J., Mokhtari M., Abdulrazak B., Aloulou H. (eds), Inclusive Smart Cities and e-Health, ICOST 2015, Lecture Notes in Computer Science, Vol. 9102, Springer, Cham.
Y. Deng, W. Wang, C. Qian, Z. Wang, D. Dai, 2001, Boundary-processing-technique in EMD method and Hilbert transform, Chinese Science Bulletin, Vol. 46, pp. 954-960.
Y. Xiang, H.X. Zhuang, 2013, Application of ARIMA model in short-term prediction of international crude oil price, Advances in Material Research, Vol. 798, pp. 979-982.
Z. Guo, W. Zhao, H. Lu, J. Wang, 2012, Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model, Renewable Energy, Vol. 37, pp. 241-249.
Z. Wu, N.E. Huang, 2009, Ensemble empirical mode decomposition: a noise assisted data analysis method, Advances in Adaptive Data Analysis, Vol. 1, pp. 1-41.