This study is based on the original M3-Competition, which was designed to examine the forecasting capabilities of several forecasting organisations. The project uses the M3 data to replicate the results obtained by the original researchers and confirms the calculations of their study in terms of a SMAPE (Symmetric Mean Absolute Percentage Error) analysis. The data were also analysed using an alternative error analysis methodology, ROC (Rate of Change), and conclusions are drawn from a comparative analysis of the results. The study shows that the findings of the original M3 study differ from those obtained using the ROC methodology, although there is general agreement on the relative merits of simple and complex forecasting methods. For example, the ROC methodology showed ‘Theta’ to be one of the top performing methods, in agreement with the SMAPE analysis, which ranked it as the best overall method. Since ‘Theta’ is regarded as a simple forecasting approach, this tends to confirm the conclusions drawn from the original study. The study also shows that the overall rankings of the 24 methods used in the original study differ between the two methods of comparison, and that the published results of the original study differ from those replicated here.
I hereby declare: that except where reference has clearly been made to work by others, all the work presented in this report is my own work; that it has not previously been submitted for assessment; and that I have not knowingly allowed any of it to be copied by another student. I understand that deceiving or attempting to deceive examiners by passing off the work of another as my own is plagiarism. I also understand that plagiarising the work of another, or knowingly allowing another student to plagiarise from my work, is against the University regulations and that doing so will result in loss of marks and possible disciplinary proceedings against me.
Signed ........................ Date ........................
Table of figures
Table 1 Number of negative data in each forecasting method
Table 2 SMAPE across the 18 forecasting horizons
Table 3 SMAPE between 18 forecasting horizons boundaries
Table 4 Comparative ranking of SMAPE between published results in the M3-Competition and those calculated in this study
Table 5 ROC Error on Single across the 18 forecasting horizon
Table 6 Ranking of ROC Results
Table 7 ROC Result per observation
Table 8 Comparative ranking between SMAPE and ROC
Table A1 Comparative ranking between ROC and SMAPE
Table A2 ROC Result
Table A3 ROC of Single across 18 forecasting horizons
Table A4 ROC of Winter across 18 forecasting horizons
Graph 1 Comparative ranking of SMAPE between published results in the M3-Competition and those calculated in this study
Graph 2 Matching difference between published results in the M3-Competition and those calculated in this study
Graph 3 Z/O - Z/A chart, by John (2004)
Graph 4 Comparative ranking between SMAPE and ROC
Graph A1 Matching difference of Rank between ROC and SMAPE
Graph A2 Single ROC on the 11th forecasting horizon
Graph A3 Winter ROC on the 9th forecasting horizon
1.0 Introduction
2.0 Study of Forecasting Competition
2.1 Previous Study
2.2 M-Competition
2.3 M2-Competition
2.4 M3-Competition
3.0 Source data
3.1 Data format
3.2 Actual data
3.3 Forecasted data
3.4 Data source error
4.0 SMAPE Concept and Calculation
4.1 Definition
4.2 Calculation
4.3 Results
4.4 Matching M3-Competition data
5.0 ROC (Rate of change) Concept and Calculation
5.1 Definition
5.2 Results
5.3 Ranking ROC Result
6.0 Comparative analysis between SMAPE and ROC
7.0 Discussion and Conclusion
Reference
Appendix
Comparative difference between SMAPE and ROC Result
ROC Result
Single Result
Winter Result
Summary of the result from the 24 forecasting methods
All the files included are listed in File list.txt, which also gives more information about the files and explains their format.
1.0 Introduction
Prediction has become very important in many organisations, since decision-making relies largely on predictions of future events. Because of the importance of these forecasts, many forecasting methods have been developed and applied, and error measures have been applied to those methods to assess their performance. In this study, the M3-Competition is re-analysed and also investigated with the ROC (Rate of Change) methodology. The M3-Competition was published in 2000 by researchers at INSEAD in Fontainebleau, France.
The project explores the conclusions and subsequent commentaries of the original M3-Competition, then undertakes an analysis of the original data sets based on the Rate of Change methodology and compares the results.
The study involves the following tasks:
1. Study the work of all the M-Competitions and related previous work.
2. Replicate the SMAPE (Symmetric Mean Absolute Percentage Error) results introduced in the M3-Competition.
3. Apply the ROC (Rate of Change) methodology to the M3 data and produce the resulting errors.
4. Compare the differences in measured error between SMAPE and ROC.
5. Draw conclusions on the merits of each error measure.
2.0 Study of Forecasting Competition
2.1 Previous Studies
Early studies on forecasting accuracy, in the context of this report, began in 1969. At that time the studies were based on only a limited number of methods. In 1979, Makridakis and Hibon expanded the range and scope of such studies. Their study compared 111 time series drawn from real-life situations such as business, industry and macro data. Theil’s U-Coefficient and MAPE (Mean Absolute Percentage Error) were used as the measures of accuracy. The major conclusion from these studies was that simple methods such as the ‘smoothing method’ outperformed the more sophisticated ones, as reported in the M3-Competition by Makridakis et al. (2000, p. 452). However, these conclusions conflicted with the accepted views at that time.
2.2 M-Competition
Despite the criticism, Makridakis continued his argument by introducing the M-Competition (1982). This time the number of series was increased to 1001 and the number of methods to 15. In addition, different variations of the same method were also tested. Minor changes were made to the general structure of the competition, such as the types of series, which became macro, micro, industry and demographic. The forecasting horizons were set at 18 for monthly, 8 for quarterly and 6 for yearly series. Additional error measures were also introduced: Mean Square Error, Average Ranking and Median Absolute Percentage Error. From the results, the four conclusions drawn by Makridakis et al. (1982, 2000, p. 452) were:
1. It was not true that statistically sophisticated or more complex methods outperformed simpler methods.
2. The relative ranking of the various methods varied according to the accuracy measure used.
3. A combination of individual methods was more accurate than the individual methods being combined, and the combined methods on the whole did very well in comparison with other methods.
4. The accuracy of each method depended on the length of the forecasting horizon involved.
At the conclusion of the study, the results were made available to other researchers for the purposes of verification and replication. This showed that:
1. The calculations contained in the study were verified and found to be correct.
2. The results were also confirmed when other researchers, using the same data sets, employed different methods of measuring the results.
3. Other researchers, using different data series, also reinforced the validity of such studies in their own results.
Throughout this period it was still too soon to state that statistically sophisticated methods did not do better than simple methods when there was considerable randomness in the data.
It was also shown that simple and sophisticated methods could be equally effective when applied to series which exhibited seasonal patterns.
2.3 M2-Competition
In 1993, a further attempt was made to measure and develop the accuracy of various forecasting methods in the M2-Competition (1993). This was constructed on a real-time basis with a further five forecasting organisations (the data was provided by four companies and included six economic series). In this more recent study other forecasting methods, such as Naïve 2, single smoothing and Dampen, were included. The accuracy measure employed was MAPE (Mean Absolute Percentage Error). The four companies provided the experts with actual data on past and present situations, together with information on the nature of the business and prevailing business conditions. The participating experts then had to provide forecasts for the next 15 months. After a year, the forecasts were checked against the actual data from the companies. The conclusions from this study were identical to those drawn from the M-Competition, in that the more sophisticated methods did not produce more accurate forecasts than the simpler ones. The study thus confirmed the conclusions drawn from the previous studies.
2.4 M3-Competition
The M3-Competition (2000) involved more methods, more researchers and more time series. The number of time series was extended to 3003. To reduce the demands on data storage, it was decided that a minimum number of observations would be used for each type of data: 14 observations for yearly series, 16 for quarterly series, 48 for monthly series and 60 for other series. Given the source data, the participating experts were asked to produce the following numbers of forecasts: 6 for yearly, 8 for quarterly, 18 for monthly and 8 for other series. The time series supplied did not contain any negative values, so it was expected that the submitted forecasts would not contain negative values either. It was nevertheless decided that any negative values received in the forecast data would be set to zero. In addition, seven further methods were included, submitted by participants who used neural networks, expert systems and decomposition to produce their forecasts. Five accuracy measures were used to analyse the data: Symmetric MAPE, Average Ranking, Median Symmetric APE, Percentage Better and Median RAE. The conclusions from the analysis of the M3-Competition were identical to those of the previous M-Competitions. However, it was recognised that ‘Theta’, a new method used in the M3-Competition, had outperformed all other methods and performed consistently well across both forecasting horizons and accuracy measures, as noted by Makridakis et al. (2000, p. 459).
3.0 Source Data
3.1 Data format
The original data came from the M3-Competition and was provided by INSEAD. The data was in two parts, the actual data and the forecast data. The actual data was taken from the website of the International Institute of Forecasters (M3-Competition data). It was supplied as an .xls file, that is, in Excel spreadsheet format, and was divided into 5 sheets titled Competition, M3 Year, M3 Quart, M3 Month and M3 Other. The forecast data was provided by Michèle Hibon, one of the authors of ‘The M3-Competition: results, conclusions and implications’. This data was in .DAT files, which had to be converted into .xls format so that the forecast data would be compatible with the actual data.
3.2 Actual data
As mentioned previously, the actual data was provided as an .xls file, so it could be used in calculations straight away. However, the forecast data covers only the last 6 observations of each yearly series, 8 of each quarterly series, 18 of each monthly series and 8 of each other series, so only these final observations of each actual series were needed. Because the series differ in length, their final observations do not line up in the same columns. A macro was therefore used to move all the data in each row to the right-hand side of the spreadsheet, so that the last observations in each category could be copied easily.

Macro code for rearranging the actual data of the M3-Competition:

    ' Shift the non-empty values of every row in the given range to the right,
    ' keeping their order, so that the last observations of all series share
    ' the same right-most columns.
    Sub RightAlignRows(ByVal rangeAddress As String)
        Dim aa As Variant, rr As Variant
        Dim i As Long, h As Long, t As Long
        Dim lastCol As Long

        aa = Range(rangeAddress).Value
        lastCol = UBound(aa, 2)

        For i = 1 To UBound(aa, 1)
            t = lastCol                      ' next free slot, filled from the right
            For h = lastCol To 1 Step -1
                rr = aa(i, h)
                aa(i, h) = Empty
                ' IsEmpty is used instead of "rr <> Empty", which would also
                ' skip cells holding the value 0
                If Not IsEmpty(rr) Then
                    aa(i, t) = rr
                    t = t - 1
                End If
            Next h
        Next i

        Range(rangeAddress).Value = aa
    End Sub

    ' Yearly data
    Sub RightAlignYearly()
        RightAlignRows "A1:BA646"
    End Sub

    ' Quarterly data
    Sub RightAlignQuarterly()
        RightAlignRows "A1:BZ757"
    End Sub

    ' Monthly data
    Sub RightAlignMonthly()
        RightAlignRows "A1:ET1429"
    End Sub

    ' Other data
    Sub RightAlignOther()
        RightAlignRows "A1:DF175"
    End Sub
3.3 Forecasted data
The forecast data consisted of the forecasts from 24 methods, each provided as a .DAT file. Each file was converted to .xls format by opening it in Excel, with column breaks added so that each value fell into its own cell. However, the imported data still had values wrapped over several rows and did not match the layout of the actual data, so macros were used to rearrange it into a working format. First, a macro was recorded on one example series to move the wrapped values onto a single row, and a loop was then added to repeat this for every series. The rows that had held the wrapped values were left empty, so a second macro was written to delete them. For the AAM1 and AAM2 data the row limits in the macros had to be changed, as only 2184 observations were provided. Finally, to make the data compatible with the actual data, the heading of each observation was removed by a further macro.

Macro code:

    ' Rearrange the wrapped values. Within each block of four rows starting at
    ' row i, the values in A:H of the next two rows are moved onto row i,
    ' into columns I onwards and Q onwards.
    Sub MergeWrappedRows(ByVal firstRow As Long, ByVal lastRow As Long)
        Dim i As Long
        For i = firstRow To lastRow Step 4
            Range("A" & i + 1 & ":H" & i + 1).Cut Destination:=Range("I" & i)
            Range("A" & i + 2 & ":H" & i + 2).Cut Destination:=Range("Q" & i)
        Next i
    End Sub

    ' Remove the two emptied rows that follow each merged row. Each deletion
    ' shifts the rows below up by two, so the next pair is two rows further on.
    ' (VBA fixes the loop limit on entry, so the "j = j - 4" of the recorded
    ' macro had no effect and is omitted.)
    Sub DeleteEmptiedRows(ByVal firstRow As Long, ByVal lastRow As Long)
        Dim i As Long
        For i = firstRow To lastRow Step 2
            Rows(i & ":" & i + 1).Delete Shift:=xlUp
        Next i
    End Sub

    ' Remove all the headings. Rows shift up after every deletion, so this
    ' deletes every other row, i.e. the heading above each of the 3003 series.
    Sub DeleteHeadingRows()
        Dim i As Long
        For i = 1 To 3003
            Rows(i).Delete Shift:=xlUp
        Next i
    End Sub

    ' Files with 3003 observations: run the three steps in order
    Sub PrepareForecastSheet()
        MergeWrappedRows 2804, 8514
        DeleteEmptiedRows 2805, 5659
        DeleteHeadingRows
    End Sub

    ' AAM1 and AAM2 (2184 observations): same steps with different row limits
    Sub PrepareAamSheet()
        MergeWrappedRows 1514, 7224
        DeleteEmptiedRows 1515, 7224
    End Sub
3.4 Data source error
When dealing with a large data set, errors can occur as the data is transferred between the original and the tested files. While preparing the forecast data, five forecasting methods were found to contain negative forecast values: Robust-Trend, Automat ANN, Theta, ARARMA and SmartFcs. Some of these had more negative values than others. Robust-Trend had the most, with 151 negative values, followed by Automat ANN and the others in the order listed in Table 1; SmartFcs had the fewest, with a single negative value. The negative values in these five methods were therefore replaced by their absolute values (the sign was made positive), because the results obtained in this way were closer to those published in the M3-Competition (2000).

Method         Number of negative values
Robust-Trend   151
Automat ANN    47
Theta          19
ARARMA         4
SmartFcs       1
Table 1 Number of negative data in each forecasting method
4.0 SMAPE (Symmetric Mean Absolute Percentage Error) Concept and Calculation
4.1 Definition
The Symmetric Mean Absolute Percentage Error (SMAPE), or Adjusted Mean Absolute Percentage Error (Armstrong, 1985), can be defined as:

$$\mathrm{SMAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{|X_t - F_t|}{(X_t + F_t)/2} \qquad (1.1)$$

Note: X is the actual value and F the forecast value; the terms are summed and the total is divided by the number of observations, N.

Despite its similarity to MAPE (Mean Absolute Percentage Error), SMAPE has an advantage: it removes MAPE's bias in favour of low estimates, which arises because MAPE has no limit on the high side, as discussed in Armstrong's Long-Range Forecasting (1985). SMAPE ranges from 0% for a perfect forecast to 200% for an infinitely bad forecast. As a result, SMAPE is less sensitive than MAPE to measurement errors in the actual data (Armstrong, 1985, p. 348). However, SMAPE is not totally symmetric, as over-forecasts and under-forecasts are not treated equally.
4.2 Calculation
For each forecasting method, SMAPE was calculated separately for each forecasting horizon. First, the Absolute Percentage Error (APE) was calculated for each of the 3003 observations (2184 observations for AAM1 and AAM2) as follows:
1. Calculate the error, defined as Error = Actual - Forecast.
2. Take the absolute error, |Error|.
3. Calculate the sum of Actual and Forecast.
4. Divide the sum of Actual and Forecast by 2.
5. Calculate the APE by dividing the value from step 2 by the value from step 4.
To produce the SMAPE, the APE values of the observations were summed and divided by the number of observations considered. The SMAPE was then calculated over cumulative ranges of forecasting horizons: 1 to 4, 1 to 6, 1 to 8, 1 to 12, 1 to 15 and 1 to 18.
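The five steps above can be sketched in Python as follows. This is a minimal illustration rather than the spreadsheet procedure used in the study: the function names `ape` and `smape` are hypothetical, the input values are made up, and the sketch assumes that actual plus forecast is never zero, so the denominator is always defined.

```python
def ape(actual, forecast):
    """Absolute percentage error of a single forecast, as a fraction."""
    error = actual - forecast        # step 1
    abs_error = abs(error)           # step 2
    total = actual + forecast        # step 3
    midpoint = total / 2             # step 4
    return abs_error / midpoint      # step 5


def smape(actuals, forecasts):
    """Mean of the APE values over paired actual and forecast values."""
    apes = [ape(a, f) for a, f in zip(actuals, forecasts)]
    return sum(apes) / len(apes)


# Made-up values for illustration only (not M3 data).
print(smape([100.0, 50.0], [80.0, 50.0]))
```

A forecast of zero against a positive actual value gives an APE of 2, which corresponds to the 200% upper limit of the measure.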
4.3 Results
Using the SMAPE, the forecasting methods were ranked from best (least error) to worst (highest error). The ranking was based on the result for forecasting horizons 1 to 18, since this figure combines the errors from all 18 forecasting horizons. An example is shown in Tables 2 and 3.

Example result from the Theta method
Forecasting horizon   SMAPE      N
1                     0.084017   3003
2                     0.095669   3003
3                     0.113103   3003
4                     0.125112   3003
5                     0.131298   3003
6                     0.139994   3003
7                     0.122699   2358
8                     0.119834   2358
9                     0.131595   1428
10                    0.133898   1428
11                    0.1347     1428
12                    0.132214   1428
13                    0.154032   1428
14                    0.151862   1428
15                    0.162854   1428
16                    0.177043   1428
17                    0.168029   1428
18                    0.182731   1428
Table 2 SMAPE across the 18 forecasting horizons

Forecasting horizon      1 to 4     1 to 6     1 to 8     1 to 12    1 to 15    1 to 18
Total percentage error   1254.958   2069.647   2641.54    3401.817   4071.19    4824.892
N                        12012      18018      22734      28446      32730      37014
SMAPE                    0.104475   0.114866   0.116193   0.119589   0.124387   0.130353
Table 3 SMAPE between 18 forecasting horizons boundaries
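The relationship between Table 2 and Table 3 can be checked with a short Python sketch. It assumes that each cumulative SMAPE is the total of the per-observation errors (per-horizon SMAPE multiplied by N) divided by the total number of forecasts, which is what the figures in Table 3 suggest; the function name `pooled_smape` is hypothetical.

```python
# Per-horizon SMAPE and number of series for the example method (Table 2).
smape_by_horizon = [
    0.084017, 0.095669, 0.113103, 0.125112, 0.131298, 0.139994,
    0.122699, 0.119834, 0.131595, 0.133898, 0.1347, 0.132214,
    0.154032, 0.151862, 0.162854, 0.177043, 0.168029, 0.182731,
]
n_by_horizon = [3003] * 6 + [2358] * 2 + [1428] * 10


def pooled_smape(last_horizon):
    """Return (total error, N, SMAPE) over horizons 1..last_horizon."""
    pairs = list(zip(smape_by_horizon, n_by_horizon))[:last_horizon]
    total_error = sum(s * n for s, n in pairs)
    count = sum(n for _, n in pairs)
    return total_error, count, total_error / count


for last in (4, 6, 8, 12, 15, 18):
    total_error, count, value = pooled_smape(last)
    print(f"1 to {last}: total={total_error:.3f} N={count} SMAPE={value:.6f}")
```

Because the inputs are the rounded values printed in Table 2, the totals agree with Table 3 only to about three decimal places.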
4.4 Matching M3-Competition data
To replicate the results of the M3-Competition, the SMAPE described previously was used as the error measure on the M3 data. The SMAPE results obtained were then compared with the results published in the M3-Competition, as shown in Table 4.

Rank   SMAPE (this study)   SMAPE (M3)
1      Theta                Theta
2      Forecast X           Forecast Pro
3      Forecast Pro         Forecast X
4      Comb S-H-D           Comb S-H-D
5      Dampen               Dampen
6      RBF                  RBF
7      B-J automatic        Theta-sm
8      Automat ANN          B-J automatic
9      SmartFcs             PP-autocast
10     PP-autocast          Automat ANN
11     Flores-Pearce2       SmartFcs
12     Single               Flores-Pearce2
13     Theta-sm             Single
14     Autobox2             Autobox2
15     AAM1                 Holt
16     Flores-Pearce1       AAM2
17     ARARMA               Winter
18     AAM2                 Flores-Pearce1
19     Holt                 ARARMA
20     Winter               AAM1
21     Autobox1             Autobox1
22     Naïve2               Autobox3
23     Autobox3             Naïve2
24     Robust-Trend         Robust-Trend
Table 4 Comparative ranking of SMAPE between published results in the M3-Competition and those calculated in this study

The comparison shows that the ranking of the forecasting methods was not the same as expected. Some methods performed better than in the published results. For example, AAM1 moved up by 5 ranks and Naïve2 outperformed Autobox3. Other methods did not perform as well; for example, Theta-sm dropped by 6 ranks.

Graph 1 Comparative ranking of SMAPE between published results in the M3-Competition and those calculated in this study
Graph 2 Matching difference between published results in the M3-Competition and those calculated in this study

As mentioned earlier, some errors were found in the raw data, namely the negative values in 5 forecasting methods. After all the negative values in these methods were replaced with positive values, two of the five, Robust-Trend and Theta, matched their ranking in the original M3 SMAPE analysis.
The other three, Automat ANN, SmartFcs and ARARMA, each differed from the original SMAPE ranking by 2 ranks. Despite the argument above, it was clear that other forecasting methods with perfectly good data also produced mismatched results. Given the size of the data set, it is possible that errors occurred at various stages of the calculation, even though it was treated with caution. For example, rounding errors could occur when the 3003 observations were used to calculate the total SMAPE for each forecasting horizon; the more observations that are considered, the more noticeable the rounding errors are likely to be. It should also be mentioned that the forecast data was not obtained directly from the M3-Competition, as it was not published in the M3-Competition paper (2000). It is therefore fair to say that the forecast data may not have been identical to that used in the M3-Competition. However, the data was provided by Michèle Hibon, so it was obtained from a reliable source, and despite the differences both sets of results lead to the same conclusions. This was also supported by the small deviation in the plot of both results, for which the Root Mean Square value was 0.9012.
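One way to put a single number on how closely the two rankings in Table 4 agree is Spearman's rank correlation. The sketch below is an illustration only: it is not necessarily the statistic quoted in the text above, the function name `spearman` is hypothetical, and the method names are written in plain ASCII (for example `Naive2`) with the Theta method spelled `Theta`.

```python
# Rank order (best first) from Table 4.
this_study = [
    "Theta", "Forecast X", "Forecast Pro", "Comb S-H-D", "Dampen", "RBF",
    "B-J automatic", "Automat ANN", "SmartFcs", "PP-autocast",
    "Flores-Pearce2", "Single", "Theta-sm", "Autobox2", "AAM1",
    "Flores-Pearce1", "ARARMA", "AAM2", "Holt", "Winter", "Autobox1",
    "Naive2", "Autobox3", "Robust-Trend",
]
m3_published = [
    "Theta", "Forecast Pro", "Forecast X", "Comb S-H-D", "Dampen", "RBF",
    "Theta-sm", "B-J automatic", "PP-autocast", "Automat ANN", "SmartFcs",
    "Flores-Pearce2", "Single", "Autobox2", "Holt", "AAM2", "Winter",
    "Flores-Pearce1", "ARARMA", "AAM1", "Autobox1", "Autobox3", "Naive2",
    "Robust-Trend",
]


def spearman(order_a, order_b):
    """Return (sum of squared rank differences, Spearman's rho)."""
    rank_b = {name: r for r, name in enumerate(order_b, start=1)}
    d2 = sum((r - rank_b[name]) ** 2
             for r, name in enumerate(order_a, start=1))
    n = len(order_a)
    return d2, 1 - 6 * d2 / (n * (n * n - 1))


d2, rho = spearman(this_study, m3_published)
print(d2, round(rho, 4))
```

A value of rho close to 1 indicates that the two rankings are largely in agreement, with the departures concentrated in a few methods such as Theta-sm and AAM1.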
5.0 ROC (Rate of Change) Concept and Calculation
5.1 Definition
The Rate of Change method is based on the Centred Forecast-Observation diagram for change developed by Theil (1958), subsequently reported by Gilchrist (1976) and extended by John (2004, p. 1000). Theil's diagram of actual and predicted changes gives a graphical picture of turning point errors (Theil, 1958, p. 29). Actual change is shown on the horizontal axis and predicted change on the vertical axis, with a line of perfect forecasts at 45° through the origin. The diagram is divided into four quadrants, of which the second and fourth represent turning point errors. These are determined by the sign of the preceding actual change in the same variable. The other two quadrants are divided by the line of perfect forecasts into equal areas of overestimation and underestimation of changes. The Centred Forecast-Observation diagram of Gilchrist (1976, p. 223) was used to explain more about the characteristics of forecasting. This diagram is split into six sectors, as also described by John (2004, p. 1001). John (2004) uses the diagram as a chart with the forecast series on the y-axis and the actual series on the x-axis. For each actual series a pair is determined as:

$$Z/A_i = A_{i+1} - A_i \qquad (2.1)$$

For each forecast series the pair is:

$$Z/O_i = \hat{O}_{i+1} - \hat{O}_i \qquad (2.2)$$

Note: A denotes a pair of actual values and Ô a pair of forecast values.

Each individual pair of Z/A_i and Z/O_i can then be determined and plotted. As mentioned previously, the chart is divided into six sectors, numbered clockwise starting from the positive forecast axis.
The sectors are as follows:
Sector 1: Overestimate of positive change
Sector 2: Underestimate of positive change
Sector 3: Forecast decrease when an increase in the actual occurs
Sector 4: Overestimate of negative change
Sector 5: Underestimate of negative change
Sector 6: Forecast increase when a decrease in the actual occurs
Graph 3 Z/O - Z/A chart, by John (2004)
The number of errors in each sector can then be determined. However, the magnitude of the errors is not the same in every sector. The errors are therefore divided into two distinct types, ‘normal error’ and ‘quadrant error’, as recognised by John (2004). A normal error is assigned to a pair whose changes have the same direction (sign of change), and is measured as:

$$\text{Normal error} = \left(|Z/A_i| - |Z/O_i|\right)^2 \qquad (2.3)$$

When the directions (signs of change) of Z/A_i and Z/O_i differ, for example when Z/A_i is positive and Z/O_i is negative, the error is known as a quadrant error and is measured as:

$$\text{Quadrant error} = \left(3|Z/A_i| + |Z/O_i|\right)^2 \qquad (2.4)$$

This was devised by John (2004), since it can be argued that even if a forecast fails to recognise the correct magnitude of a change, it should at least recognise the direction of the change.
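The classification and the two error formulas can be sketched in Python as below. This is an illustrative reading of the definitions, not the implementation used in the study. The function name `roc_error` is hypothetical. Several points are assumptions: Sector 5 is read as an under-estimate of negative change, a pair with identical changes is counted as ‘correct’ with zero error, and a pair in which one of the changes is exactly zero is treated as a normal error.

```python
def roc_error(za, zo):
    """Classify one pair of changes and return (sector, kind, error).

    za: change in the actual series, Z/A_i
    zo: change in the forecast series, Z/O_i
    Sector 0 is used here for a forecast change equal to the actual change.
    """
    if za == zo:
        return 0, "correct", 0.0
    if za * zo < 0:
        # Opposite directions: quadrant error (sectors 3 and 6).
        sector = 3 if za > 0 else 6
        return sector, "quadrant", float((3 * abs(za) + abs(zo)) ** 2)
    # Same direction (or one change is zero, an assumption): normal error.
    over = abs(zo) > abs(za)
    if za > 0 or (za == 0 and zo > 0):
        sector = 1 if over else 2
    else:
        sector = 4 if over else 5
    return sector, "normal", float((abs(za) - abs(zo)) ** 2)


# Made-up changes for illustration only (not M3 data).
for za, zo in [(10, 12), (10, 6), (10, -2), (-5, -8), (-5, -3), (-5, 3)]:
    print(za, zo, roc_error(za, zo))
```

Summing the returned errors by kind for every pair in a forecasting horizon gives the total normal error and total quadrant error for that horizon.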
5.2 Results
For each forecasting horizon, all the Z/A_i and Z/O_i values were calculated and allocated to the correct position on the chart, and the magnitude of the errors in each sector was calculated. The total normal error, total quadrant error and total error for each forecasting horizon were then calculated, as shown for the Single method in Table 5. By adding the total errors from all the forecasting horizons, the total magnitude of error could be determined for each forecasting method.

Results from the ‘Single’ method
Forecasting horizon   Normal Error    Quadrant Error    Total Error
1                     1,669,235,393   4,762,819,858     6,432,055,251
2                     3,090,915,225   7,819,879,985     10,910,795,211
3                     3,809,937,394   10,672,690,256    14,482,627,650
4                     4,684,287,634   18,397,987,871    23,082,275,504
5                     4,736,994,108   16,554,461,103    21,291,455,210
6                     4,360,306,528   20,049,475,146    24,409,781,674
7                     2,373,819,526   10,918,011,198    13,291,830,724
8                     2,532,396,960   11,776,071,077    14,308,468,037
9                     1,792,817,169   4,172,721,813     5,965,538,981
10                    1,801,230,076   4,202,413,276     6,003,643,352
11                    1,378,997,041   3,858,393,264     5,237,390,306
12                    2,121,625,068   4,529,366,741     6,650,991,810
13                    1,386,801,507   5,628,793,890     7,015,595,396
14                    1,798,343,821   7,168,856,951     8,967,200,772
15                    2,150,461,735   8,319,292,944     10,469,754,679
16                    2,980,716,589   4,849,696,617     7,830,413,206
17                    2,754,027,689   6,003,654,290     8,757,681,979
18                    2,359,952,669   6,596,002,482     8,955,955,151
Table 5 ROC Error on Single across the 18 forecasting horizon
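As a worked check of the summation described above, the Python sketch below adds the total errors in Table 5 and divides by the 3003 observations supplied for the Single method. That the divisor is 3003 is an assumption; it is made here because the result reproduces, to the nearest whole number, the error per observation reported for Single in Table 6.

```python
# Total error for each of the 18 forecasting horizons (Table 5, Single).
total_error_by_horizon = [
    6_432_055_251, 10_910_795_211, 14_482_627_650, 23_082_275_504,
    21_291_455_210, 24_409_781_674, 13_291_830_724, 14_308_468_037,
    5_965_538_981, 6_003_643_352, 5_237_390_306, 6_650_991_810,
    7_015_595_396, 8_967_200_772, 10_469_754_679, 7_830_413_206,
    8_757_681_979, 8_955_955_151,
]

n_observations = 3003  # assumed divisor: observations provided for Single

grand_total = sum(total_error_by_horizon)
error_per_observation = grand_total / n_observations

print(grand_total)
print(round(error_per_observation))
```

For AAM1 and AAM2 the divisor would be 2184 under the same assumption, which is why the normalisation matters when the methods are ranked against one another.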
5.3 Rank ROC Result
To rank the forecasting methods, any bias caused by the different number of observations in each method was eliminated by using the error per observation. The results for the 24 forecasting methods were as follows:

Method           Error per observation   Rank
Single           67,953,198              1
Theta            72,562,195              2
Comb S-H-D       72,944,577              3
Forecast X       76,111,849              4
Flores-Pearce2   76,589,909              5
SmartFcs         79,120,086              6
Theta-sm         81,199,075              7
Dampen           83,862,347              8
Forecast Pro     83,931,783              9
AAM1             86,621,738              10
Naïve2           87,290,813              11
AAM2             88,513,888              12
B-J automatic    89,875,446              13
RBF              89,950,117              14
Automat ANN      92,188,194              15
PP-autocast      94,661,129              16
Holt             94,735,528              17
Autobox3         102,203,567             18
Flores-Pearce1   115,985,051             19
Autobox1         120,046,101             20
Robust-Trend     122,657,044             21
Autobox2         134,148,029             22
ARARMA           237,620,699             23
Winter           6,602,739,469           24
Table 6 Ranking of ROC Results

The same normalisation was applied to the other calculated errors, namely the normal errors and quadrant errors. In addition, the numbers of data points classed as over-estimates, under-estimates, quadrants and correct were normalised and the methods ranked on each. The resulting performance of the forecasting methods is shown in table 7.
Rank        Correct          Over-estimates   Under-estimates  Quadrants        Normal Errors    Quadrant Errors  Total Errors
1 (Best)    Naïve2           Single           Robust-Trend     Theta            Single           Comb S-H-D       Single
2           AAM1             Automat ANN      Autobox3         Dampen           Automat ANN      Theta            Theta
3           Single           Naïve2           Autobox1         Comb S-H-D       Naïve2           Single           Comb S-H-D
4           Forecast X       B-J automatic    Holt             Forecast Pro     Forecast X       Flores-Pearce2   Forecast X
5           Flores-Pearce1   Theta-sm         ARARMA           Single           Theta-sm         Forecast Pro     Flores-Pearce2
6           SmartFcs         Theta            Winter           Forecast X       AAM1             Dampen           SmartFcs
7           AAM2             Autobox2         RBF              PP-autocast      AAM2             Forecast X       Theta-sm
8           B-J automatic    Forecast X       SmartFcs         B-J automatic    SmartFcs         SmartFcs         Dampen
9           Flores-Pearce2   Dampen           Flores-Pearce2   Flores-Pearce1   Flores-Pearce2   B-J automatic    Forecast Pro
10          Forecast Pro     Flores-Pearce2   Flores-Pearce1   RBF              Theta            Theta-sm         AAM1
11          ARARMA           PP-autocast      Autobox2         Winter           Comb S-H-D       Holt             Naïve2
12          Theta            Flores-Pearce1   Forecast Pro     ARARMA           RBF              RBF              AAM2
13          Autobox2         Comb S-H-D       PP-autocast      Naïve2           Dampen           Autobox2         B-J automatic
14          Comb S-H-D       Forecast Pro     Comb S-H-D       Holt             Forecast Pro     PP-autocast      RBF
15          Autobox1         SmartFcs         Forecast X       Theta-sm         PP-autocast      AAM1             Automat ANN
16          Winter           RBF              Theta-sm         SmartFcs         Autobox3         AAM2             PP-autocast
17          Autobox3         Autobox1         Naïve2           Flores-Pearce2   B-J automatic    Naïve2           Holt
18          Automat ANN      Winter           Automat ANN      Autobox2         Holt             Autobox1         Autobox3
19          Theta-sm         ARARMA           Dampen           Autobox3         Flores-Pearce1   Autobox3         Flores-Pearce1
20          Dampen           Holt             B-J automatic    Automat ANN      Robust-Trend     Automat ANN      Autobox1
21          PP-autocast      Autobox3         Theta            Robust-Trend     Autobox1         ARARMA           Robust-Trend
22          RBF              Robust-Trend     Single           Autobox1         Autobox2         Robust-Trend     Autobox2
23          Holt             AAM1             AAM2             AAM1             ARARMA           Flores-Pearce1   ARARMA
24 (Worst)  Robust-Trend     AAM2             AAM1             AAM2             Winter           Winter           Winter
Table 7 ROC Result per observation
6.0 Comparative analysis between SMAPE and ROC
From the analysis, it can be argued that the conclusions from the M-Competition are still valid, since sophisticated or complex methods were again found not to outperform the simpler ones. This became clear as Single outperformed all of the other methods. Single is based on single exponential smoothing, which is considered to be a simple method. In addition, explicit trend models such as Robust-Trend and Winter came last in the competition. The analysis also confirmed that different accuracy measures produce different relative rankings of the various methods. A comparison of the two error measures shows that some forecasting methods performed better under ROC. Significant examples are Single and Naïve2, each of which improved by 11 ranks compared with SMAPE. In addition, combined methods still outperformed individual methods: Winter, an explicit trend model, did worst of all the methods, while the worst combined method was Flores-Pearce1, which ranked 19 in ROC and 16 in SMAPE. For all the agreement mentioned above, ROC and SMAPE are still different methods of error measurement and do produce different results. In ROC, errors can be divided into normal and quadrant errors. This gives researchers more information on how each forecasting method performed and also indicates where and how improvements could be made. With this extended information, the performance of each forecasting method at each forecasting horizon can be compared in terms of normal and quadrant errors. As mentioned in the definition of ROC, normal errors can be divided into two types, over-estimates and under-estimates. This information is critical for the improvement of forecasting methods. In this study, the number of error points of each type was calculated. An example can be seen for Single (Table A4) and Winter (Table A5) [ROC results] in the appendix.
From this sample, the tables also show that the number of correct data points can be obtained from ROC. Despite the advantage of this information, quadrant errors were still the dominant component of the calculated error for each forecasting method. For example, Naïve2, which had the most correct data points with 90, still came 11th in the overall ranking. This is also supported by Winter, which had more correct data points and was still the worst performer overall. If the over-estimates and under-estimates are separated into positive change (Sector 1, S1, and Sector 2, S2) and negative change (Sector 4, S4, and Sector 5, S5), then further comments can be made on each individual observation for each forecasting method. Thus it could be argued that, although SMAPE and ROC lead to the same conclusion on the overall performance of the types of forecasting method, one major difference can be identified: ROC can be used to better understand the errors produced by each forecasting method. Such analysis is not possible with other methods of error measurement such as SMAPE. The rankings of the forecasting methods are listed in Table 8:

Rank  SMAPE           ROC
1     Theta           Single
2     Forecast Pro    Theta
3     ForecastX       Comb S-H-D
4     Comb S-H-D      ForecastX
5     Dampen          Flores-Pearce2
6     RBF             SmartFcs
7     B-J automatic   Theta-sm
8     Automat ANN     Dampen
9     SmartFcs        Forecast Pro
10    PP-autocast     AAM1
11    Flores-Pearce2  Naïve2
12    Single          AAM2
13    Theta-sm        B-J automatic
14    Autobox2        RBF
15    AAM1            Automat ANN
16    Flores-Pearce1  PP-autocast
17    ARARMA          Holt
18    AAM2            Autobox3
19    Holt            Flores-Pearce1
20    Winter          Autobox1
21    Autobox1        Robust-Trend
22    Naïve2          Autobox2
23    Autobox3        ARARMA
24    Robust-Trend    Winter

Table 8 Comparative ranking between SMAPE and ROC
Graph 4 Comparative ranking between SMAPE and ROC
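The rank changes quoted in this section can be checked directly from Table 8. The following is a minimal sketch in Python, not part of the original analysis: the two lists transcribe the table columns with spellings normalised (Theta, ForecastX, Naive2), and the function name rank_changes is hypothetical.

```python
# Rankings transcribed from Table 8; position in the list is rank - 1.
SMAPE_ORDER = [
    "Theta", "Forecast Pro", "ForecastX", "Comb S-H-D", "Dampen", "RBF",
    "B-J automatic", "Automat ANN", "SmartFcs", "PP-autocast",
    "Flores-Pearce2", "Single", "Theta-sm", "Autobox2", "AAM1",
    "Flores-Pearce1", "ARARMA", "AAM2", "Holt", "Winter", "Autobox1",
    "Naive2", "Autobox3", "Robust-Trend",
]
ROC_ORDER = [
    "Single", "Theta", "Comb S-H-D", "ForecastX", "Flores-Pearce2",
    "SmartFcs", "Theta-sm", "Dampen", "Forecast Pro", "AAM1", "Naive2",
    "AAM2", "B-J automatic", "RBF", "Automat ANN", "PP-autocast", "Holt",
    "Autobox3", "Flores-Pearce1", "Autobox1", "Robust-Trend", "Autobox2",
    "ARARMA", "Winter",
]


def rank_changes(smape_order, roc_order):
    """Return {method: SMAPE rank - ROC rank}; positive means better in ROC."""
    roc_rank = {name: i + 1 for i, name in enumerate(roc_order)}
    return {name: (i + 1) - roc_rank[name] for i, name in enumerate(smape_order)}


changes = rank_changes(SMAPE_ORDER, ROC_ORDER)
# List the methods from the largest improvement under ROC to the largest drop.
for name, change in sorted(changes.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {change:+d}")
```

A positive value means the method ranks higher under ROC than under SMAPE, so Single and Naïve2 both show +11 and Flores-Pearce1 shows -3.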
7.0 Discussion and Conclusion
This study has replicated the results from the M3-Competition, despite some mismatches in the ranking of methods. The Rate of Change (ROC) concept has also been introduced as another method of error measurement. From the results of the ROC analysis, many characteristics of the forecasting methods are now better understood. In the analysis, AAM1 and AAM2 had seemed to perform better in most of the categories, but when the number of observations was taken into account their true ranking was obtained. After normalising the results to ‘per observation’ in Table 7, Single and Theta performed as well as expected. Single had the fewest under-estimates and Theta had the fewest quadrant errors. However, Robust-Trend had the fewest over-estimates. This was not expected, since overall it was one of the worst performing methods in both SMAPE and ROC. It could be argued, however, that it also had the largest number of under-estimates, which meant that its total normal errors would be much higher than those of the other methods. The ROC results per observation also made it evident that AAM1 and AAM2 were the worst methods for over-estimates, under-estimates and quadrant errors. In ROC analysis, the number of correct forecasts can also be counted. This showed that Naïve2 had 90 correct values, whilst the others had only 20 correct values. Winter obtained only 1 correct value and was therefore regarded as the worst performer in ROC. However, the results show that quadrant errors were still the dominant influence on the performance of a forecast: the higher the quadrant error, the less likely the forecasting method was to perform well. For example, Naïve2 had more correct values than Single, but Single had fewer quadrant errors. Therefore Single outperformed Naïve2 [see the results recorded by rank, Table 8].
ROC analysis also showed that the resulting information could be used to improve the accuracy of forecasts [this was mentioned in the comparative analysis of SMAPE and ROC]. In ROC, the pattern of the plot for each forecasting horizon can explain the performance of each method. A good example is given by the ROC plots of forecasting horizons from the best and worst methods respectively [Single and Winter]. The Single plot was taken from the 11th forecasting horizon and the Winter plot from the 9th forecasting horizon. By comparing the two plots, the differences in the distribution of errors can be clearly seen. In Single, the data points are mainly distributed in sectors 1, 2, 4 and 5. This means that the errors obtained were under-estimates and over-estimates [normal errors]. In Winter, by contrast, a large number of data points fell in sector 6, which is quadrant error. This means that a greater error was produced by the Winter method than by the Single method. In addition, the Winter plot showed more data points in sectors 1 and 4 than in sectors 2 and 5. This meant that the method tended to over-estimate the forecasts, although there was still a minority of under-estimates. The same kind of observation can be made on the Single plot. There were more data points in sector 1 than in sector 2, which meant that the method tended to over-estimate in the positive direction. However, this was different in the negative direction, since there were more data points in sector 5 than in sector 4. Therefore, the method tended to under-estimate in the negative direction. From these observations, the analysis could be fed back to those responsible for each forecasting method as advice, which could then be used to improve the methods for better forecasts in the future. Therefore, ROC has a clear advantage over other measurement methods in analysing forecasts.
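The kind of counting described above can be illustrated with a short sketch. It is only an approximation under stated assumptions: the exact sector definitions are those of John (2004) and are not reproduced here. The sketch assumes that both the actual change and the forecast change are measured from the previous actual value, that a forecast moving in the opposite direction to the actual is a quadrant error, and that a forecast change larger (smaller) in magnitude than the actual change is an over-estimate (under-estimate). The function names and the sample values are hypothetical.

```python
from collections import Counter


def classify_change(prev_actual, actual, forecast, tol=1e-9):
    """Label one forecast point by comparing forecast and actual changes.

    Assumed rule for illustration only; it is not taken from John (2004).
    """
    actual_change = actual - prev_actual
    forecast_change = forecast - prev_actual
    if abs(forecast_change - actual_change) <= tol:
        return "correct"
    if actual_change * forecast_change < 0:
        # Forecast moved in the opposite direction to the actual series.
        return "quadrant"
    direction = "positive" if actual_change + forecast_change > 0 else "negative"
    kind = "over" if abs(forecast_change) > abs(actual_change) else "under"
    return f"{kind}-estimate, {direction} change"


def count_errors(points):
    """Count labels over an iterable of (prev_actual, actual, forecast)."""
    return Counter(classify_change(p, a, f) for p, a, f in points)


# Hypothetical data points for a single forecasting horizon.
sample = [
    (100, 110, 115),  # forecast rise larger than actual rise
    (100, 110, 104),  # forecast rise smaller than actual rise
    (100, 90, 105),   # forecast rises while actual falls
    (100, 105, 105),  # forecast equals actual
    (100, 90, 80),    # forecast fall larger than actual fall
]
for label, count in count_errors(sample).items():
    print(label, count)
```

Comparing the over-estimate and under-estimate counts for positive and negative changes separately gives the kind of tendency discussed for the Single and Winter plots.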
However, more work is needed to produce an ROC analysis than is required for a SMAPE analysis. More care is also needed, since the number of calculations increases with the number of forecasting horizons for each forecasting method.
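For comparison, the SMAPE calculation reduces to a few lines. The sketch below assumes the symmetric MAPE form used in the M3-Competition, in which each absolute error is divided by the mean of the actual and forecast values and the result is averaged and expressed as a percentage. It also assumes positive data and, for the per-horizon average, series of equal length. The function names and sample values are hypothetical.

```python
def smape(actuals, forecasts):
    """Symmetric MAPE (in percent) over paired actual and forecast values.

    Assumes all values are positive, so the denominator is never zero.
    """
    terms = [abs(a - f) / ((a + f) / 2) for a, f in zip(actuals, forecasts)]
    return 100 * sum(terms) / len(terms)


def smape_by_horizon(actual_series, forecast_series):
    """Average the SMAPE terms across series, separately for each horizon.

    Each argument is a list of series; every series lists its values in
    horizon order and all series are assumed to have the same length.
    """
    per_horizon = zip(zip(*actual_series), zip(*forecast_series))
    return [smape(a, f) for a, f in per_horizon]


# Hypothetical example: two series, two forecasting horizons each.
actual = [[100, 200], [100, 200]]
forecast = [[110, 180], [100, 200]]
print(smape(actual[0], forecast[0]))
print(smape_by_horizon(actual, forecast))
```

Only one pass over the data is needed for each horizon, whereas an ROC analysis additionally requires every point to be classified and counted.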
This study has replicated the results of the M3-Competition using SMAPE as the means of error measurement and has undertaken a further analysis of the same data using the ROC method developed by John (2004). The study has shown that the conclusions of the M3-Competition are still valid. However, it could be argued that the re-analysis of SMAPE was not totally reliable, owing to some mismatches between the rankings given by the original SMAPE in the M3-Competition and those given by the SMAPE calculated in this study. Data errors were also found in the source data. In addition, as mentioned earlier, rounding error could have occurred in the calculations [as suggested by Chatfield C. (1988, p. 28) when he said that “There are obvious dangers in averaging accuracy across many time series”]. Nevertheless, it could be argued that the calculated SMAPE produced the same conclusions as the previous SMAPE results in the M3-Competition. The same conclusion was also reached using ROC. This meant that all the results tested produced one critical conclusion, which was that simple methods outperformed many of the more sophisticated methods, both individual and combined. In general, the combined methods also performed better than the individual ones. From this analysis, the use of simple methods can now be better appreciated, since they gave equal or greater accuracy when compared with the more complex forecasting methods. This would also broaden the use of such methods, bearing in mind that accuracy is just one indicator of performance alongside cost, ease of use, and ease of interpretation. This was also mentioned by Chatfield (1988, p. 21), who noted that simple methods are likely to be easily understood and implemented by managers and others who use these forecast results.
Armstrong J.S. 1985, Testing Outputs, Long-Range Forecasting from crystal ball to computer (2nd Edition), Wiley-Interscience Publication, 348
Chatfield C. 1988, What is the ‘best’ method of forecasting?, University of Bath, Journal of Applied Statistics, Vol. 15, No. 1
Gilchrist W. 1976, Statistical forecasting, Wiley-Interscience Publication, 222-225
John E. G. 2004, Comparative assessment of forecasts, International Journal of Production Research, Vol. 42, No. 5, 997-1008
Makridakis S. and Hibon M. 1979, Accuracy of Forecasting: An Empirical Investigation, Journal of the Royal Statistical Society. Series A (General), Vol. 142, No. 2, 97-145
Makridakis S. et al. 1982, The Accuracy of Extrapolation (Time series) Methods: Result of a Forecasting Competition, Journal of Forecasting, 1, 111-153
Makridakis S. et al. 1993, The M2-Competition: A real-time judgmentally based forecasting study, International Journal of Forecasting, Vol. 9, 5-22
Makridakis S. et al. 2000, The M3-Competition: results, conclusions and implications, International Journal of Forecasting, Vol. 16, 451-476
Theil H. 1958, Economic forecasts and policy, North-Holland Publishing Company, Amsterdam, 29-30
M3-Competition data, Available at: https://forecasters.org/data/m3comp/M3C.xls
Macro code, Available at: https://club.excelhome.net/thread-346575-1-1.html [Accessed: 8 January 2010].
1.1 – Symmetric Mean Absolute Percentage Error. From: Appendix A, The M3-Competition: results, conclusions and implications, Spyros Makridakis, Michele Hibon, International Journal of Forecasting, 2000, Vol. 16, 461
2.1 – Actual series pair, 2.2 – Forecast series pair, 2.3 – Normal Error, and 2.4 – Quadrant Error. From: Rate of change method (ROC), Comparative assessment of forecasts, E. G. John, International Journal of Production Research, 2004, Vol. 42, No. 5, 1000-1001