Forecasting Time Series

My Introduction
Name: Bharani Kumar
Education: IIT Hyderabad; Indian School of Business
Professional certifications:
• PMP: Project Management Professional
• PMI-ACP: Agile Certified Practitioner
• PMI-RMP: Risk Management Professional
• CSM: Certified Scrum Master
• LSSGB: Lean Six Sigma Green Belt
• LSSBB: Lean Six Sigma Black Belt
• SSMBB: Six Sigma Master Black Belt
• ITIL: Information Technology Infrastructure Library
• Agile PM: Dynamic System Development Methodology Atern
© 2013 ExcelR Solutions. All Rights Reserved

Data Scientist
Research in analytics, deep learning & IoT
• Deloitte: driven using US policies
• Infosys: driven using Indian policies (large enterprise)
• ITC Infotech: driven using Indian policies (SME)
• HSBC: driven using UK policies

AGENDA
• Why Forecasting? Learn about the various examples of forecasting
• Forecasting Strategy: learn about decomposing, forecasting & combining
• EDA & Graphical Representation: learn about exploratory data analysis, scatter plot, time plot, lag plot, ACF plot
• Forecasting Components: learn about level, trend, seasonal, cyclical and random components
• Forecasting Models & Errors: learn about the various forecasting models to be discussed & the various error measures

Why Forecasting
• Why forecast, when you will know the outcome eventually?
• Early knowledge is the key, even if that knowledge is imperfect:
  – For setting production schedules, one needs to forecast sales
  – For staffing call centers, a company needs to forecast the demand for service
  – For dealing with epidemic emergencies, nations should forecast the various flu

Types of forecast
Forecasting classification:
• Short term or long term
• Micro scale or macro scale
• Qualitative or quantitative (judgment or data)
• Point forecast, interval forecast or density forecast

Who generates forecasts?


Time series vs cross-sectional data
01 Cross-sectional data
02 Time series data

Dataset for further discussion
Monthly footfalls of customers from Jan 1991 to March 2004 (first 18 months shown)

Month     Footfall (thousands)
Jan-91    1709
Feb-91    1621
Mar-91    1973
Apr-91    1812
May-91    1975
Jun-91    1862
Jul-91    1940
Aug-91    2013
Sep-91    1596
Oct-91    1725
Nov-91    1676
Dec-91    1814
Jan-92    1615
Feb-92    1557
Mar-92    1891
Apr-92    1956
May-92    1885
Jun-92    1623

Notation
t = 1, 2, 3, … = time period index
Yt = value of the series at time period t
Ŷt+k = forecast for time period t+k, given data until time t
et = forecast error for period t

Forecasting Strategy
01 Define goal
02 Data collection
03 Explore & visualize series
04 Pre-process data
05 Partition series
06 Apply forecasting method(s)
07 Evaluate & compare performance
08 Implement forecasts / system

Forecasting Strategy – Step 1: Define Goal
#1 Is the goal descriptive or predictive?
• Descriptive = time series analysis
• Predictive = time series forecasting
#2 What is the forecast horizon?
• How far into the future? (k in Ŷt+k)
• Rolling forward or at a single time point?
#3 How will the forecast be used?
• Who are the stakeholders?
• Numerical or event forecast?
• Cost of over-prediction & under-prediction
#4 Forecasting expertise & automation
• In-house forecasting or consultants?
• How many series? How often?
• Data & software availability

Forecasting Strategy – Step 2: Data Collection
#1 Data quality
• Samples are typically small, so the data need to be of good quality
• Data should be the same as the series to be forecasted
#2 Temporal frequency
• Should we use real-time ticket collection data?
• Balance between signal & noise
• Aggregation / disaggregation
#3 Series granularity
• Coverage of the data: geographical, population, time, …
• Should be aligned with the goal
#4 Domain expertise
• Necessary information source
• Affects the modeling process from start to end
• Level of communication/coordination between forecasters & domain experts

Forecasting Strategy – Step 3: Explore Series
• Systematic part: level, trend, seasonal patterns
• Non-systematic part: noise
Additive: Yt = Level + Trend + Seasonality + Noise
Multiplicative: Yt = Level × Trend × Seasonality × Noise

Trend Component
• Persistent, overall upward or downward pattern
• Due to population, technology, etc.
• Overall upward or downward movement
• Several years' duration
(Figure: response plotted against time in months, quarters or years)

Seasonal Component
• Regular pattern of up & down fluctuations
• Due to weather, customs, etc.
• Occurs within one year
• Example: passenger traffic during 24 hours
(Figure: response plotted against time in months or quarters, with the summer season labeled)

Irregular/Random/Noise Component
• Erratic, unsystematic, 'residual' fluctuations
• Due to random variation or unforeseen events
  – Union strike
  – War
• Short duration & nonrepeating

Time Series Components


Time Plot
• Plots a variable against the time index
• Appropriate for visualizing serially collected data (time series)
• Brings out many useful aspects of the structure of the data
• Example: electrical usage for Washington Water Power (quarterly data from 1980 to 1991)

(Figure: time plot of electrical power usage for Washington Water Power, 1980-1991; y-axis: power usage in kilowatts, x-axis: year)

Observations
• There is a cyclic (seasonal) pattern that repeats every year
• Maximum demand is in the first quarter; minimum in the third quarter
• There may also be a slowly increasing trend (to be examined)
• Any reasonable forecast should include these cyclic fluctuations
• The trend (if any) needs to be utilized for forecasting
• The forecast would not be exact; there would be some error

Time plot

Quarterly Sales of Ice-cream

Scatter Diagram
• Plots one variable against another
• One of the simplest tools for visualization
• Example: maintenance cost and age for nine buses (Spokane Transit)
• This is an example of cross-sectional data (observations collected at a single point in time)

Age of bus (years)   Yearly cost of maintenance (US $)
8                    859
5                    682
3                    471
9                    708
11                   1094
2                    224
1                    320
8                    651
12                   1049

(Figure: scatter plot of yearly cost of maintenance (US $) against age of bus)

Observations
• Older buses have a higher cost of maintenance
• There is some variation from case to case
• The rise in cost is about $71 per year of age (least-squares slope)
• It may be possible to use 'age' to forecast maintenance cost
• The forecast would not be a 'certain' prediction; there would be some error
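One way to quantify the rise in cost per year of age is an ordinary least-squares line through the nine (age, cost) pairs above. The sketch below assumes plain Python with no libraries. `least_squares_line` is a hypothetical helper name, not part of the original material.

```python
# Ordinary least-squares line for the Spokane Transit bus example.
ages = [8, 5, 3, 9, 11, 2, 1, 8, 12]
costs = [859, 682, 471, 708, 1094, 224, 320, 651, 1049]

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = least_squares_line(ages, costs)
print(f"cost = {intercept:.1f} + {slope:.1f} * age")
```

For these buses the fitted slope is about 70.9 US $ per year of age and the intercept about 208.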

Lag plot
• Plots a variable against its own lagged values
• Brings out a possible association between successive samples
• Example: monthly sales of VCRs by a music store in a year
  Yt = number of VCRs sold in time period t
  Yt-k = number of VCRs sold in time period t – k

Example of lagged variables: number of VCRs sold in a month
Time   Original   Lagged one step   Lagged two steps
1      123        -                 -
2      130        123               -
3      125        130               123
4      138        125               130
5      145        138               125
6      142        145               138
7      141        142               145
8      146        141               142
9      147        146               141
10     157        147               146
11     150        157               147
12     160        150               157

(Figure: lag plot (k = 1), a scatter plot of VCR sales against 1-step lagged VCR sales; both axes run from 120 to 160)

Observations
• There is a reasonable degree of association between the original variable and the lagged one
• The value of the lagged variable is known beforehand, so it is useful for prediction
• The association between the original and the lagged variable may be quantified through a correlation

Autocorrelation
• Correlation between a variable and its lagged version (one time step or more):
$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$
where Yt = observation in time period t, Yt-k = observation in time period t – k, Ȳ = mean of the values of the series, and rk = autocorrelation coefficient for a k-step lag.
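The definition of rk translates directly into code. This sketch (plain Python; the function name `autocorrelation` is illustrative) applies it to the twelve monthly VCR sales figures from the lag example.

```python
def autocorrelation(y, k):
    """Sample autocorrelation r_k: sum of lag-k products of deviations from the
    series mean, divided by the total sum of squared deviations."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    numerator = sum(dev[t] * dev[t - k] for t in range(k, n))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

vcr_sales = [123, 130, 125, 138, 145, 142, 141, 146, 147, 157, 150, 160]
for k in (1, 2, 3):
    print(k, round(autocorrelation(vcr_sales, k), 3))
```

For the VCR data this gives r1 = 843/1474 ≈ 0.572 and r2 = 682/1474 ≈ 0.463.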

Standard error of rk
• The standard error is
$$SE(r_k) = \sqrt{\frac{1 + 2\sum_{j=1}^{k-1} r_j^2}{n}}$$
  (The standard error of the mean estimates the variability between samples, whereas the standard deviation measures the variability within a single sample.)
• It increases progressively with k, but eventually reaches a maximum value
• If the 'true' autocorrelation is 0, then the estimate rk should lie in the interval (–2SE(rk), 2SE(rk)) 95% of the time
• Sometimes SE(rk) is approximated by $1/\sqrt{n}$
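The standard-error formula and the ±2SE rule can be sketched as two small helpers. This assumes the cumulative (Bartlett-type) form shown above. `bartlett_se` and `significant_lags` are hypothetical names, and the inputs are lists of already computed autocorrelations r1, r2, ….

```python
import math

def bartlett_se(r, n):
    """Approximate standard errors SE(r_1), ..., SE(r_K) of sample autocorrelations:
    SE(r_k) = sqrt((1 + 2 * (r_1^2 + ... + r_{k-1}^2)) / n). r[0] holds r_1."""
    ses = []
    sum_sq = 0.0  # r_1^2 + ... + r_{k-1}^2 for the current lag k
    for rk in r:
        ses.append(math.sqrt((1 + 2 * sum_sq) / n))
        sum_sq += rk * rk
    return ses

def significant_lags(r, n):
    """Lags k (1-based) whose |r_k| exceeds two standard errors."""
    return [k for k, (rk, se) in enumerate(zip(r, bartlett_se(r, n)), start=1)
            if abs(rk) > 2 * se]

# VCR example: r_1 = 843/1474 (about 0.572) from n = 12 months.
print(2 * bartlett_se([843 / 1474], 12)[0])  # two-SE limit at lag 1
```

For the VCR series the lag-1 limit is 2/√12 ≈ 0.577. So r1 ≈ 0.572 sits just inside it: with only 12 months, even a visibly strong association is borderline by this rule.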

Correlogram or ACF plot
• Plots the ACF, or autocorrelation function (rk), against the lag (k)
• Plus-and-minus two standard errors are displayed as limits that must be exceeded for statistical significance
• Reveals lagged variables that can potentially be useful for forecasting

Correlogram for VCR data

ACF plot for electricity usage data

Observations
• Every alternate autocorrelation is large; many of them are also statistically significant
• ACFs at lags 4, 8, 12, etc. are positive
• ACFs at lags 2, 6, 10, etc. are negative
• All these pick up the seasonal aspect of the data
• The data may be re-examined after 'removing' seasonality

ACF of de-seasoned KW data

Observations
• The de-seasoned series has small ACFs
• This part of the data has little forecasting value

Typical questions in exploratory analysis
• Is there a TREND?
• Is there a SEASONALITY?
• Are the data RANDOM?
All the plots contain information regarding these questions.

Time series plots

Effect of omission of data on the Time series plot

(Figure: the same data plotted on linear axes (y against t) and on logarithmic axes (log y against log t); a different type of scaling can suggest a confusing kind of trend)

Few points on plots
• Plots help us summarize & reveal patterns in data
• Graphics help us identify anomalies in data
• Plots help us present a huge amount of data in a small space & make a huge data set coherent
• To get all the advantages of a plot, the "aspect ratio" of the plot is crucial
• The ratio of height to width of a plot is called the ASPECT RATIO

Aspect Ratio
• Generally, the aspect ratio should be around 0.618
• However, for long time series data, the aspect ratio should be around 0.25
• To understand the impact of the aspect ratio, see the two plots that follow

Aspect ratio


Preliminaries for Step 3 of 8-Step forecasting strategy
Should we use all historical data for forecasting?
Solution = DATA PARTITIONING into training data and validation data
• Fit the model only to the TRAINING period
• Assess performance on the VALIDATION period

Partitioning
• Deploy the model by joining training + validation data to forecast the future
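Partitioning a time series differs from random train/test splitting. The validation period must be the most recent stretch, so that the model is always assessed on "future" data. This is a minimal sketch that assumes a six-month validation period for the footfall data; that length is an illustrative choice, not from the original. `partition` is a hypothetical helper.

```python
def partition(series, n_valid):
    """Split a time series into training and validation periods, keeping time
    order: the last n_valid observations form the validation period."""
    if not 0 < n_valid < len(series):
        raise ValueError("n_valid must leave at least one training point")
    return series[:-n_valid], series[-n_valid:]

footfall = [1709, 1621, 1973, 1812, 1975, 1862, 1940, 2013, 1596,
            1725, 1676, 1814, 1615, 1557, 1891, 1956, 1885, 1623]
train, valid = partition(footfall, n_valid=6)
# Fit on `train`, assess on `valid`; for deployment, refit on train + valid.
```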

How to choose a validation period?
The strategy for choosing the validation data period depends on:
• Forecast horizon
• Seasonality
• Length of series
• Underlying conditions affecting the series

Rolling-forward forecasts


NAÏVE Forecasts
• Last sample, k steps ahead: Ft+k = Yt
• Seasonal series (M seasons): Ft+k = Yt-M+k (for k ≤ M)
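Both naive rules need only indexing. In this sketch, `y` holds the observations up to time t in order, and the helper names are hypothetical. The seasonal version is restricted to 1 ≤ k ≤ M, the range over which the formula above applies.

```python
def naive_forecast(y, k):
    """Last-sample naive forecast F_{t+k} = Y_t (the same value for every horizon k)."""
    return y[-1]

def seasonal_naive_forecast(y, k, m):
    """Seasonal naive forecast F_{t+k} = Y_{t-M+k}, valid for 1 <= k <= M:
    repeat the value observed in the same season one cycle earlier."""
    if not 1 <= k <= m:
        raise ValueError("this form of the formula needs 1 <= k <= M")
    t = len(y) - 1  # 0-based index of the last observation Y_t
    return y[t - m + k]

footfall = [1709, 1621, 1973, 1812, 1975, 1862, 1940, 2013, 1596,
            1725, 1676, 1814, 1615, 1557, 1891, 1956, 1885, 1623]
print(naive_forecast(footfall, 1))               # forecast for Jul-92 from Jun-92
print(seasonal_naive_forecast(footfall, 1, 12))  # Jul-92 forecast = Jul-91 value
```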

Forecast error
• The forecast error is et = Yt - Ft (actual minus forecast)
• If the model is adequate, the forecast error should contain no information
• Plots of et should resemble 'white noise', i.e., uncorrelated random numbers with zero mean and constant variance (there should be NO PATTERN)

Forecast error
• Forecast errors can follow different distributions depending on the business context

Forecasting Errors


Evaluating Predictive Accuracy
• Mean error
• Mean absolute deviation
• Mean squared error
• Root mean squared error
• Mean percentage error
• Mean absolute percentage error

Typical plots of 'white noise': time plot, lag plot, ACF plot, histogram

Mean error (ME)
$$ME = \frac{1}{n}\sum_{t=1}^{n} e_t$$
• If the ME is around zero, the forecasts are called unbiased: the model does not systematically overestimate or underestimate. This is certainly a desirable property of a model.

Actual data   Forecast (Model 1)   Error (Model 1)   Forecast (Model 2)   Error (Model 2)
100           101                  -1                110                  -10
200           199                  1                 190                  10
300           301                  -1                310                  -10
400           399                  1                 390                  10
ME                                 0                                      0

Mean error
• Mean error has the disadvantage that positive and negative errors cancel out, so small and large errors may have the same effect (both models above have ME = 0)
• To overcome this problem, we may define two different forecast performance measures:
  1. Mean Absolute Deviation:
$$MAD = \frac{1}{n}\sum_{t=1}^{n} |e_t|$$
  2. Mean Square Error:
$$MSE = \frac{1}{n}\sum_{t=1}^{n} e_t^2$$

MAD & MSE
Actual data   Forecast (Model 1)   Error (Model 1)   Forecast (Model 2)   Error (Model 2)
100           101                  -1                110                  -10
200           199                  1                 190                  10
300           301                  -1                310                  -10
400           399                  1                 390                  10
MAD                                1                                      10
MSE                                1                                      100
ME                                 0                                      0

Problem with ME, MAD, MSE
• None of these three measures is unit-free or scale-free
• Consider forecasting sales figures: someone in India expresses them in rupees, and someone in the USA expresses the same sales in dollars. Both use the same model, yet their error measures will differ. This is a very awkward situation
• MSE has the added disadvantage that its unit is squared; RMSE (the square root of MSE) does not have this disadvantage
• So we need a unit-free measure

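MPE and MAPE: unit-free measures
$$MPE = \frac{1}{n}\sum_{t=1}^{n} \frac{e_t}{Y_t} \times 100\%$$
$$MAPE = \frac{1}{n}\sum_{t=1}^{n} \left|\frac{e_t}{Y_t}\right| \times 100\%$$
• Both are expressed in percentage form
• Both are unit free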

Last Sample: Number of customers requiring repair work
t   Customers Yt   Fitted value   Residual et   |et|   et²   et/Yt       |et/Yt|
1   58
2   54             58             -4            4      16    -0.07407    0.074074
3   60             54             6             6      36    0.1         0.1
4   55             60             -5            5      25    -0.09091    0.090909
5   62             55             7             7      49    0.112903    0.112903
6   62             62             0             0      0     0           0
7   65             62             3             3      9     0.046154    0.046154
8   63             65             -2            2      4     -0.03175    0.031746
9   70             63             7             7      49    0.1         0.1
MAD = 4.25   MSE = 23.5   RMSE = 4.85   MPE = 0.0203   MAPE = 0.0695
Forecast method: last sample
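The six accuracy measures can be computed in one pass over the errors. This sketch assumes et = Yt - Ft and returns MPE and MAPE as fractions, as in the table above. It reproduces the last-sample figures. `error_measures` is a hypothetical helper name.

```python
import math

def error_measures(actual, forecast):
    """ME, MAD, MSE, RMSE, MPE and MAPE with e_t = Y_t - F_t.
    MPE and MAPE are returned as fractions (multiply by 100 for %)."""
    e = [y - f for y, f in zip(actual, forecast)]
    n = len(e)
    mse = sum(v * v for v in e) / n
    return {
        "ME": sum(e) / n,
        "MAD": sum(abs(v) for v in e) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MPE": sum(v / y for v, y in zip(e, actual)) / n,
        "MAPE": sum(abs(v / y) for v, y in zip(e, actual)) / n,
    }

customers = [58, 54, 60, 55, 62, 62, 65, 63, 70]
# Last-sample (naive) method: the fitted value for period t is Y_{t-1}.
m = error_measures(customers[1:], customers[:-1])
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The mean error for this method works out to 12/8 = 1.5: on average the last-sample forecasts fall 1.5 customers short, reflecting the upward drift in the series.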

MA: Number of customers requiring repair work
t   Customers Yt   Fitted value   Residual et   |et|     et²       et/Yt     |et/Yt|
1   58
2   54
3   60
4   55             57.3333        -2.3333       2.3333   5.4444    -0.0424   0.0424
5   62             56.3333        5.6667        5.6667   32.1111   0.0914    0.0914
6   62             59.0000        3.0000        3.0000   9.0000    0.0484    0.0484
7   65             59.6667        5.3333        5.3333   28.4444   0.0821    0.0821
8   63             63.0000        0.0000        0.0000   0.0000    0.0000    0.0000
9   70             63.3333        6.6667        6.6667   44.4444   0.0952    0.0952
MAD = 3.83   MSE = 19.91   RMSE = 4.46   MPE = 0.0458   MAPE = 0.0599
Forecast method: 3-point moving average

Challenges
• Zero counts: MAE/RMSE pose no problem, but MAPE cannot be computed (it divides by Yt = 0). Either exclude the zero counts or use an alternate measure such as MASE
• Missing values: compute the average metrics after excluding the missing values
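MASE is not defined on the slide. A commonly used non-seasonal form divides the forecast MAE by the in-sample MAE of one-step naive forecasts on the training series. It therefore stays finite when some actual values are zero, and values below 1 mean the method beats the naive forecast on average. The sketch below assumes that definition. For illustration only, it scores the 3-point moving-average fitted values from the repair-work example against the same series. `mase` is a hypothetical helper name.

```python
def mase(actual, forecast, training):
    """Mean absolute scaled error (non-seasonal form): MAE of the forecasts divided
    by the in-sample MAE of one-step naive forecasts on the training series."""
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(training[t] - training[t - 1]) for t in range(1, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale

customers = [58, 54, 60, 55, 62, 62, 65, 63, 70]
# 3-point trailing MA of the previous three months used as the forecast for month t.
ma_fitted = [sum(customers[t - 3:t]) / 3 for t in range(3, len(customers))]
score = mase(customers[3:], ma_fitted, customers)
print(round(score, 3))
```

Here MASE = (23/6)/4.25 ≈ 0.90, so on this series the 3-point MA is slightly better than the last-sample method.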

Forecast / Prediction Interval
• A 95% prediction interval [a, b] means there is a 95% probability that the value will fall in the range [a, b]
• If the forecast errors are normal, the prediction interval around a point forecast F is
$$F \pm k\,\sigma$$
  where σ = estimated standard deviation of the forecast errors and k = some multiple (k = 2 corresponds to roughly 95% probability; the exact normal value is 1.96)
Challenges to the formula
• Errors are often non-normal
• If the model is biased (over- or under-forecasts), is a symmetric interval around Ft+k appropriate?
• Estimating the error standard deviation is tricky
One solution is transforming the errors to normal

Forecast / Prediction Interval – Non-Normal
To construct a prediction interval for 1-step-ahead forecasts:
1. Create roll-forward forecasts (Ft+1) on the validation period
2. Compute the forecast errors
3. Compute percentiles of the error distribution (e(5) = 5th percentile; e(95) = 95th percentile); in Excel, use =PERCENTILE
4. Prediction interval: [Ft+1 + e(5), Ft+1 + e(95)]
Example: 5th percentile = -307.0 and 95th percentile = 292.8, so the 90% prediction interval for the 1-step-ahead forecast Ft+1 is [Ft+1 – 307, Ft+1 + 292.8]
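The percentile recipe can be sketched as follows. Assumptions:
• Errors are actual minus forecast.
• The roll-forward forecasts are the one-step naive forecasts of the repair-work series.
• The percentile uses linear interpolation between order statistics, assumed to match Excel's PERCENTILE.

With only eight errors this is purely illustrative. The band from the 5th to the 95th percentile covers the central 90% of past errors. The helper names are hypothetical.

```python
def percentile(values, p):
    """Percentile (0 <= p <= 1) by linear interpolation at position (n - 1) * p
    of the sorted values."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def empirical_interval(forecast, errors, lower=0.05, upper=0.95):
    """Interval [F + e(lower), F + e(upper)] from roll-forward validation errors."""
    return forecast + percentile(errors, lower), forecast + percentile(errors, upper)

customers = [58, 54, 60, 55, 62, 62, 65, 63, 70]
# Roll-forward one-step naive forecasts F_{t+1} = Y_t; error = actual - forecast.
errors = [customers[t] - customers[t - 1] for t in range(1, len(customers))]
next_forecast = customers[-1]
print(empirical_interval(next_forecast, errors))
```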

Forecasting Different Methods
Model based
• Linear regression
• Autoregressive models
• ARIMA
• Logistic regression
• Econometric models
Data driven
• Naïve forecasts
• Smoothing
• Neural nets

Forecasting Different Methods
• Linear model: Yt = β0 + β1t + ε
• Exponential model: log(Yt) = β0 + β1t + ε
• Quadratic model: Yt = β0 + β1t + β2t² + ε
• Additive seasonality: Yt = β0 + β1DJan + β2DFeb + β3DMar + … + β11DNov + ε
• Additive seasonality with quadratic trend: Yt = β0 + β1t + β2t² + β3DJan + β4DFeb + β5DMar + … + β13DNov + ε
• Multiplicative seasonality: log(Yt) = β0 + β1DJan + β2DFeb + β3DMar + … + β11DNov + ε
Here DJan, …, DNov are 0/1 dummy variables for the months; December serves as the baseline, so 12 months need only 11 dummies.
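To make the dummy-variable models concrete, here is a sketch of a quarterly analogue (M = 4, three dummies) of an additive-seasonality model with a linear trend. It is fitted by least squares through the normal equations in plain Python. The data are synthetic, generated from known coefficients so the fit can be checked, and all helper names are hypothetical. In practice one would use a library routine such as `numpy.linalg.lstsq` or a regression package. The exponential and multiplicative models are fitted the same way on log(Yt).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def trend_season_design(n, m):
    """Rows [1, t, D_1, ..., D_{m-1}] for t = 1..n; the last season is the baseline."""
    rows = []
    for t in range(1, n + 1):
        season = (t - 1) % m
        dummies = [1.0 if season == j else 0.0 for j in range(m - 1)]
        rows.append([1.0, float(t)] + dummies)
    return rows

# Synthetic quarterly data: Y_t = 50 + 3t + seasonal effect (Q4 is the baseline).
effects = [5.0, -2.0, 8.0, 0.0]
y = [50 + 3 * t + effects[(t - 1) % 4] for t in range(1, 13)]
beta = ols(trend_season_design(12, 4), y)             # approx [50, 3, 5, -2, 8]
next_row = trend_season_design(13, 4)[-1]              # t = 13 is a Q1
forecast = sum(b * x for b, x in zip(beta, next_row))  # 50 + 39 + 5
```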

Irregular Component
Irregular components and their solutions:
• Outliers: remove unusual periods from the model
• Special events: model separately
• Interventions: keep in the model, using a dummy variable

External Information
• Forecasting airline ticket price: the fuel price impacts the airline ticket
  Airfare_t = b0 + b1 (Petrol Price)_t + e
  (the petrol price at time t must itself be forecasted)
• Forecasting internet sales: the amount spent on advertisements
  Sales(t) = g{ f(Sales(t-1), Sales(t-2), …, Sales(t-6)), a1*SQRT[AdSpend(t-1)] + … + a6*SQRT[AdSpend(t-6)] }

Linear Regression for forecasting
1 Global trend
• Linear trend (constant growth)
• Exponential trend (% growth)
2 Seasonality
• Additive (Y)
• Multiplicative (log(Y))
3 Irregular patterns

Autoregressive (AR) Models
• An AR model can be used to forecast errors
• An AR model captures autocorrelation directly
• Autocorrelation measures how strongly the values of a time series are related to their own past values
• Lag-1 autocorrelation = correlation between (y1, y2, …, yt-1) and (y2, y3, …, yt)
• Lag-k autocorrelation = correlation between (y1, y2, …, yt-k) and (yk+1, yk+2, …, yt)

Autocorrelation & its uses
• Check forecast errors for independence
• Model remaining information
• Evaluate predictability

Autoregressive Model
• Multi-layer model
• Model the forecast errors by treating them as a time series
• Then examine the autocorrelation of the 'errors of forecast errors':
  – If autocorrelation exists, fit an AR model to the forecast-error series
  – If still autocorrelated, continue modeling the level-2 errors (not practical)
• An AR model can also be used to model the original data:
  Yt = α + β1Yt-1 + β2Yt-2 + εt → AR(2), order = 2
  1-step ahead forecast: Ft+1 = α + β1Yt + β2Yt-1
  2-steps ahead: Ft+2 = α + β1Ft+1 + β2Yt
  3-steps ahead: Ft+3 = α + β1Ft+2 + β2Ft+1
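The recursive multi-step rule above replaces each unknown future value by its own forecast. That is easy to express in code. The coefficients below are hypothetical, chosen only to keep the arithmetic easy to follow, and `ar2_forecasts` is an illustrative name.

```python
def ar2_forecasts(history, alpha, beta1, beta2, steps):
    """Multi-step AR(2) forecasts F_{t+1}, ..., F_{t+steps} from
    history = [..., Y_{t-1}, Y_t]. Each forecast is fed back in place of the
    unknown future value it stands for."""
    values = list(history)
    forecasts = []
    for _ in range(steps):
        f = alpha + beta1 * values[-1] + beta2 * values[-2]
        forecasts.append(f)
        values.append(f)
    return forecasts

# Hypothetical model Y_t = 10 + 0.5 Y_{t-1} + 0.3 Y_{t-2} + e_t, with Y_{t-1} = 50, Y_t = 60.
print(ar2_forecasts([50, 60], alpha=10, beta1=0.5, beta2=0.3, steps=3))
```

This gives Ft+1 = 55, Ft+2 = 55.5 and Ft+3 = 54.25, up to floating-point rounding.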

Autoregressive Model
• Use level 1 to forecast the next value of the series: F̂t+1
• Use AR to forecast the next forecast error (residual): Êt+1
• Combine the two to get an improved forecast: F*t+1 = F̂t+1 + Êt+1

Random Walk
• A specific case of the AR(1) model
• If β1 = 1 in the AR(1) model, it is called a random walk
• Equation: Yt = a + Yt-1 + εt, where a = drift parameter and σ (std of ε) = volatility
• Changes from one period to the next are random
• How to find out whether the data follow a random walk?
  – Run an AR(1) model & check whether β1 is close to 1
  – Difference the series and look at its ACF plot (the differences of a random walk should look like white noise)
• How to estimate drift & volatility?

Random Walk
• One-step-ahead forecast: Ft+1 = a + Yt
• Two-step-ahead forecast: Ft+2 = a + Ft+1 = 2a + Yt
• k-step-ahead forecast: Ft+k = ka + Yt
• If the drift parameter is 0, then the k-step-ahead forecast is Ft+k = Yt for all k
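Because Yt - Yt-1 = a + εt, the first differences of a random walk are just drift plus noise. A natural estimate of the drift is therefore their mean, and of the volatility their standard deviation. This is a minimal sketch using Python's `statistics` module; the four-point series and the helper names are hypothetical.

```python
import statistics

def random_walk_fit(y):
    """Estimate drift a (mean of first differences) and volatility sigma
    (sample standard deviation of first differences)."""
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    return statistics.mean(diffs), statistics.stdev(diffs)

def random_walk_forecast(y, drift, k):
    """k-step-ahead forecast F_{t+k} = Y_t + k * a."""
    return y[-1] + k * drift

series = [10, 12, 13, 16]  # hypothetical data
drift, volatility = random_walk_fit(series)
print(drift, volatility, random_walk_forecast(series, drift, 3))
```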

Model based approaches & drawbacks


Model vs Data based approaches
01 Model based approach: past is SIMILAR to future
02 Data based approach: past is NOT SIMILAR to future

Forecast methods based on smoothing
There are two major forecasting techniques based on smoothing:
– Moving averages
– Exponential smoothing
• Success depends on choosing the window width
• Balance between over- & under-smoothing

Smoothing – Moving Average: 4 uses
1. Smoothing noise
2. Forecasting
   • Forecast future points by using an average of several past points
   • More suitable for series with no trend & no seasonality
3. Data visualization
   • A time plot of the MA reveals the level & trend of a series
   • It filters out the seasonal & random components
4. Removing seasonality & computing seasonal indexes

Moving Average – Calculations
• Trailing moving average: calculated based on a window from time 't' & backwards
• Centered moving average: calculated based on a window centered around time 't'
(Figure: quarterly sales plotted by quarter)

Calculation – Trailing MA
1. Choose the window width (W)
2. For the MA at time t, place the window on time points t-W+1, …, t (for W = 5: t-4, t-3, t-2, t-1, t)
3. Compute the average of the values in the window:
$$MA_t = \frac{y_{t-W+1} + y_{t-W+2} + \cdots + y_{t-1} + y_t}{W}$$

Calculation – Centered MA
Compute the average of the values in a window (of width W) that is centered at t.
• Odd width: center the window on time t and average the values in the window, e.g., W = 5:
$$MA_t = \frac{y_{t-2} + y_{t-1} + y_t + y_{t+1} + y_{t+2}}{5}$$
• Even width: take the two "almost centered" windows and average the values in them, e.g., W = 4:
$$MA_t = \left(\frac{y_{t-2} + y_{t-1} + y_t + y_{t+1}}{4} + \frac{y_{t-1} + y_t + y_{t+1} + y_{t+2}}{4}\right) \Big/ 2$$
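Both moving averages can be sketched in a few lines. Assumptions: positions are 0-based, `None` marks positions where the window does not fit, and the even-width rule averages the two "almost centered" windows. Applied to the quarterly sales Q1_86 to Q4_87 from the seasonal-index example, W = 4 reproduces the centered MA of 2143.76 for Q3_86. For forecasting, the trailing MA at time t serves as the forecast for t+1, as in the 3-point repair-work example. The function names are illustrative.

```python
def trailing_ma(y, w):
    """Trailing MA: MA_t = mean(y[t-w+1], ..., y[t]); None where the window is incomplete."""
    return [None if t < w - 1 else sum(y[t - w + 1:t + 1]) / w for t in range(len(y))]

def centered_ma(y, w):
    """Centered MA of width w. Odd w: window of w points centered on t.
    Even w: average of the windows y[t-w/2 .. t+w/2-1] and y[t-w/2+1 .. t+w/2]."""
    n = len(y)
    out = [None] * n
    h = w // 2
    for t in range(h, n - h):
        if w % 2 == 1:
            out[t] = sum(y[t - h:t + h + 1]) / w
        else:
            first = sum(y[t - h:t + h]) / w
            second = sum(y[t - h + 1:t + h + 1]) / w
            out[t] = (first + second) / 2
    return out

sales = [1734.83, 2244.96, 2533.80, 2154.96, 1547.82, 2104.41, 2014.36, 1991.75]
print(centered_ma(sales, 4))
print(trailing_ma([58, 54, 60, 55, 62, 62, 65, 63, 70], 3))
```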

Moving Average Hands On


Exponential Smoothing
• Assigns more weight to the most recent observations
• Assigns less weight to the farthest observations
Simple exponential smoothing
• No trend
• No seasonality
• Level
• Noise (cannot be modeled)
Holt's method
• Also called double exponential
• Trend
• No seasonality
Winters' method
• Trend
• Seasonality
• Variants are possible

Simple Exponential Smoothing
• Forecasts = estimated level at the most recent time point: Ft+k = Lt
• Adaptive algorithm: adjusts the most recent forecast (or level) based on the actual data:
  Lt = αYt + (1-α)Lt-1
  α = the smoothing constant (0 < α ≤ 1)
• Initialization: F1 = L1 = Y1
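The recursion needs only the previous level. Below is a sketch with the initialization L1 = Y1 from the slide. The choice of α = 0.2 and the repair-work series are illustrative, and `simple_exponential_smoothing` is a hypothetical name.

```python
def simple_exponential_smoothing(y, alpha):
    """Levels L_t = alpha * Y_t + (1 - alpha) * L_{t-1}, initialized with L_1 = Y_1.
    The forecast for any horizon k from time t is F_{t+k} = L_t."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    levels = [float(y[0])]
    for value in y[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels

customers = [58, 54, 60, 55, 62, 62, 65, 63, 70]
levels = simple_exponential_smoothing(customers, alpha=0.2)
forecast = levels[-1]  # the same forecast for every future period
```

With α = 1 the levels equal the data, so SES collapses to the last-sample naive forecast.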

Simple Exponential Smoothing
The formula Lt = αYt + (1-α)Lt-1 can be expanded by substituting Lt-1 with its own formula:
$$L_t = \alpha Y_t + (1-\alpha)\left[\alpha Y_{t-1} + (1-\alpha) L_{t-2}\right] = \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + (1-\alpha)^2 L_{t-2} = \cdots = \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots$$

Simple Exponential Smoothing
The formula Lt = αYt + (1-α)Lt-1 can be rewritten in error-correction form:
$$\hat{Y}_{t+1} = L_t = L_{t-1} + \alpha (Y_t - L_{t-1}) = \hat{Y}_t + \alpha (Y_t - \hat{Y}_t) = \hat{Y}_t + \alpha E_t$$
That is, update the previous forecast by an amount that depends on the error in the previous forecast; α controls the degree of "learning".

Smoothing Constant 'α'
α determines how much weight is given to the past (recall Lt = αYt + α(1-α)Yt-1 + α(1-α)²Yt-2 + …):
• α = 1: past observations have no influence over forecasts (undersmoothing)
• α → 0: past observations have a large influence on forecasts (oversmoothing)
Selecting α:
• "Typical" values: 0.1, 0.2
• Trial & error: effect on visualization
• Minimize RMSE or MAPE of the training data

Exponential Smoothing Hands On


MA vs ES
MA
• Assigns equal weights to all past observations within the window
• Better for forecasting when the data & environment are not volatile
• Window width is the key to success
ES
• Assigns more weight to recent observations than to past observations
• Better for forecasting when the data & environment are volatile
• The smoothing constant (α) value is the key to success

De-trending & De-seasoning
1 Regression
• To remove trend and/or seasonality, fit a regression model with trend and/or seasonality
• The series of forecast errors should then be de-trended & de-seasonalized
2 Differencing
• Simple & popular for removing trend and/or seasonality from a time series
• Lag-1 difference: Yt – Yt-1 (for removing trend); lag-M difference: Yt – Yt-M (for removing seasonality)
• Double differencing: difference the differenced series
3 Ratio to moving average
• Uses a moving average to remove seasonality
• Generates seasonal indexes as a byproduct
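Differencing is a one-line operation. This sketch (the helper name is illustrative) checks the three claims above on small synthetic series:
• Lag-1 differencing turns a linear trend into a constant.
• Lag-M differencing removes a fixed seasonal pattern.
• Double differencing turns a quadratic trend into a constant.

```python
def difference(y, lag=1):
    """Lag-`lag` differences Y_t - Y_{t-lag}; the result is `lag` points shorter."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

linear = [3 + 2 * t for t in range(8)]                             # linear trend
seasonal = [10 * t + [5, -3, 8, 0][t % 4] for t in range(12)]      # trend + period-4 pattern
quadratic = [t * t for t in range(6)]                              # quadratic trend

print(difference(linear))                # constant: trend removed
print(difference(seasonal, lag=4))       # constant: seasonality removed
print(difference(difference(quadratic))) # constant after double differencing
```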

Seasonal Indexes
For a series with M seasons, Sj = seasonal index for the jth season; it indicates how far Y in season j lies above or below the average of Y over a complete cycle of seasons.
Make sense out of this statement:
"Daily sales at a retail store show that Friday has a seasonal index of 1.30 and Monday has an index of 0.65"
Let us put it in easy terms: "Friday sales are 30% higher than the weekly average, and Monday sales are 35% lower than the weekly average sales"
The average of the M seasonal indexes is 1 (they must sum to M).

Seasonal Indexes
1. Construct the series of centered moving averages of span M
2. For each t, compute the raw seasonals = Yt / MAt
3. Sj = average of the raw seasonals belonging to season j (normalize to ensure that the seasonal indexes average 1)
De-seasonalized (= seasonally adjusted) series: Yt / Sj, where j is the season of period t
• If done appropriately, the de-seasonalized series will not exhibit seasonality
• If so, examine it for trend and fit a model
• This model will yield de-seasonalized forecasts
• Convert the forecasts by re-seasonalizing, i.e., multiply them by the appropriate seasonal index
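The three steps can be sketched for an even number of seasons M, such as quarters, where the centered MA averages two adjacent windows. Assumptions:
• Season positions are counted from the first observation (t mod M).
• Every season receives at least one raw seasonal.
• The helper names are hypothetical.

A synthetic series with known indexes checks the logic. The Q1-86 quiz figure is reproduced at the end.

```python
def centered_ma_even(y, m):
    """Centered MA of even span m: average of two adjacent m-point windows."""
    h = m // 2
    out = [None] * len(y)
    for t in range(h, len(y) - h):
        out[t] = (sum(y[t - h:t + h]) + sum(y[t - h + 1:t + h + 1])) / (2 * m)
    return out

def seasonal_indexes(y, m):
    """Ratio-to-moving-average indexes S_0..S_{m-1} (season = t mod m), normalized
    so that they average 1 (sum to m)."""
    ma = centered_ma_even(y, m)
    raw = {j: [] for j in range(m)}
    for t, level in enumerate(ma):
        if level is not None:
            raw[t % m].append(y[t] / level)  # raw seasonal Y_t / MA_t
    s = [sum(raw[j]) / len(raw[j]) for j in range(m)]
    scale = m / sum(s)
    return [v * scale for v in s]

def deseasonalize(y, indexes):
    """Seasonally adjusted series Y_t / S_j, where j = t mod m."""
    m = len(indexes)
    return [v / indexes[t % m] for t, v in enumerate(y)]

true_s = [0.8, 1.2, 1.1, 0.9]                      # synthetic indexes
y = [100 * true_s[t % 4] for t in range(12)]       # constant level of 100
est = seasonal_indexes(y, 4)                       # recovers true_s
print(deseasonalize([1734.83], [0.8785]))          # Q1-86 adjustment, about 1974.8
```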

The seasonally-adjusted sales for Q1-86 are in the range
1. $1500-$1700 (million)
2. $1700-$1800 (million)
3. $1800-$1900 (million)
4. $1900-$2000 (million)
Answer: 1734.83 / 0.8785 = $1974.8 million (option 4)

Quarter   Sales     Centered MA (W=4)   Raw seasonal
Q1_86     1734.83
Q2_86     2244.96
Q3_86     2533.80   2143.76             1.18194269
Q4_86     2154.96   2102.82             1.02479749
Q1_87     1547.82   2020.32             0.76612585
Q2_87     2104.41   1934.99             1.08755859
Q3_87     2014.36   1954.74             1.03050222
Q4_87     1991.75   2021.05             0.9855033
(The centered MAs for Q3_87 and Q4_87 use sales beyond Q4_87, which are not shown.)

Quarter   Average of raw seasonals   Seasonal index (normalized)
Q3        1.063872992                1.062660535
Q4        0.96423378                 0.963134878
Q1        0.879530967                0.878528598
Q2        1.096926116                1.09567599

Advanced Exponential Smoothing


Holt's / Double Exponential Method
Forecasts = most recent estimated level + trend:
$$\hat{Y}_{t+k} = L_t + k\,T_t$$
$$L_t = \alpha Y_t + (1-\alpha)(L_{t-1} + T_{t-1})$$
$$T_t = \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1}$$
• Global trend = linear regression model
• Local trend = exponential smoothing model
• Common default values are α = 0.2 and β = 0.15
• What happens when α = 0?
• What happens when β = 0?
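Here is a sketch of Holt's updates. It assumes the initialization L1 = Y1, T1 = Y2 - Y1 (one of the options listed for these models) and the default constants α = 0.2, β = 0.15. On a perfectly linear series the level and trend are tracked exactly, which makes a convenient sanity check. The function names are illustrative.

```python
def holt(y, alpha, beta):
    """Holt's linear (double exponential) smoothing; returns the final (L_t, T_t).
    L_t = alpha * Y_t + (1 - alpha) * (L_{t-1} + T_{t-1})
    T_t = beta * (L_t - L_{t-1}) + (1 - beta) * T_{t-1}
    Initialization: L_1 = Y_1, T_1 = Y_2 - Y_1."""
    level, trend = float(y[0]), float(y[1] - y[0])
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend

def holt_forecast(level, trend, k):
    """k-step-ahead forecast L_t + k * T_t."""
    return level + k * trend

series = [10 + 2 * i for i in range(10)]  # synthetic linear series ending at 28
level, trend = holt(series, alpha=0.2, beta=0.15)
print([holt_forecast(level, trend, k) for k in (1, 2, 3)])  # approx 30, 32, 34
```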

Winters' Method
Forecasts = (most recent estimated level + trend) × seasonal index:
$$\hat{Y}_{t+k} = (L_t + k\,T_t)\, S_{t+k-M} \quad (k \le M)$$
• St = seasonal index of period t
• M = number of seasons
Level:
$$L_t = \alpha \frac{Y_t}{S_{t-M}} + (1-\alpha)(L_{t-1} + T_{t-1})$$
Trend (same as Holt's):
$$T_t = \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1}$$
Seasonality (multiplicative):
$$S_t = \gamma \frac{Y_t}{L_t} + (1-\gamma) S_{t-M}$$
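This is a sketch of the multiplicative Winters' updates. It assumes the caller supplies the level and trend just before the first observation, plus one initial index per season position (t mod M), for example from the ratio-to-MA method. For k ≤ M, the index for season position (t + k) mod M is exactly St+k-M. The helper names are hypothetical. It is checked on a synthetic series (level 100, trend 2, known indexes) that the method should track exactly when started from the true values.

```python
def winters_multiplicative(y, m, alpha, beta, gamma, level, trend, seasonals):
    """Winters' method with multiplicative seasonality. `level` and `trend` are the
    values just before y[0]; seasonals[j] is the initial index for position j = t mod m.
    Returns the final (level, trend, seasonal indexes)."""
    s = list(seasonals)
    for t, value in enumerate(y):
        j = t % m                     # s[j] currently holds S_{t-M}
        previous_level = level
        level = alpha * value / s[j] + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        s[j] = gamma * value / level + (1 - gamma) * s[j]
    return level, trend, s

def winters_forecast(level, trend, s, n, k):
    """Forecast k steps after the last observation (index n - 1):
    (L_t + k * T_t) times the latest index for that season position."""
    return (level + k * trend) * s[(n - 1 + k) % len(s)]

true_s = [0.8, 1.2, 1.1, 0.9]
y = [(100 + 2 * t) * true_s[t % 4] for t in range(12)]  # synthetic quarterly data
L, T, S = winters_multiplicative(y, 4, 0.2, 0.15, 0.05,
                                 level=98.0, trend=2.0, seasonals=true_s)
print([round(winters_forecast(L, T, S, len(y), k), 2) for k in range(1, 5)])
```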

All 3 models – Generic
• All three smoothing constants (α, β, γ) lie in the range 0 to 1
• Common default values are α = 0.2, β = 0.15, γ = 0.05
Initialization (technical):
• L1 = Y1, or L1 = a from the estimated model Yt = a + bt
• T1 = Y2 - Y1, or T1 = (YT - Y1) / (T - 1) (average overall trend)
• Initial seasonal indexes = MA indexes (that we saw earlier)

AR(1) model
• Yt = φ0 + φ1Yt-1 + εt, εt white noise
(Figures: ACF plot and PACF (partial ACF) plot)

AR(p) model
• Yt = φ0 + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt, εt white noise
• Such a model has non-zero ACF at all lags
• However, only the first p PACFs are non-zero; the rest are zero
• If the PACF plot shows large PACFs only at a few lags, then an AR model is appropriate
• If an AR model is to be fitted, the parameters φ0, φ1, φ2, …, φp have to be estimated from the data, under the restriction that the estimated values should guarantee a stationary process
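The "only the first p PACFs are non-zero" property can be checked numerically, because the Durbin-Levinson recursion turns autocorrelations into partial autocorrelations. The sketch below feeds it the theoretical ACF of an AR(1) process, ρk = φ1^k, with a hypothetical φ1 = 0.6. The function name is illustrative.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk, k = 1..K, from autocorrelations
    rho[0] = rho_1, ..., rho[K-1] = rho_K (Durbin-Levinson recursion)."""
    pacf = []
    phi = []  # phi[j] holds phi_{k-1, j+1}: the lag-(j+1) coefficient at order k-1
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = [phi_kk]
        else:
            num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(phi[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

phi1 = 0.6                                   # hypothetical AR(1) coefficient
rho = [phi1 ** k for k in range(1, 6)]       # theoretical AR(1) ACF: non-zero at all lags
print([round(v, 3) for v in pacf_from_acf(rho)])  # only lag 1 is (numerically) non-zero
```

The same recursion applied to an MA(1) autocorrelation pattern (a single non-zero ρ1) gives PACFs that die out gradually rather than cutting off. This mirrors the MA(q) rule.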

MA(1) model
• Yt = θ0 + εt + θ1εt-1, εt white noise
(Figures: ACF plot and PACF (partial ACF) plot for θ1 = 0.8 and θ1 = –0.8)

MA(q) model
• Yt = θ0 + εt + θ1εt-1 + θ2εt-2 + … + θqεt-q, εt white noise
• Such a model has non-zero PACF at all lags
• However, only the first q ACFs are non-zero; the rest are zero
• If the ACF plot shows large ACFs only at a few lags, then an MA model is appropriate
• If an MA model is to be fitted, the parameters θ0, θ1, θ2, …, θq have to be estimated from the data

ARMA(p,q) model
• Yt = φ0 + φ1Yt-1 + φ2Yt-2 + … + φpYt-p + εt + θ1εt-1 + θ2εt-2 + … + θqεt-q, εt white noise
• Such a model has non-zero ACF and non-zero PACF at all lags
• If an ARMA(p,q) model is to be fitted, the parameters φ0, φ1, φ2, …, φp, θ1, θ2, …, θq have to be estimated from the data, under the restriction that the estimated values produce a stationary process
• AR(p) is ARMA(p,0)
• MA(q) is ARMA(0,q)

ARIMA(p,d,q) model
• If the d-times differenced series is ARMA(p,q), then the original series is said to be ARIMA(p,d,q)
• ARIMA stands for 'Autoregressive Integrated Moving Average'
• If Wt is the differenced version of Yt, i.e., Wt = Yt – Yt-1, then Yt can be written as Yt = Wt + Wt-1 + Wt-2 + Wt-3 + … Thus, the series Yt is an 'integrated' (opposite of 'differenced') version of the series Wt
• If Yt is ARIMA(p,d,q) with d ≥ 1, it is non-stationary
• However, its d-times differenced version, an ARMA(p,q) process, can be stationary
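"Integration" is just a running sum that undoes differencing. With a finite sample, a known starting value takes the place of the infinite past in Yt = Wt + Wt-1 + …. This sketch shows the round trip and how forecasts made on the differenced (ARMA) scale would be turned back into forecasts of Yt. The series and the W forecasts are hypothetical, as are the helper names.

```python
def difference(y):
    """First differences W_t = Y_t - Y_{t-1}."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def integrate(w, start):
    """Undo first differencing: Y_t = start + W_1 + ... + W_t (a running sum)."""
    out = [start]
    for value in w:
        out.append(out[-1] + value)
    return out

y = [5, 7, 6, 9, 12]
w = difference(y)                         # [2, -1, 3, 3]
restored = integrate(w, y[0])             # back to the original series
# Hypothetical forecasts of W for the next two periods, integrated onto the last Y:
y_forecasts = integrate([2, 2], y[-1])[1:]
```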

Box-Jenkins ARIMA model-building
• Model identification
  – If the time plot 'looks' non-stationary, difference it until the plot looks stationary
  – Look at the ACF and PACF plots for possible clues on model order (p, q)
  – When in doubt (regarding the choice of p and q), use the principle of parsimony: a simple model is better than a complex model
• Estimate model parameters
• Check residuals for the health of the model
• Iterate if necessary
• Forecast using the fitted model

THANK YOU
