Process for Data Analysis

Multivariate data analysis: a process I use to understand and quantify brand response using regression.
Generally the aim is to answer a question posed by a client about the effect of one factor, holding the other factors or variables constant.
Examples might be:
To understand if the month or weather has an influence on sales
To estimate the contribution to sales from advertising spend
To estimate the contribution to response from television and/or a second medium

The tools I use to answer these questions are, first, a look at the data in chart form, then R (a statistical software system used by the Auckland University Statistics department), the Excel Analysis ToolPak, and my experience in advertising, and in media specifically.

The process to find the answer involves a number of steps. Seldom does one process or solution fit all projects.

Collect and check through all the data
I make sure I understand the data fully. Is the data week ending or week commencing? That is a good question to start with.
Look to see if adstock (the carry-over of an advertisement's effect into later periods) or response decay is a factor. I may, for instance, find that the half-life for an advertisement in the NZ Woman's Weekly is longer than for Woman's Day. That could be because only the latter has TV listings, which date the issue more quickly and encourage readers to put the issue aside and read the “current issue”.
I start with a simple multivariate regression with all variables included, making sure the explanatory variables don’t explain each other (that is, they are not strongly correlated with one another).
When I have a good working model, I check the residual series: the difference between the actual response and the estimated response from the fitted model. The series needs to be centred around zero (the model will both over- and under-estimate response) and should have no pattern left in it; otherwise I haven’t captured all the variables in the model.
If the series is not centred around zero, has a pattern, or is not “normally distributed”, I run through a series of steps to transform either the response or one or more of the explanatory variables until a good-looking residual series is obtained. Some typical transformations are to log the response, or to apply a quadratic or, very occasionally, a cubic transformation to one or more explanatory variables.
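
The steps above can be sketched in code. This is a minimal illustration in plain Python rather than R, and it rests on several assumptions: adstock is modelled as simple geometric decay set by a half-life, the regression is ordinary least squares, and the weekly figures for TV spend, temperature and sales are invented. The helper names (adstock, correlation, fit_ols, residuals, lag1_autocorrelation) are hypothetical and do not come from any package.

```python
import math


def adstock(spend, half_life):
    """Carry part of each period's advertising effect into later periods.

    Assumes geometric decay: the carried-over effect halves every
    `half_life` periods.
    """
    retain = 0.5 ** (1.0 / half_life)
    out, carried = [], 0.0
    for x in spend:
        carried = x + retain * carried
        out.append(carried)
    return out


def correlation(a, b):
    """Pearson correlation, to spot explanatory variables that explain each other."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_ols(columns, response):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(response))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, response))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def residuals(columns, response, coefs):
    """Actual response minus the response estimated by the fitted model."""
    fitted = [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
              for i in range(len(response))]
    return [y - f for y, f in zip(response, fitted)]


def lag1_autocorrelation(series):
    """Correlation of a series with itself one period earlier (a simple pattern check)."""
    m = sum(series) / len(series)
    num = sum((series[i] - m) * (series[i - 1] - m) for i in range(1, len(series)))
    return num / sum((x - m) ** 2 for x in series)


# Invented weekly figures, for illustration only.
tv_spend = [0, 100, 0, 0, 50, 0, 80, 0, 0, 20, 0, 60]
temperature = [18, 21, 17, 15, 16, 19, 22, 20, 14, 13, 17, 18]
sales = [262, 291, 288, 270, 279, 284, 310, 299, 268, 262, 270, 293]

tv_adstock = adstock(tv_spend, half_life=2)  # the half-life of 2 weeks is an assumption
print("correlation between explanatory variables:", correlation(tv_adstock, temperature))

coefs = fit_ols([tv_adstock, temperature], sales)
resid = residuals([tv_adstock, temperature], sales, coefs)
lag1 = lag1_autocorrelation(resid)
print("coefficients (intercept, TV adstock, temperature):", coefs)
print("residual mean:", sum(resid) / len(resid))
print("residual lag-1 autocorrelation:", lag1)
```

A residual mean close to zero and a lag-1 autocorrelation close to zero are two quick numeric signs that the series is centred with no obvious pattern left; a chart of the residuals is still the better check. To try a transformation, the same functions can be rerun with, for example, the logged response `[math.log(y) for y in sales]`.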

With a model that explains a good percentage of the response, or at least when the residual series is satisfactory, the final step is to interpret the results and write up a report in language the reader can understand.
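
One common way to put a number on "explains a good percentage of the response" is R-squared, the share of the variation in the response that the fitted model accounts for. The sketch below assumes that this is the measure meant; the function name r_squared and the figures are invented for illustration.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` that `fitted` accounts for."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_resid = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return 1.0 - ss_resid / ss_total


# Invented figures, for illustration only.
actual_sales = [262, 291, 288, 270, 279, 284]
fitted_sales = [265, 288, 285, 273, 280, 282]
print(f"R-squared: {r_squared(actual_sales, fitted_sales):.2f}")
```

A high R-squared on its own does not make the model satisfactory; the residual checks described earlier still apply.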

A good residual series should look something like this…