How to use OLS in Python: predicting future values with OLS regression (Python, StatsModels, Pandas)

I'm currently trying to implement a multiple linear regression (MLR) in Python, and I'm not sure how to apply the coefficients I've found to future values.

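import pandas as pd
import statsmodels.api as sm   # provides the OLS class; statsmodels.formula.api only exposes lowercase formula functions such as ols
import statsmodels.api as sm2  # same module; the answer below uses this alias for add_constant

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8, 39.3, 45.9, 41.3, 10.8]
Newspaper = [69.2, 45.1, 69.3, 58.5, 58.4]
Sales = [22.1, 10.4, 9.3, 18.5, 12.9]

df = pd.DataFrame({'TV': TV,
                   'Radio': Radio,
                   'Newspaper': Newspaper,
                   'Sales': Sales})

Y = df.Sales
X = df[['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)  # prepend a 'const' column of ones for the intercept

model = sm.OLS(Y, X).fit()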

>>> model.params
const       -0.141990
TV           0.070544
Radio        0.239617
Newspaper   -0.040178
dtype: float64
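Applying the coefficients by hand is just a dot product: each prediction is const + TV·b_TV + Radio·b_Radio + Newspaper·b_Newspaper. A minimal NumPy sketch, assuming the rounded params printed above and the three new rows (25/15/15, 30/20/22, 35/22/36) listed in the EDIT below; the names `params`, `new_rows` and `design` are illustrative only:

```python
import numpy as np

# Coefficients copied (rounded) from model.params above: const, TV, Radio, Newspaper
params = np.array([-0.141990, 0.070544, 0.239617, -0.040178])

# New observations; columns are TV, Radio, Newspaper
new_rows = np.array([
    [25.0, 15.0, 15.0],
    [30.0, 20.0, 22.0],
    [35.0, 22.0, 36.0],
])

# Prepend a column of ones so the intercept is multiplied by 1
design = np.column_stack([np.ones(len(new_rows)), new_rows])

# Each prediction = const + b_TV*TV + b_Radio*Radio + b_Newspaper*Newspaper
predictions = design @ params
print(predictions)
```

This reproduces what `model.predict` does, up to the rounding of the printed coefficients; in practice, calling `model.predict` on a design matrix built the same way avoids copying numbers by hand.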

Let's say I want to predict Sales for the following DataFrame:

EDIT:

TV      Radio   Newspaper   Sales
230.1   37.8    69.2        22.4
44.5    39.3    45.1        10.1
...     ...     ...         ...
25      15      15
30      20      22
35      22      36

I've tried the method from the question "Forecasting using Pandas OLS", but I can't get it working.

Thank you!

Solution

Assuming df2 is your new out-of-sample DataFrame (the rows with TV, Radio and Newspaper values but no Sales):

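model = sm.OLS(Y, X).fit()
# Select the feature columns of df2 itself (not a mask built from df, which is a different frame)
new_x = df2[['TV', 'Radio', 'Newspaper']].values
# has_constant='add' forces the intercept column even when new_x has a single row
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api
y_predict = model.predict(new_x)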

>>> y_predict
array([4.61319034, 5.88274588, 6.15220225])
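To check these numbers without statsmodels, the same OLS fit can be reproduced with NumPy's least-squares solver, `np.linalg.lstsq` (a swapped-in technique, not what the answer above uses). This sketch assumes df2 holds exactly the three new rows from the question. It also illustrates the key rule: the design matrix for new data must have the same columns, including the intercept, in the same order as the one used for fitting. The helper `with_intercept` is hypothetical, written only for this example.

```python
import numpy as np

# Training data from the question; columns are TV, Radio, Newspaper
X_train = np.array([
    [230.1, 37.8, 69.2],
    [44.5, 39.3, 45.1],
    [17.2, 45.9, 69.3],
    [151.5, 41.3, 58.5],
    [180.8, 10.8, 58.4],
])
y_train = np.array([22.1, 10.4, 9.3, 18.5, 12.9])

# Out-of-sample rows (assumed contents of df2)
X_new = np.array([
    [25.0, 15.0, 15.0],
    [30.0, 20.0, 22.0],
    [35.0, 22.0, 36.0],
])


def with_intercept(features):
    """Prepend a column of ones, mirroring sm.add_constant(..., has_constant='add')."""
    return np.column_stack([np.ones(features.shape[0]), features])


# Ordinary least squares: choose beta to minimise ||y - X beta||^2
beta, *_ = np.linalg.lstsq(with_intercept(X_train), y_train, rcond=None)

# Predict with a design matrix built exactly the same way as for training
y_predict = with_intercept(X_new) @ beta
print(beta)
print(y_predict)
```

`beta` should agree with `model.params` and `y_predict` with the statsmodels predictions to several decimal places, since both solve the same least-squares problem.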

You can assign the results directly to df2 as follows:

df2.loc[:, 'Sales'] = model.predict(new_x)

To fill missing Sales values in the original DataFrame with predictions from your regression, fit the model only on the rows where Sales is known, then predict the rows where it is missing:

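# Fit only on rows where Sales is known
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df.loc[df.Sales.notnull(), 'Sales']
model = sm.OLS(Y, X).fit()

# Predict the rows where Sales is missing; there is one prediction per missing row,
# so the .loc assignment fills each of them
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api
df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)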
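The same fill-in-the-gaps workflow can be sketched with pandas boolean masks and NumPy least squares standing in for statsmodels (a substitution, so the snippet runs without statsmodels installed). It assumes the three new rows from the question are appended to the original five with Sales set to NaN; the helper `design` and the list `features` are hypothetical names for this example.

```python
import numpy as np
import pandas as pd

# Original rows plus three rows whose Sales is unknown (NaN)
df = pd.DataFrame({
    'TV':        [230.1, 44.5, 17.2, 151.5, 180.8, 25.0, 30.0, 35.0],
    'Radio':     [37.8, 39.3, 45.9, 41.3, 10.8, 15.0, 20.0, 22.0],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 15.0, 22.0, 36.0],
    'Sales':     [22.1, 10.4, 9.3, 18.5, 12.9, np.nan, np.nan, np.nan],
})
features = ['TV', 'Radio', 'Newspaper']
known = df['Sales'].notna()  # boolean mask: True where Sales is present


def design(frame):
    """Intercept column of ones followed by the feature columns, as a NumPy array."""
    return np.column_stack([np.ones(len(frame)), frame[features].to_numpy()])


# Fit OLS on the rows where Sales is known
beta, *_ = np.linalg.lstsq(design(df[known]),
                           df.loc[known, 'Sales'].to_numpy(),
                           rcond=None)

# Predict only the missing rows and write them back through the inverted mask
df.loc[~known, 'Sales'] = design(df[~known]) @ beta
print(df)
```

Using one mask for both the fit and the write-back keeps the two steps consistent: rows with a known Sales value are never overwritten.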