Predicting Calgary Rent Prices Using Regression Analysis

Chanon Sumpantapong
10 min readApr 13, 2024

--

Can You Guess How Much Rent Will Cost in Calgary?
Calgary,AB,Canada

If you live in Calgary or want to move there, you know that rent prices can be hard to understand. You might be wondering if they’ll keep going up. Or maybe there’s a way to predict what they’ll be in the future?

That’s exactly what I was curious about for my project. In this post, We’re going to talk about Calgary rent prices. We’ll be using some data from the Canada Mortgage and Housing Corporation (CMHC) to see if there are any patterns and if we can use those patterns to make some educated guesses about future rents.

So, are you ready to crack the code of Calgary rent prices? Let’s get started!

Step 1: Get Data and Clean it

  • The data sources include rental cost data from CMHC and Canadian interest rate data from the Bank of Canada. After cleaning and joining the data, we have a comprehensive dataset covering the years 2015–2023.
#import csv file and read it 
# name Historical Rental Market Statistics Summary - TableExport_1.csv
# name Historical Rental Market Statistics Summary - TableExport_2.csv
# name Historical Rental Market Statistics Summary - TableExport_3.csv
# name Historical Rental Market Statistics Summary - TableExport_bac.csv
# name Historical Rental Market Statistics Summary - TableExport_interest.csv
# and join data from all files into one file

import csv
import os
import sys
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# read csv file
def read_csv(file):
with open(file
, newline='') as csvfile:
reader = csv.reader(csvfile)
data = list(reader)
return data

# write csv file
def write_csv(file, data):
with open(file, 'w', newline='') as csvfile:
writer = csv.writer(csvfile)
writer.writerows(data)

# read all csv files
file1 = 'Historical Rental Market Statistics Summary - TableExport_1.csv'
file2 = 'Historical Rental Market Statistics Summary - TableExport_2.csv'
file3 = 'Historical Rental Market Statistics Summary - TableExport_3.csv'
file4 = 'Historical Rental Market Statistics Summary - TableExport_bac.csv'
file5 = 'Historical Rental Market Statistics Summary - TableExport_interest.csv'

data1 = read_csv(file1)
data2 = read_csv(file2)
data3 = read_csv(file3)
data4 = read_csv(file4)
data5 = read_csv(file5)
#read data1
data1 = pd.read_csv(file1)
#Change Date to be datetime type show only year
data1['Date'] = pd.to_datetime(data1['Date'])
data1['Date'] = data1['Date'].dt.year
#Change Date to be index
#data1 = data1.set_index('Date')

#read data2
data2 = pd.read_csv(file2)
#Change Date to be datetime type show only year
data2['Date'] = pd.to_datetime(data2['Date'])
data2['Date'] = data2['Date'].dt.year

#read data3
data3 = pd.read_csv(file3)
#Change Date to be datetime type show only year
data3['Date'] = pd.to_datetime(data3['Date'])
data3['Date'] = data3['Date'].dt.year

#read data4
dataBac = pd.read_csv(file4)
#Change Date to be datetime type show only year
dataBac['Date'] = pd.to_datetime(dataBac['Date'])
dataBac['Date'] = dataBac['Date'].dt.year


#read data5
dataInt = pd.read_csv(file5)
#Change Date to be datetime type show only year
dataInt['Date'] = pd.to_datetime(dataInt['Date'])
dataInt['Date'] = dataInt['Date'].dt.year
Data 1: bed room
#data1 drop columns Availability Rate (%)_1,% Change_1,Median Rent ($)_1
data1 = data1.drop(columns=['Availability Rate (%)_1','% Change_1','Median Rent ($)_1'])
#change data in Average Rent ($)_1 column type to be integer
data1['Average Rent ($)_1'] = data1['Average Rent ($)_1'].str.replace(',', '').astype(int)
#change data in Units_1 column type to be integer
data1['Units_1'] = data1['Units_1'].str.replace(',', '').astype(int)
Sample Data1 : 1 Bedroom (Necessary columns)
#joint data1, data2, data3, dataBac, dataInt
data = pd.merge(data1, data2, on='Date', how='outer')
data = pd.merge(data, data3, on='Date', how='outer')
data = pd.merge(data, dataBac, on='Date', how='outer')
data = pd.merge(data, dataInt, on='Date', how='outer')
#add copy Date column to new column Year
data['Year'] = data['Date']
data = data.set_index('Date')
data
Join Table together

Step 2: Explore Data to search some insight

  • Start with Vacancy Rate : The vacancy rate is the percentage of all available units in a rental property, that are vacant or unoccupied at a particular time.
#plot Vacancy Rate (%)_1, Vacancy Rate (%)_2, Vacancy Rate (%)_3, Vacancy Rate (%)_Bac in one graph
data[['Vacancy Rate (%)_1','Vacancy Rate (%)_2','Vacancy Rate (%)_3','Vacancy Rate (%)_Bac']].plot()
plt.title('Vacancy Rate (%)')
plt.show()
Period 2015 -2023

Looking at the vacancy rates over time, we see that the Vacancy Rate (%)_Bac category has the lowest minimum of 0.7% (in 2023), while the Vacancy Rate (%)_2 category has the highest maximum of 7.9% (in 2016).

#plot Average Rent ($)_1, Average Rent ($)_2, Average Rent ($)_3, Average Rent ($)_Bac in one graph 
data[['Average Rent ($)_1','Average Rent ($)_2','Average Rent ($)_3','Average Rent ($)_Bac']].plot()
plt.title('Average Rent ($)')
#plot trend line
z = np.polyfit(data.index, data['Average Rent ($)_1'], 1)
p = np.poly1d(z)
plt.plot(data.index,p(data.index),"r--")
z = np.polyfit(data.index, data['Average Rent ($)_2'], 1)
p = np.poly1d(z)
plt.plot(data.index,p(data.index),"r--")
z = np.polyfit(data.index, data['Average Rent ($)_3'], 1)
p = np.poly1d(z)
plt.plot(data.index,p(data.index),"r--")
z = np.polyfit(data.index, data['Average Rent ($)_Bac'], 1)
p = np.poly1d(z)
plt.plot(data.index,p(data.index),"r--")

plt.show()
Line Chart of Average Rent price

On average rent prices for all types of rooms are moving in the same direction. In the future, the rent will be high due to the fact that it is on a rise🥲💸.

#plot bar interest rate 
data['Interest'].plot(kind='line')
plt.title('Interest Rate (%)')
plt.show()
Interest rates in Canada for 2015–2023 followed a U-shaped pattern.
  • Find Correlation of the data
#Make a correlation matrix between Average Rent ($)_1, Average Rent ($)_2, Average Rent ($)_3, Average Rent ($)_Bac and Vacancy Rate (%)_1, Vacancy Rate (%)_2, Vacancy Rate (%)_3, Vacancy Rate (%)_Bac,ineterest rate
corr = data[['Average Rent ($)_1','Average Rent ($)_2','Average Rent ($)_3','Average Rent ($)_Bac','Vacancy Rate (%)_1','Vacancy Rate (%)_2','Vacancy Rate (%)_3','Vacancy Rate (%)_Bac','Interest']].corr()
sns.heatmap(corr, annot=True)
plt.title('Correlation Matrix')
plt.show()

Refer the Correlation matrix :
An extremely strong positive correlation can be seen between the variables that indicate the various rent amounts. It would appear that there is a direct and favourable association between the average rentals that have been collected across a variety of time periods.
There are moderate-to-high positive relationships between the vacancy rates 1, 2, 3, and Bac. These correlations are positive. It would indicate that there is a connection between the rates of vacancies that existed during different time periods.
Relationships between interest and all other variables are characterized by negative correlations. Whenever there is an increase in the rate of interest, it is observable that there is a propensity for average rentals to decrease, while vacancy rates tend to increase.

#calculate the slope of Average Rent ($)_1, Average Rent ($)_2, Average Rent ($)_3, Average Rent ($)_Bac which one has the highest slope
slope1 = (data['Average Rent ($)_1'].loc[2023] - data['Average Rent ($)_1'].loc[2015])/(2023-2015)
slope2 = (data['Average Rent ($)_2'].loc[2023] - data['Average Rent ($)_2'].loc[2015])/(2023-2015)
slope3 = (data['Average Rent ($)_3'].loc[2023] - data['Average Rent ($)_3'].loc[2015])/(2023-2015)
slopeBac = (data['Average Rent ($)_Bac'].loc[2023] - data['Average Rent ($)_Bac'].loc[2015])/(2023-2015)

print('Slope of Average Rent ($)_1: ', slope1)
print('Slope of Average Rent ($)_2: ', slope2)
print('Slope of Average Rent ($)_3: ', slope3)
print('Slope of Average Rent ($)_Bac: ', slopeBac)


#plot slope1, slope2, slope3, slopeBac
#size
plt.figure(figsize=(10,5))
plt.bar(['Average Rent ($)_1','Average Rent ($)_2','Average Rent ($)_3','Average Rent ($)_Bac'], [slope1, slope2, slope3, slopeBac])
plt.title('Slope of Average Rent ($)')
plt.show()
Slope of Average Rent price 2015–2023

Slope Average Rent Price of 3 Bedrooms is the highest value , which means the price of 3 bedrooms is changed more than each other and Average Rent price of Bachelor is the lowest.

Step 3: Try to make Prediction model

Linear Regression

Once we go through our data, it is necessary to transform our target Average Rent ($)_1 into a feature. I will utilize a method that involves creating a lag by shifting our data. Thus, we will utilize the past average rent to forecast the present situation.

create Average Rent ($)_1_lag column
data['Average Rent ($)_1_lag'] = data['Average Rent ($)_1'].shift(1)
#create Average Rent ($)_2_lag column
data['Average Rent ($)_2_lag'] = data['Average Rent ($)_2'].shift(1)
#create Average Rent ($)_3_lag column
data['Average Rent ($)_3_lag'] = data['Average Rent ($)_3'].shift(1)
#create Average Rent ($)_Bac_lag column
data['Average Rent ($)_Bac_lag'] = data['Average Rent ($)_Bac'].shift(1)
#create Vacancy Rate (%)_1_lag column
data['Vacancy Rate (%)_1_lag'] = data['Vacancy Rate (%)_1'].shift(1)
#create Vacancy Rate (%)_2_lag column
data['Vacancy Rate (%)_2_lag'] = data['Vacancy Rate (%)_2'].shift(1)
#create Vacancy Rate (%)_3_lag column
data['Vacancy Rate (%)_3_lag'] = data['Vacancy Rate (%)_3'].shift(1)
#create Vacancy Rate (%)_Bac_lag column
data['Vacancy Rate (%)_Bac_lag'] = data['Vacancy Rate (%)_Bac'].shift(1)
#create Interest_lag column
data['Interest_lag'] = data['Interest'].shift(1)

data = data.dropna()
data
#correlation Average Rent ($)_1 add Average Rent ($)_1_lag column
corr = data[['Average Rent ($)_1','Average Rent ($)_1_lag']].corr()
sns.heatmap(corr, annot=True)
plt.title('Correlation Matrix')
plt.show()

It is evident that there is a significant connection between the events of the previous day and the events of today. Let’s imagine the correlation to gain a clearer understanding.

#scatter plot Average Rent ($)_1 and Average Rent ($)_1_lag
plt.scatter(data['Average Rent ($)_1'], data['Average Rent ($)_1_lag'])
plt.title('Average Rent ($)_1 vs Average Rent ($)_1_lag')
#plot trend line
z = np.polyfit(data['Average Rent ($)_1'], data['Average Rent ($)_1_lag'], 1)
p = np.poly1d(z)
plt.plot(data['Average Rent ($)_1'],p(data['Average Rent ($)_1']),"r--")

plt.show()

1 Bedroom prediction analysis by using the Regression model

#make time series regression model to predict Average Rent ($)_1 in future
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = data[['Average Rent ($)_1_lag']]
y = data['Average Rent ($)_1']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

#plot prediction
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.title('Prediction')
plt.show()

#show model score
model.score(X_test, y_test)

#show model coefficient
model.coef_

#show model intercept
model.intercept_

# show regression equation
print('y = ', model.coef_[0], 'x + ', model.intercept_)

#predict Average Rent ($)_1 if Average Rent ($)_1_lag = 1222
model.predict([[1222]])

  • Prediction Price Rent for Year 2023
  • y = 1.048917970941711 x + -10.785927860181346
    predict Average Rent ($)_1 if Average Rent ($)_1_lag = 1222
    y= 1270.99183263

Try to use OLS Regression

#check p-value,t-value and r-square
import statsmodels.api as sm
X = data[['Average Rent ($)_1_lag','Interest_lag','Vacancy Rate (%)_1_lag']]
y = data['Average Rent ($)_1']
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
interest rate rate is insignificant

We try to drop Interest rate from model

# Make Average Rent ($)_1 prediction in future using linear regression by adding Vacancy Rate (%)_1 
Xl_NotI = data[['Average Rent ($)_1_lag','Vacancy Rate (%)_1_lag']]
yl_NotI = data['Average Rent ($)_1']

X_train, X_test, y_train, y_test = train_test_split(Xl_NotI, yl_NotI, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

#plot prediction
plt.scatter(X_test['Average Rent ($)_1_lag'], y_test, color='black')
plt.plot(X_test['Average Rent ($)_1_lag'], y_pred, color='blue', linewidth=3)

plt.title('Prediction')
plt.show()

#show model score
model.score(X_test, y_test)

#show model coefficient
model.coef_

#show model intercept
model.intercept_

# show regression equation model
# y = average rent = 0.96 * average rent lag + 0.01 * vacancy rate + 0.01

print('y = ', model.coef_[0], '*average rent1 lag + ', model.coef_[1], '*Vacancy rent1 lag + ', model.intercept_)

#predict Average Rent ($)_1 if Average Rent ($)_1_lag = 1222 and Vacancy Rate (%)_1 = 2.8
model.predict([[1222,2.8]])

y = 1.075924089473512 *average rent1 lag + -12.90963737360168 *Vacancy rent1 lag + 8.862337542442447

  • predict Average Rent ($)_1 if Average Rent ($)_1_lag = 1222 and Vacancy Rate (%)_1 = 2.8

y= $1287.49459023

Check New model with OLS method again

#check p-value,t-value and r-square
import statsmodels.api as sm
X = data[['Average Rent ($)_1_lag','Vacancy Rate (%)_1_lag']]
y = data['Average Rent ($)_1']
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
All P-Value is significance

Similar models were developed for the 2-bedroom, 3-bedroom, and bachelor rental types, all of which showed strong predictive power when incorporating the previous year’s average rent and the current vacancy rate

2 Bedrooms

# Make Average Rent ($)_2 prediction in future using linear regression by adding Vacancy Rate (%)_2
Xl2 = data[['Average Rent ($)_2_lag','Vacancy Rate (%)_2_lag']]
yl2 = data['Average Rent ($)_2']

X_train, X_test, y_train, y_test = train_test_split(Xl2, yl2, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

#plot prediction
plt.scatter(X_test['Average Rent ($)_2_lag'], y_test, color='black')
plt.plot(X_test['Average Rent ($)_2_lag'], y_pred, color='blue', linewidth=3)


plt.title('Prediction')
plt.show()

#show model score
model.score(X_test, y_test)

#show model coefficient
model.coef_

#show model intercept
model.intercept_

# show regression equation model
# y = average rent = 0.96 * average rent lag + 0.01 * vacancy rate + 0.01

print('y = ', model.coef_[0], '*average rent2 + ', model.coef_[1], '*Vacancy rent2 + ', model.intercept_)

#predict Average Rent ($)_2 if Average Rent ($)_2 = 1463 and Vacancy Rate (%)_2 = 2.4
model.predict([[1463,2.4]])

y = 1.0754616635059382 *average rent2 + -14.247934526854305 *Vacancy rent2 + 7.7857480476772025

  • predict Average Rent ($)_2 if Average Rent ($)_2 = 1463 and Vacancy Rate (%)_2 = 2.4

y = $1546.99111889

3 Bedrooms

y = 1.0975165394607072 *average rent3 + -17.75455468560559 *Vacancy rent3 + -6.359263807561092

  • predict Average Rent ($)_3 if Average Rent ($)_3 = 1579 and Vacancy Rate (%)_3 = 1.7

y = $1696.43660904

Bachelor room

y = 1.0559348158113775 *average rentBac + -4.616822111022729 *Vacancy rentBac + 5.1777635791374905

  • predict Average Rent ($)_Bac if Average Rent ($)_Bac = 972 and Vacancy Rate (%)_Bac = 3.2

y = $1016.77257379

Comparing the average actual price with the average predicted price from the model

Rent price Prediction 2023 (using 2022 data)

1 Bedroom : different around 13%
2 bedrooms: different around 9%
3 Bedrooms : different around 6%
Bachelor room : different around 18%

Averange rent price prediction for 2024

For 2024 Average Prediction Price, compare with Actual average price 2023 based on the model :
1 Bedroom : $1546.7 different from 2023 ($1463), around 5.7%
2 Bedroom : $1807.8 different from 2023 ($1691), around 6.5%
3 Bedroom : $1944.8 different from 2023 ($1794), around 8.4%
Bachelor room : $1271.2 different from 2023 ($1202), around 5.7%

Here is my perspective and an example method based on the dataset that can help create a model and make price predictions. In the real world, there are numerous factors that are related to the rental prices.

That’s all I have to say about this blog. I appreciate you reading it, and perhaps you learned something. Until next time 👋

Photo by Kyler Nixon on Unsplash

--

--

Chanon Sumpantapong
Chanon Sumpantapong

Written by Chanon Sumpantapong

Business strategist | Design Engineer | Data analysis Engineer | interested in finance 💵 & Data journalism 📊

No responses yet