Commercial real estate professionals are increasingly looking for new ways to gain competitive advantages as tools and approaches adapt with the growth of digital technology. One of the biggest challenges facing firms is the sheer amount of information at their disposal. There is no shortage of data providers, listing sites, and eager brokers willing to share property data, but as most have experienced, the information from these sources can often be incorrect.
With this in mind, I want to provide a concise overview of how Enodo aggregates the breadth of information available in the public domain, cleans out bad data, and delivers insights that improve the analysis process. I'll specifically try to address the top three questions we typically receive from real estate professionals:
- Where does your data come from?
- How do you determine "the market"?
- How do you know the rent premium is coming from that specific amenity?
1. The Data Pipeline
In order to provide insight in every market throughout the U.S., Enodo collects detailed data on property characteristics and amenities, market rents, and unit availabilities from 3 different property management software integrations, 10 different rental listing sites, and over 5,000 individual property websites.
Although data on amenities, square footage of units, etc. doesn't often change, our data pipeline pulls updated rent and unit availability data on a weekly basis for approximately 2 million multifamily properties, covering every single market throughout the country.
But the data we receive from listing sites isn't always perfect: it can be outdated, contain the occasional keystroke or "fat finger" data entry error, or reflect short-term leases (which are often priced much higher) generated by revenue management software.
To address this, we built a series of algorithms to remove outliers and bad data, and then combine the remaining good data to train our amenity premium and price prediction algorithms.
For example, if our cleaning algorithms detect a price that is more than 2.5 standard deviations above or below the average for the same unit type in the same building, we throw out that data. In addition, we check in detail whether the chunk rent and per-square-foot rent for each listing are reasonable given property characteristics and average market rents, and we automatically compare new data to historical data for the same property to detect whether a rental listing was inadvertently priced incorrectly.
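To make the standard deviation rule concrete, here is a minimal sketch in Python. It is not Enodo's actual cleaning code: the function name `drop_rent_outliers`, the field names `building_id`, `unit_type`, and `rent`, and the choice of population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def drop_rent_outliers(listings, z_cutoff=2.5):
    """Keep listings whose rent is within z_cutoff standard deviations of
    the mean for the same unit type in the same building.

    Field names (building_id, unit_type, rent) are hypothetical.
    """
    # Group listings by (building, unit type) so each rent is only
    # compared against its direct peers.
    groups = defaultdict(list)
    for row in listings:
        groups[(row["building_id"], row["unit_type"])].append(row)

    kept = []
    for rows in groups.values():
        rents = [r["rent"] for r in rows]
        mu = mean(rents)
        sigma = pstdev(rents)  # population standard deviation (an assumption)
        for r in rows:
            # With zero spread every rent equals the mean, so nothing is dropped.
            if sigma == 0 or abs(r["rent"] - mu) <= z_cutoff * sigma:
                kept.append(r)
    return kept
```

One caveat with this sketch: in a very small group no single listing can ever sit more than 2.5 standard deviations from the mean, so a rule like this only has teeth once a building has a reasonable number of listings for the unit type. That is presumably one reason the additional reasonableness checks described above are needed.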
After the listing data is thoroughly cleaned and the outliers have been removed, we use our parsing algorithms to pick out characteristics like building and unit amenities, time on market, floor, security deposit, etc. This provides us with detailed data on the supply of apartment units available in each market, which we can then aggregate at the property level and analyze.
2. Dynamic Market Clustering
The detailed supply-side data collected by the data pipeline is then geographically joined in our database with demand-side (demographic) data from the Environmental Systems Research Institute (ESRI) at the census tract level. We primarily utilize Tapestry Segmentation data from ESRI, which integrates consumer traits with residential characteristics to identify markets and classify US neighborhoods.
Once these data sets have been joined, Enodo’s clustering algorithm utilizes both supply and demand side data to intelligently define market areas. This process actually happens live on the platform. Starting from the census tract within which a subject property is situated, we compute a statistical similarity score for each adjacent census tract based on the property and market characteristics of those tracts, and select the most comparable adjacent tracts from which to form a cluster.
The algorithm then looks to the next layer of adjacent tracts, and continues adding census tracts sequentially until sufficient data is accumulated to calculate rent and incremental amenity values. The data from census tracts are joined together to form markets based on the similarity of their supply and demand characteristics.
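The layer-by-layer expansion described above can be sketched as a simple region-growing loop. This is an illustrative approximation, not Enodo's clustering algorithm: the `similarity` function (an inverse Euclidean distance to the subject tract), the `min_similarity` threshold, and the use of a unit count as the test for "sufficient data" are all assumptions.

```python
import math


def similarity(a, b):
    """Hypothetical similarity score: inverse Euclidean distance between
    two tracts' feature vectors (higher means more alike)."""
    return 1.0 / (1.0 + math.dist(a, b))


def grow_market(start, adjacency, features, unit_counts, min_units,
                min_similarity=0.5):
    """Grow a market cluster outward from the subject tract, one layer of
    adjacent tracts at a time, until enough units have been accumulated."""
    cluster = [start]
    seen = {start}
    total_units = unit_counts[start]
    frontier = [start]
    while frontier and total_units < min_units:
        # Candidates are the not-yet-visited neighbors of the current layer.
        candidates = sorted({n for t in frontier for n in adjacency[t]} - seen)
        seen.update(candidates)
        # Rank candidates by similarity to the subject tract, best first.
        ranked = sorted(
            candidates,
            key=lambda t: similarity(features[start], features[t]),
            reverse=True,
        )
        next_layer = []
        for tract in ranked:
            if total_units >= min_units:
                break
            if similarity(features[start], features[tract]) < min_similarity:
                continue  # too dissimilar to join this market
            cluster.append(tract)
            next_layer.append(tract)
            total_units += unit_counts[tract]
        frontier = next_layer
    return cluster
```

In this sketch, a tract that is adjacent but dissimilar (say, a very different demographic profile across a highway) is skipped, and the cluster keeps expanding only through tracts that were accepted. The loop stops either when the unit threshold is reached or when no comparable neighbors remain.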
3. Predictive Model Training
Within these market clusters, Enodo trains a machine learning model to predict market rent and amenity premiums based on the demographic characteristics of the market and economic characteristics of the multifamily housing supply within that newly defined market area. There are often tens of thousands of units to utilize for model training.
By intelligently subsampling properties in each cluster, Enodo's algorithm can determine the incremental impact of year built, number of bedrooms, number of bathrooms, and each individual amenity by holding everything else about the property and market constant.
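The idea of "holding everything else constant" can be illustrated with a much simpler technique than a trained machine learning model: a matched comparison. The sketch below is a stand-in for illustration only, and the function name `amenity_premium` and the field names are hypothetical. It groups units that are identical on a set of control attributes and compares rents with and without the amenity inside each group.

```python
from collections import defaultdict
from statistics import mean


def amenity_premium(units, amenity, control_keys):
    """Estimate the rent premium for one amenity by comparing units that
    match on every control attribute and differ only in that amenity.

    Returns a weighted average of within-group rent differences, or None
    if no group contains units both with and without the amenity.
    """
    # Each stratum holds rents for units with (True) and without (False)
    # the amenity, among units that share the same control values.
    strata = defaultdict(lambda: {True: [], False: []})
    for u in units:
        key = tuple(u[k] for k in control_keys)
        strata[key][amenity in u["amenities"]].append(u["rent"])

    diffs, weights = [], []
    for group in strata.values():
        if group[True] and group[False]:
            diffs.append(mean(group[True]) - mean(group[False]))
            weights.append(len(group[True]) + len(group[False]))
    if not diffs:
        return None  # no matched units, so no estimate
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)
```

Units that have no matched counterpart are ignored rather than compared against dissimilar units, which is what keeps differences in bedrooms or building age from being misread as an amenity premium.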
The best way to describe how this process works is to imagine that we divide the market value of a particular unit into the value of the neighborhood, the property, and the unit itself, then calculate the proportion of each.
Neighborhood variables include the demographic and location amenity data. There is a certain value associated with just having a unit in a particular area, regardless of the amenities or characteristics at the building and unit levels. Enodo calculates the value of just being in a market, and allocates the value among each market variable.
Property variables include things like the year built, property type, number of units, and building amenities. There is a certain value associated with just being in a particular property, whether you are in the worst unit or the best unit within that property. The portion of market value not defined by the market is allocated to the building.
Unit variables include the size of the unit, number of beds/baths, and unit amenities. The final portion of the value comes from the relative competitiveness of each particular unit to other units in the same building.
When all three levels are taken into account, a complete picture of value is generated. This is the core of the Enodo platform.
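As a rough illustration of the three-level split, the sketch below divides a single unit's rent into neighborhood, property, and unit portions using nested averages. This is a simplified stand-in, not the trained model described above: the function name `decompose_rent` and the fields `property_id` and `rent` are hypothetical, and real allocations would come from the model's variables rather than from plain means.

```python
from statistics import mean


def decompose_rent(unit, units):
    """Split one unit's rent into neighborhood, property, and unit parts
    using nested averages, then express each part as a share of the rent."""
    # Neighborhood: the value of simply being in this market.
    market_avg = mean(u["rent"] for u in units)
    # Property: what this building adds (or subtracts) relative to the market.
    property_avg = mean(
        u["rent"] for u in units if u["property_id"] == unit["property_id"]
    )
    parts = {
        "neighborhood": market_avg,
        "property": property_avg - market_avg,
        # Unit: how this unit compares with others in the same building.
        "unit": unit["rent"] - property_avg,
    }
    shares = {name: value / unit["rent"] for name, value in parts.items()}
    return parts, shares
```

The three parts always sum to the unit's rent, so the shares sum to one. The property and unit parts can be negative, which simply means the building or the unit is priced below its peers.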
Don't take our word for it though - schedule a demo below to see Enodo in action: