Data Methodologies

In this section of the website, you can familiarize yourself with how we collect the data and how certain calculations in our reports are performed.

The Custom Property reports use the same source data as the Raw Property reports, with additional filtering, cleaning and filling of missing data. For that reason, the content is split into the sections below. Please navigate to the part that you are interested in:


I. Raw Property Data

Raw Property Data refers to the data that we store directly in our database in its original form. We fetch the data from the Airbnb website on a daily basis. It includes four datasets: calendar/bookings, prices, reviews and the details of a single property. Each dataset is stored in a separate CSV file. This format gives us the flexibility to deliver several months of data in a single file.

Calendar Data

The calendar data file records whether a given property was available for booking on each date. Each property is checked daily for same-day availability between 19:00 and 20:00 local time. Each host can set a different cut-off time for same-day bookings, but we cannot see this setting. From our experience and observations, most rooms offer self check-in, which lets the host accept late-night bookings for the same day. If a booking occurs after 20:00 local time, the room may not be marked as occupied. The check runs 4 hours before the next day starts, which leaves us a window to react if any issues occur.

Another important note about the calendar data is that we cannot differentiate between booked and blocked rooms. Hosts can block their rooms whenever they want, in which case they will not accept guests for the given date(s). Since we cannot see this information, every date on which the room cannot be booked is counted as an occupied date.
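The rule above can be sketched in a few lines of Python. This is only an illustration, not our production code: the function name occupancy_rate and the date-to-availability mapping are assumptions made for the example, and the exact formula behind the Occupancy Rate KPI may differ in detail.

```python
from datetime import date


def occupancy_rate(availability: dict[date, bool]) -> float:
    """Share of days counted as occupied.

    Assumption for this sketch: the mapping holds True when the room
    was available for booking on that date. Every unavailable date,
    whether booked or blocked, counts as occupied.
    """
    if not availability:
        raise ValueError("no calendar rows to evaluate")
    occupied_days = sum(1 for available in availability.values() if not available)
    return occupied_days / len(availability)


# Hypothetical four-day calendar: two available days, two unavailable days.
calendar = {
    date(2025, 3, 1): True,
    date(2025, 3, 2): False,
    date(2025, 3, 3): False,
    date(2025, 3, 4): True,
}
rate = occupancy_rate(calendar)
```

Because blocked dates cannot be told apart from booked ones, a rate computed this way is an upper bound on the real occupancy.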

Price Data

Just like the calendar data, the price data is collected on a daily basis. However, unlike the calendar data, the price data can have missing days between two dates, which breaks the usual frequency of observations. There are two main reasons for this. First and foremost, the room might be booked for the given date, in which case airbnb.com does not show a price per night for that date. The booking might be for a single day or for multiple days in a row; in the second case, prices may be missing for several consecutive days. The second reason is that the room might be blocked. As described in the calendar data section above, we cannot differentiate between booked and blocked rooms.
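If you work with the raw price file, you may want to list the dates that have no price row. A minimal sketch, assuming you have already parsed the observed dates into Python date objects (the function name missing_price_dates is hypothetical):

```python
from datetime import date, timedelta


def missing_price_dates(observed: list[date], start: date, end: date) -> list[date]:
    """Return every date in [start, end] that has no price observation."""
    observed_set = set(observed)
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in all_days if day not in observed_set]


# Hypothetical example: prices exist for 1, 2 and 5 March only.
observed = [date(2025, 3, 1), date(2025, 3, 2), date(2025, 3, 5)]
gaps = missing_price_dates(observed, date(2025, 3, 1), date(2025, 3, 5))
```

The dates returned are the ones on which the room was either booked or blocked; the price data alone cannot tell you which.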

The price dataset has a few more fields than the calendar dataset. We try to collect as much information as possible and analyse the price details of each room. Besides the final price that a guest will pay, we record whether any discounts or additional taxes apply, their descriptions, the required minimum nights and the number of people the room accommodates. Last but not least, we specify the currency in which we store the data, usually US Dollars. In recent months Airbnb has reduced the transparency of additional charges, so the corresponding field is often empty/null.

Lastly, in the price dataset you will notice three price fields: 'price', 'price_total' and 'original_price'. The difference between them is the following. Some rooms require a minimum number of nights per stay, and to allow a fair comparison we need to present the data as a price per single night. The 'price_total' field gives the price for the required minimum nights. The 'price' field gives the price per single night: when more than 1 night is required per stay, we divide 'price_total' by 'min_nights'. The 'original_price' field is usually filled only when a discount was applied.
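As a worked example of the division described above, here is a small Python sketch. The function name price_per_night is hypothetical; the arguments mirror the 'price_total' and 'min_nights' fields.

```python
def price_per_night(price_total: float, min_nights: int) -> float:
    """Derive the per-night 'price' from 'price_total' and 'min_nights'."""
    if min_nights < 1:
        raise ValueError("min_nights must be at least 1")
    # With a 1-night minimum, the total is already the per-night price.
    return price_total / min_nights


# Hypothetical listing: 300.00 for a required minimum stay of 3 nights.
nightly = price_per_night(300.0, 3)
```

Here the per-night price is 100.00, which can be compared directly with a listing that has no minimum-stay requirement.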

Reviews Data

The reviews dataset is simpler than the price and calendar datasets. It contains the following fields: date, accuracy, checking, cleanliness, communication, location, value, overall, and review_count. Apart from date and review_count, these are scores that guests can give to the host and the room. The scores for each room are checked once per week, which makes it possible to spot trends within the month itself. We check weekly rather than daily, as we do for the calendar and prices, because room scores do not fluctuate much. In addition, a room is not occupied every single day, and not every guest leaves a review. The 'review_count' field holds the number of reviews over the lifetime of the room, so comparing two consecutive weeks shows how many reviews were added in between.
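Since 'review_count' is cumulative, the number of new reviews per week is the difference between consecutive snapshots. A minimal sketch, assuming each snapshot is a (date, review_count) pair with the date written as YYYY-MM-DD so that it sorts correctly as text (the function name is hypothetical):

```python
def new_reviews_per_week(snapshots: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Turn cumulative review counts into per-week increments.

    Each result pair holds the later snapshot date and the number of
    reviews added since the previous snapshot.
    """
    ordered = sorted(snapshots)
    return [
        (current_date, current_count - previous_count)
        for (_, previous_count), (current_date, current_count) in zip(ordered, ordered[1:])
    ]


# Hypothetical weekly snapshots for one room.
weekly = new_reviews_per_week(
    [("2025-03-01", 40), ("2025-03-08", 42), ("2025-03-15", 42)]
)
```

In this example two reviews were added in the first week and none in the second.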

Details Data

The details dataset is the simplest. It is quite small and consists of several rows. However, it holds the basic information about the room. In the file you will see fields such as name, title, city, country, street, postcode etc. The dataset also has a field 'active', which specifies whether the room is active or not. Usually, inactive rooms cannot be found on the Airbnb website, but if you have the room id of an inactive room and it is in our database, we can still check what data we hold for it.

The details dataset also contains the longitude and latitude of the Airbnb room. The coordinates come directly from the Airbnb website. We use them to derive the fields that describe the room's physical address, such as 'street', 'suburb', 'housenumber' etc. Keep in mind that these may not be 100% accurate: we rely on third parties and so-called reverse geocoding to fetch the address information.

II. Custom Property Report

The Custom Property Report is based on the Raw Property Data. The very same data is taken, filtered and cleaned. The report builds visualizations and helps provide a clearer understanding of the data. While the Raw Property Data is more useful to data analysts or people who want to do their own analysis, the Custom Property Report is ready to use out of the box for hosts who want to inspect how their rivals perform.

The report comes in two file formats: HTML and PDF. The HTML version allows some interaction, e.g. some of the graphs are zoomable and hovering shows more information, while the PDF version is more suitable for presentation or distribution.

Summary Page

As the name suggests, this page gives you a brief overview of a given room. On this landing page of the report you will find some quick KPIs: Total Revenue, Occupancy Rate, Average Daily Price, Number of Reviews and Overall Score. The KPIs also include month over month (MoM) information so you can follow the trend. Sometimes this information might be missing, usually because the room has been in our database for only 1 month.
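To illustrate how a MoM figure can be read, the sketch below uses the common percentage-change formula. This formula is an assumption made for illustration and is not necessarily the exact calculation in the report; the function name mom_change is hypothetical.

```python
from typing import Optional


def mom_change(current: float, previous: Optional[float]) -> Optional[float]:
    """Percentage change versus the previous month.

    Returns None when there is no previous month to compare with
    (e.g. a room with only 1 month of history) or when the previous
    value is zero, which would make the ratio undefined.
    """
    if previous is None or previous == 0:
        return None
    return (current - previous) / previous * 100.0


# Hypothetical revenue figures: 1500 this month versus 1000 last month.
change = mom_change(1500.0, 1000.0)
no_history = mom_change(1500.0, None)
```

With these numbers the change is +50%, while the room without a previous month has no MoM value at all.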

Following the KPIs you will find a section with details of the room, along with its cover photo. Below the listing details you will also find the Revenue Trend, which covers the last 6 months. We may not have a full 6-month history for the given room for various reasons, so keep in mind that the revenue history might be shorter. Last but not least is the Guest Review Breakdown, which shows how the overall score of the room is formed, with each category graphed.

Calendar Page

On the Calendar Page you will find a graph that represents the availability of the room throughout the month. Each day has one of two states: Booked or Available. Just like in the Raw Property Data, we cannot tell the difference between a blocked and an occupied room. We simply do not have access to such information. For that reason, every unknown or missing state for a given date is assumed to be occupied.

Sometimes hosts block their rooms/apartments for a prolonged period. In some cases this can be spotted very easily. If the report shows a fully booked apartment for an extended period of time (such as one full month), but there is not a single review in the last month, most likely there were no guests at all.
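The rule of thumb above can be expressed as a simple check. This is a sketch of the reasoning only, not a feature of the report; the function name looks_blocked and its inputs are assumptions made for the example.

```python
def looks_blocked(availability: list[bool], reviews_at_start: int, reviews_at_end: int) -> bool:
    """Flag a period that was probably blocked rather than booked.

    availability holds one value per day (True = available for booking).
    The period is flagged when no day was available and the lifetime
    review count did not grow during the period.
    """
    fully_unavailable = bool(availability) and not any(availability)
    no_new_reviews = reviews_at_end <= reviews_at_start
    return fully_unavailable and no_new_reviews


# Hypothetical 30-day month: never available, review count unchanged at 12.
suspicious = looks_blocked([False] * 30, reviews_at_start=12, reviews_at_end=12)
# Same calendar, but three new reviews arrived, so real stays are plausible.
plausible = looks_blocked([False] * 30, reviews_at_start=12, reviews_at_end=15)
```

Treat the result as a hint rather than proof, because not every guest leaves a review.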

In addition to the graph, there is a quick analysis of the calendar. As we like to be as transparent as possible: this analysis is generated using AI. Specifically, we use the Gemini 2.5 Flash model by Google. The model is provided with the chart and a carefully designed prompt to produce an informative and context-aware interpretation.

Price Page

The price page follows the calendar page and is built in the same way. It has a graph showing the daily price of the property throughout the month, the average price for an apartment/room in the very same suburb and the minimum nights required per stay (on the right axis). Again, an AI-generated analysis next to the graph helps you read it correctly.

Reviews Page

Last but not least, in the Custom Property Report you will find the Reviews Trend page. Just like the previous two pages, it contains a graph and an analysis. The graph shows how the (overall) scores of the checked room/apartment are trending.

III. Properties Update/Addition

In order to keep our database up to date, we check all properties in the cities we cover. On a regular basis, we update the existing properties, refreshing their name, activity and other information from the details data. In addition, we scan for rooms/apartments that are not yet in our database or that are new to the short-term rental market (new listings on the Airbnb website), so we can add them. If you want us to add a specific room/apartment or have any additional questions, feel free to reach out to us at contact@abnbdata.com.

IV. Booking.com and other similar websites

Currently we collect data only from airbnb.com; other websites, including booking.com, are not covered. This can lead to several differences. Perhaps the main one is the price of the room/apartment, and hence the revenue earned for the given month. However, in our experience, when a host lists the same room/apartment on both websites, Airbnb and Booking.com, the price difference is usually small. Such a discrepancy is usually due to the discounts available on Booking.com, e.g. if you have a Genius account. In the end, the host earns more or less the same.

In terms of the occupancy and availability - the same room/apartment cannot be rented out twice at the same time, so if the room is occupied via airbnb, then it will not be available on booking.com and vice versa.