Introduction to Large Language Models
Large Language Models can be used for various tasks, including regression tasks, where the goal is to predict a numeric value. In this article, we explore how these models can predict the number of points a dwelling is worth based on its description.
Background
In the Netherlands, rules determine the maximum rent allowed for a dwelling based on its properties and quality. These rules are complex and are published on the Huurcommissie’s website. To make them easier to apply, an open-source Python package called woningwaardering was built to calculate the number of points a dwelling is worth under these rules.
What is Woningwaardering?
Woningwaardering is a point system in which a dwelling is awarded points based on its properties and qualities. The maximum rent for a dwelling is directly related to the number of points it is awarded.
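To illustrate the idea of a point system, here is a toy tally in Python. The attributes, criteria and point values are hypothetical placeholders chosen for readability; they are not the official Huurcommissie rules and not how the woningwaardering package computes points.

```python
from dataclasses import dataclass


@dataclass
class Dwelling:
    # Hypothetical attributes, chosen only for illustration
    living_area_m2: float
    has_outdoor_space: bool
    energy_label: str


# Hypothetical scoring rules: NOT the official woningwaardering criteria
POINTS_PER_M2 = 1.0
OUTDOOR_SPACE_POINTS = 2.0
ENERGY_LABEL_POINTS = {"A": 30.0, "B": 20.0, "C": 10.0, "D": 5.0}


def toy_points(d: Dwelling) -> float:
    """Sum the points awarded per criterion (toy version of a point system)."""
    points = d.living_area_m2 * POINTS_PER_M2
    if d.has_outdoor_space:
        points += OUTDOOR_SPACE_POINTS
    # Labels missing from the table earn zero points in this toy model
    points += ENERGY_LABEL_POINTS.get(d.energy_label, 0.0)
    return points


print(toy_points(Dwelling(living_area_m2=70.0, has_outdoor_space=True, energy_label="B")))  # 92.0
```

The real system has many more criteria and detailed rules per criterion, but the structure is the same: each property contributes points, and the total drives the maximum rent.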
Inspiration from Previous Work
Ed Donner fine-tuned a Llama-3.1-8B model to predict the prices of Amazon products from their descriptions. This inspired the author to try predicting the points of a dwelling from its description, as an alternative to calculating them with the woningwaardering package.
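A minimal sketch of how a regression target can be framed as text for fine-tuning a language model: the description becomes the prompt, the point total is appended as the completion, and at inference time the first number in the model's output is parsed back into a float. The prompt wording and helper names are hypothetical; they are not taken from Ed Donner's code or from the author's setup.

```python
import re
from typing import Optional

# Hypothetical prompt wording; not the template used in the original work
QUESTION = "How many woningwaardering points is this dwelling worth?"
ANSWER_PREFIX = "Points:"


def make_training_example(description: str, points: float) -> str:
    """Format one (description, target) pair as a single training text."""
    return f"{QUESTION}\n\n{description}\n\n{ANSWER_PREFIX} {points:.0f}"


def make_inference_prompt(description: str) -> str:
    """Same layout as training, but stop right before the target value."""
    return f"{QUESTION}\n\n{description}\n\n{ANSWER_PREFIX}"


# First integer or decimal number; accepts a decimal comma, as in Dutch text
_NUMBER = re.compile(r"-?\d+(?:[.,]\d+)?")


def parse_points(generated: str) -> Optional[float]:
    """Extract the first number from the model's continuation, if any."""
    match = _NUMBER.search(generated)
    if match is None:
        return None
    return float(match.group().replace(",", "."))


print(make_inference_prompt("Two-room apartment of 55 m2 in Rotterdam."))
print(parse_points(" 143 points"))  # 143.0
```

Keeping the training and inference layouts identical means the model only has to learn to continue the text after `Points:` with a number.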
Data Collection
The social housing organisation Woonstad Rotterdam holds a large dataset describing about 60,000 dwellings. Using the open-source pyspark-testframework package, data quality checks were implemented to determine which dwellings have near-perfect data. Of the approximately 50,000 self-contained dwellings, around 25,000 have near-perfect data quality.
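The sketch below is a plain-Python stand-in for that filtering step, assuming each dwelling is a dictionary. It does not use the pyspark-testframework API; the field names, the checks and the `threshold` parameter are hypothetical, meant only to show the pattern of running named checks per dwelling and keeping the self-contained dwellings that pass.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Check = Callable[[Record], bool]


# Hypothetical data quality checks on hypothetical field names
def area_positive(r: Record) -> bool:
    value = r.get("living_area_m2")
    return isinstance(value, (int, float)) and value > 0


def build_year_plausible(r: Record) -> bool:
    value = r.get("build_year")
    return isinstance(value, int) and 1600 <= value <= 2030


def energy_label_present(r: Record) -> bool:
    return bool(r.get("energy_label"))


CHECKS: Dict[str, Check] = {
    "area_positive": area_positive,
    "build_year_plausible": build_year_plausible,
    "energy_label_present": energy_label_present,
}


def pass_rate(record: Record, checks: Dict[str, Check] = CHECKS) -> float:
    """Fraction of checks that the record passes."""
    return sum(check(record) for check in checks.values()) / len(checks)


def select_high_quality(
    records: List[Record], checks: Dict[str, Check] = CHECKS, threshold: float = 1.0
) -> List[Record]:
    """Keep self-contained dwellings whose pass rate reaches the threshold."""
    return [
        r for r in records
        if r.get("self_contained") and pass_rate(r, checks) >= threshold
    ]


dwellings = [
    {"id": 1, "self_contained": True, "living_area_m2": 70.0, "build_year": 1995, "energy_label": "B"},
    {"id": 2, "self_contained": True, "living_area_m2": -1, "build_year": 1995, "energy_label": "B"},
    {"id": 3, "self_contained": False, "living_area_m2": 30.0, "build_year": 1960, "energy_label": "C"},
]
print([r["id"] for r in select_high_quality(dwellings)])  # [1]
```

Lowering `threshold` below 1.0 is one way to express "near-perfect" rather than "perfect"; where exactly that line was drawn in the original work is not stated.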
Limitations of the Dataset
Even though the data quality of these 25,000 dwellings is very good, the dataset still has limitations. For example, some data points are missing, which could affect the accuracy of the predictions.
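One place where missing data points surface is the text the model sees. The sketch below, with hypothetical field names and phrasing, renders a dwelling record as a short description and leaves out any field that is missing, so the model receives less information rather than a placeholder value. It also returns the list of missing fields, which could be used to check whether prediction errors are larger for incomplete records. This is an assumed approach for illustration, not necessarily how the original descriptions were built.

```python
import math
from typing import Any, Dict, List, Tuple

# Hypothetical field names mapped to sentence templates
TEMPLATES = {
    "dwelling_type": "Type: {}.",
    "living_area_m2": "Living area: {} m².",
    "rooms": "Number of rooms: {}.",
    "energy_label": "Energy label: {}.",
}


def is_missing(value: Any) -> bool:
    """Treat None, blank strings and NaN as missing."""
    if value is None:
        return True
    if isinstance(value, str) and not value.strip():
        return True
    return isinstance(value, float) and math.isnan(value)


def describe(record: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (description, missing_fields) for one dwelling record."""
    parts: List[str] = []
    missing: List[str] = []
    for field, template in TEMPLATES.items():
        value = record.get(field)
        if is_missing(value):
            missing.append(field)
        else:
            parts.append(template.format(value))
    return " ".join(parts), missing


print(describe({"dwelling_type": "apartment", "living_area_m2": 55, "rooms": None}))
```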
Conclusion
Large Language Models can be used for regression tasks, such as predicting the points of a dwelling from its description. Despite the limitations of the dataset, this approach could simplify determining the maximum rent allowed for a dwelling. Further research and development are needed to improve the accuracy of these predictions.
FAQs
- What is woningwaardering?
Woningwaardering is a point system in which a dwelling is awarded points based on its properties and qualities.
- What is the purpose of the woningwaardering package?
The woningwaardering package calculates the number of points a dwelling is worth under the rules that determine the maximum rent allowed for it.
- How can Large Language Models be used for regression tasks?
Large Language Models can be used to predict the points of a dwelling from its description, instead of calculating them with the woningwaardering package.
- What are the limitations of the dataset used in this study?
Some data points are missing, which could affect the accuracy of the predictions.