Predicting Housing Prices in Germany Using Machine Learning Models and Data Mining Methods


Chong Dae Kim
Nils Bedorf

Abstract

Real estate price prediction is a popular problem in machine learning and is frequently discussed in the literature. Unlike other approaches that focus on the US market, this article examines the largest German real estate dataset, which contains more than 1.5 million unique samples and over 20 features. We implement and compare various machine learning models in terms of performance and interpretability in order to gain insight into the most important…


Article Details

How to Cite
Kim, C. D., & Bedorf, N. (2024). Przewidywanie cen mieszkań w Niemczech z wykorzystaniem modeli uczenia maszynowego i metod eksploracji danych [Predicting housing prices in Germany using machine learning models and data mining methods]. Kwartalnik Nauk o Przedsiębiorstwie, 71(1), 107–122. https://doi.org/10.33119/KNoP.2024.71.1.7 (Original work published 30 March 2024)
Section
Main Section
