Note: This homepage is dedicated to my personal, non-professional academic work: research and teaching. For professional inquiries, please visit the agency website.
Dennis Oliver Kubitza
Data Scientist • Researcher • Founder
Hi. I am Dennis. I am a computer scientist with a background in statistics and economics. Since 2022 I have been a PhD candidate in Regional Economics & Labor Market Statistics at Maastricht University. In 2026 I founded the Dr. Lisa Baum & Dennis Kubitza GbR, a full-service online marketing agency. This page is about me and my research. If you are interested in my agency, visit the agency homepage:
Over the past few years, I have programmed software, taught programming courses for statistical software, analyzed labor market data, and managed research projects.
Fig. 1: The author with his mobile work setup enjoying the mild German summer of 2025.
Data Analysis & Research (2021–2025): Worked at the Federal Institute for Vocational Education and Training (BIBB). I ran statistical economic analyses on labor market data.
Software Development (2019–2021): Worked as a Research Engineer at Fraunhofer IAIS. I built Java connectors and user interfaces, managed several projects, and acquired new ones.
Education (2012–2019): I earned an M.Sc. in Computer Science as well as B.Sc. degrees in Economics and Mathematics from the University of Bonn.
Tech Skills: I code in R, Python, Java, HTML/CSS/React, and SQL. I also set up and use tools like Docker, Git, WordPress, and different CRM systems.
Because I have worked across several disciplines, my research output is not limited to publications in academic journals but also includes white papers, conference proceedings, and short articles.
Journal Articles
Another one rides the bus—Moving versus commuting decisions during the transition to higher education (with K. Weßling; Review of Regional Research; 2025)
Abstract The present study explores students’ decisions to either relocate or commute to their place of study depending on the availability of student tickets. By leveraging regional variations in the coverage of these subsidized public transport tickets, we explore whether their availability decreases students’ likelihood of moving. We investigate how the importance of subsidized tickets changes with commuting time and how their relevance varies based on students’ financial resources, social backgrounds, and risk attitudes. To do so, we use the MESARAS 2013 (Mobility, Expectations, Self-Assessment, and Risk Attitude of Students) survey, which queries university entrants in the field of economics at seven German universities. We link locational identifiers from the survey with the Google Distance Matrix API to assess commuting times and regional administrative data. Our logistic regression models suggest that the availability of student tickets may decrease the likelihood of moving. The association appears to be less pronounced as commuting time increases and seems to depend on parental academic status as well as students’ budget. Thus, our study offers three contributions: First, we provide policy-relevant evidence on the importance of affordable public transportation. Second, we address (social) inequalities in the moving versus commuting decision. Third, we demonstrate how accurate spatial modeling of commuting times and integration of survey data with external data sources benefits socio-spatial analysis.
Match Me Up Before I Go-Go!—Matching Functions for Spatially Connected VET Labor Markets (forthcoming; 2026)
Abstract Local labor markets in Germany are interconnected through mobile workers who commute or migrate across borders. Therefore, the aggregated number of new matches between employers and employees in one region is influenced by spillovers from neighboring markets, but might also be disproportionately higher due to easier access to suitable jobs in agglomerated areas. Vocational education and training (VET) students in company training, so-called apprentices, are younger and therefore less mobile than the general workforce, suggesting that regional spillovers may be less pronounced when looking at VET labor markets. In addition, occupational heterogeneities in the spatial distribution of employers might further amplify or decrease these spillovers. This paper provides the first estimates of spatial matching functions for VET students in general and for multiple profession groups. Using a matching function model with spatially lagged stocks of applicants and vacancies, it analyses efficiency, elasticities, regional influences, and spatial spillovers. To do so, it develops a novel set of spatial weights and ways to measure agglomeration. The findings show that matching efficiency for VET is higher in more agglomerated regions and that measurable spillovers do exist, even for the group of less long-distance mobile adolescents. This underscores the importance of accurate spatial modeling: only indices based on realistic travel times or commuting data show these influences. Finally, this paper finds significantly lower elasticities in the stock of unemployed applicants for regions that are well connected to their surroundings, leading to a more balanced matching process.
Whole Lotta Training—Studying School-to-Training Transitions by Training Artificial Neural Networks (with K. Weßling; forthcoming; 2026)
Abstract At the end of compulsory education, the decision to either continue school or pursue training represents a critical juncture widely analyzed in the social sciences. Choices for or against a career direction are shaped by interdependent individual, institutional, and regional-level factors. To disentangle these relationships, researchers have typically relied on (multilevel) regression techniques, focusing on mediating and moderating effects among distinct predictors. In this study, we employ artificial neural networks (ANNs) to analyze survey data from the German National Educational Panel Study, enriched with regional indicators. Focusing on school-to-vocational training transitions as an established research topic in the social sciences, we build on recent studies that demonstrate how machine learning methods can identify complex, non-linear, and higher-order patterns among explanatory variables. To enhance the interpretability of complex patterns based on ANNs, we employ explainable artificial intelligence (XAI) techniques, specifically Shapley Additive exPlanations (SHAP) and Rule Extraction (RE) algorithms. These methods allow us to identify multiple non-linear interactions within and across analytical levels. We argue that the adoption of ANNs in the social sciences has the potential to inspire new substantive research questions, generate novel insights into established relationships, and make complex patterns of influence more accessible.
Conference Proceedings & White Papers
Mobility data space—first implementations and business opportunities (with H. Drees, J. Lipp, S. Pretzsch, C. S. Langdon; ITS World Congress; 2021)
Abstract Data spaces are open and decentral ecosystems, which ensure a trustworthy, safe and secure data exchange. Within the mobility sector, trusted data exchange and processing are emerging as key enablers of digitalization and new mobility offerings. Improved services and new business opportunities, such as more seamless travel with intermodal solutions, for example, require a willingness of stakeholders to share data, which, in turn, depends on harmonized and commonly accepted solutions to govern such data exchange. Fortunately, a first implementation, the Mobility Data Space, is untying this knot by adopting data space concepts from the International Data Spaces Association. Core components complement the decentral concept and foster discoverability, accessibility, interoperability, and trustworthiness. In a German project, a minimal viable demonstrator has been implemented, which includes core components, various data sources and first applications. It proves the feasibility of the data space concepts for the mobility sector and paves the way for its business evolution.
GDPR Compliant Multi-Application Data Governance: Enhancing the Open Integration Hub with IDS and SOLID (with M. Böckmann, S. Höffler, J. Knoop, H. Schmidt; ITS World Congress; 2021)
Abstract There is enormous value in gathering and processing decentralised data occurrences in traffic and mobility systems. Exchanging and integrating such data requires a trustworthy infrastructure and mechanisms to protect shared information. The commercial usage of personal data often needs to comply with special restrictions like GDPR. Small and Medium Enterprises (SMEs) are challenged by this task, as they rely on external technologies when processing and collecting information from location-based services, floating car data, etc. Especially in the mobility domain, new data-driven use cases are unthinkable without including personal data that is worthy of protection. In this contribution we analyse how Data Governance Frameworks like SOLID or the International Data Spaces usage policies can be reused to leverage open-source Data Integration Frameworks like the Open Integration Hub (OIH) for GDPR compliant and trustworthy data processing of sensitive data.
Specification: IDS Meta Data Broker v1.0 (with S. Bader, F. Bruckner, G. Böge, J. Langkau, R. Nagel; 2020)
Abstract This document describes the minimal features of an IDS Meta Data Broker, as an index-service running in conjunction with an IDS Connector. The description of the IDS Connector itself is not part of this document. This document does not describe data brokerage functionality for the International Data Spaces, but Meta Data Brokerage. This document contains the specification of the IDS Meta Data Broker and acts as a foundation for the Certification criteria for the IDS Meta Data Broker. In addition to the minimal requirements, two advanced Broker Profiles are described, which enhance the standard Broker functionalities with improved information management and usage policies.
SemanGit: A Linked Dataset from git (with M. Böckmann, D. Graux; International Semantic Web Conference; 2019)
Abstract The growing interest in free and open-source software over the last decades has accelerated the usage of versioning systems to help developers collaborate on the same projects. As a consequence, specific tools such as git and specialized open-source on-line platforms gained importance. In this study, we introduce and share SemanGit, which provides a resource at the crossroads of both Semantic Web and git web-based version control systems. SemanGit is the first collection of linked data extracted from GitHub based on a git ontology we designed and extended to include specific GitHub features. In this article, we present the dataset, describe the extraction process according to the ontology, show some promising analyses of the data and outline how SemanGit could be linked with external datasets or enriched with new sources to allow for more complex analyses.
Short Articles
Same same but different–Was Regionen mit Ausbildungswünschen und -chancen zu tun haben, und warum das nicht für jede/-n gilt (with K. Wessling, J. Detemple, D. Kubitza, I. Loll, N. Theuer; Berufsbildung in Wissenschaft und Praxis; 2024)
Abstract This article examines the relationship between regions and vocational training. On the one hand, it shows that not all characteristics of a region are equally important in explaining what influences young people's training aspirations and opportunities. On the other hand, it shows that regional characteristics are not equally important for all young people and that their effects vary with sociodemographic or personal characteristics. The article aims to raise awareness in research and practice of this selective relevance of regional characteristics for training aspirations and opportunities, so that target-group-specific (counseling) services can address the challenges on the […]
Towards Semantically Structuring GitHub (with M. Böckmann, D. Graux; International Semantic Web Conference; 2019)
Abstract With the recent increase of open-source projects, tools have emerged to enable developers to collaborate. Among these, git has received lots of attention and various on-line platforms have been created around this tool, hosting millions of projects. Recently, some of these platforms opened APIs to allow users to query their public databases of open-source projects. Despite the common protocol core, there are currently no common structures that could be used to link those sources of information. To tackle this, we propose the SemanGit ontology, the first ontology dedicated to the git protocol, which also describes GitHub’s features to show how it is extensible to encompass more git-based data sources.
Course Material
From 2015 to 2020 I taught several programming courses for the Department of Economics of the University of Bonn. The course material is published on GitHub and accessible for further use. Both courses can be taught as a 5-day or 3-day intensive course.