Using hub-and-spoke models for real-time data integration
By Tyukavkina Ekaterina (entiukavkina@edu.hse.ru)
Introduction
In the era of big data, organizations face increasing challenges in managing and using data effectively. Traditional monolithic data architectures struggle to keep pace with the heterogeneous nature and high velocity of generated data, making it necessary to explore alternative paradigms. A hub-and-spoke model can help organizations handle growing data volumes.
What is the hub-and-spoke model?
The hub-and-spoke model is a system that resembles a bicycle wheel. Data is shipped to a central location (the hub) and distributed outwards to multiple destinations (the spokes) (Fig. 1).
Figure 1. Template of the hub-and-spoke model [1]
By using this model, companies can improve the efficiency of their data integration by consolidating data flows in one central location before sending them on to their final destinations.
This can reduce the time and resources needed to move data. The model is used in various fields, including transportation, logistics, marketing, aviation, telecommunications, healthcare, and gaming. But is it as effective and useful for real-time data integration?
Advantages and disadvantages of the hub-and-spoke model
When examining data integration from an architectural perspective, it is common to encounter three primary models:
- point-to-point interaction,
- hub-and-spoke,
- publish-subscribe (Fig. 2).
Figure 2. Data integration patterns [2]
These models represent architectural patterns for designing and implementing data integration systems. The point-to-point interaction model is a relatively simple approach that involves connecting each system with every other system with which it needs to exchange data (Fig. 2a). While this model is relatively straightforward to implement, it can become increasingly complex and difficult to maintain as the number of systems increases.
The hub-and-spoke model, on the other hand, centralizes data integration through a central hub that acts as an intermediary between all connected systems (Fig. 2b). This model allows for greater control and management over data integration processes, but it can also become a bottleneck if not properly implemented and scaled [2].
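The intermediary role of the hub can be illustrated with a minimal in-memory sketch. The class name `Hub`, its methods, and the spoke names (`crm`, `billing`, `warehouse`) are hypothetical and chosen only for illustration; a real integration hub would add transformation, persistence, and error handling.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Record = dict
Handler = Callable[[Record], None]


class Hub:
    """Hypothetical central intermediary: every spoke talks only to the hub."""

    def __init__(self) -> None:
        self._spokes: Dict[str, Handler] = {}

    def connect(self, name: str, handler: Handler) -> None:
        # Each spoke needs exactly one connection, however many spokes exist.
        self._spokes[name] = handler

    def send(self, source: str, record: Record) -> int:
        """Forward a record from one spoke to all others; return delivery count."""
        delivered = 0
        for name, handler in self._spokes.items():
            if name != source:
                handler(record)
                delivered += 1
        return delivered


# Collect what each spoke receives so the routing can be inspected.
received: Dict[str, List[Record]] = defaultdict(list)
hub = Hub()
for spoke in ("crm", "billing", "warehouse"):
    hub.connect(spoke, lambda rec, s=spoke: received[s].append(rec))

count = hub.send("crm", {"customer_id": 42, "status": "active"})
print(count)  # 2: the record reaches billing and warehouse, not the sender
```

Because all traffic passes through `send`, this single method is also where a bottleneck would appear under load.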
The hub-and-spoke model requires fewer routes than the point-to-point model. For a network of n nodes, one of which is the hub, only n−1 routes are needed to connect all nodes, so the number of routes grows linearly, O(n). By comparison, connecting each node directly to every other node in a point-to-point network requires n(n−1)/2 routes, which grows quadratically, O(n^2).
For instance, in a system with 5 destinations, the hub-and-spoke system requires only 4 routes to connect all destinations (Fig. 3, the left side), whereas a true point-to-point system would require 10 routes (Fig. 3, the right side) [2].
Figure 3. Hub-and-spoke model VS Point-to-point model [4]
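The two route counts can be checked with a short sketch. The function names and the node labels are assumptions made for illustration; the formulas are the ones given in the text, with the hub counted as one of the n nodes.

```python
from itertools import combinations


def hub_and_spoke_routes(n: int) -> int:
    """Routes needed when one of the n nodes acts as the hub."""
    return max(n - 1, 0)


def point_to_point_routes(n: int) -> int:
    """Routes needed to link every node directly to every other node."""
    return n * (n - 1) // 2


nodes = ["A", "B", "C", "D", "E"]  # hypothetical node names
# Enumerate the direct links explicitly to confirm the closed-form count.
direct_links = list(combinations(nodes, 2))
print(len(direct_links))  # 10

for n in (5, 10, 50):
    print(n, hub_and_spoke_routes(n), point_to_point_routes(n))
# 5 4 10
# 10 9 45
# 50 49 1225
```

The gap widens quickly: at 50 nodes the point-to-point design needs 25 times as many routes as the hub-and-spoke design.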
In terms of data integration, fewer routes can mean not only faster data delivery but also fewer lines of code, a more manageable and understandable architecture, less data duplication, and improved security. A hub-and-spoke model also offers an opportunity to improve the overall system architecture by centralizing business processes, and it makes long-term maintenance more cost-efficient.
However, the hub-and-spoke model has its own drawbacks. Because this kind of integration relies on a single central system, the hub is a single point of failure: the moment it experiences downtime, data flow to all connected spokes is disrupted.
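The effect of hub downtime can be shown with a small simulation. Everything here (`SingleHub`, `HubUnavailable`, the `online` flag, the spoke names) is a hypothetical construction for illustration, not part of any particular product.

```python
class HubUnavailable(Exception):
    """Raised when the (hypothetical) hub cannot accept traffic."""


class SingleHub:
    def __init__(self) -> None:
        self.online = True
        self.inboxes = {}  # spoke name -> list of delivered records

    def connect(self, name: str) -> None:
        self.inboxes[name] = []

    def broadcast(self, record: dict) -> None:
        # All traffic depends on this one component being available.
        if not self.online:
            raise HubUnavailable("hub is down; no spoke can be reached")
        for inbox in self.inboxes.values():
            inbox.append(record)


hub = SingleHub()
for name in ("sales", "hr", "support"):
    hub.connect(name)

hub.broadcast({"event": "price_update"})  # delivered to all three spokes

hub.online = False  # simulate hub downtime
try:
    hub.broadcast({"event": "stock_update"})
    outage_detected = False
except HubUnavailable:
    outage_detected = True

print(outage_detected)  # True: the second record reached no spoke at all
```

In this sketch a single failed component stops delivery to every spoke, which is the trade-off of routing everything through one hub.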
The publish-subscribe model is a more advanced approach that allows systems to subscribe to specific data events (Fig. 2c). This enables more flexible and scalable data integration, but it can also be challenging to implement and requires specialized infrastructure.
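A minimal in-memory sketch of the publish-subscribe idea follows. The `Broker` class and the topic names are assumptions made for illustration; production systems typically rely on dedicated messaging infrastructure rather than a dictionary of callbacks.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Hypothetical broker: systems subscribe to the events they care about."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver an event to every subscriber of the topic; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
orders_seen, audit_log = [], []
broker.subscribe("orders", orders_seen.append)
broker.subscribe("orders", audit_log.append)
broker.subscribe("payments", audit_log.append)

delivered = broker.publish("orders", {"order_id": 1})
ignored = broker.publish("refunds", {"refund_id": 7})
print(delivered, ignored)  # 2 0
```

The publisher never names its receivers, so new subscribers can be added without changing the publishing system.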
It is important to consider that each model has its own set of advantages and disadvantages, and the choice of architecture will depend on the specific requirements and constraints of the organization. Furthermore, best practices in areas such as data quality, data security, and data governance should be taken into account to ensure that the integrated data is accurate, consistent, and secure [2].
Examples of real-time data integration based on the hub-and-spoke model
In this section, two examples of hub-and-spoke model implementations for real-time data integration are presented. The first is a hub-and-spoke model designed to help developers cope with channel overload and the fragmentation of information across those channels, while also increasing their overall situational awareness (Fig. 4) [5].
Figure 4. The proposed hub-and-spoke model [5]
The second is a model for organizational data governance. In this model, data teams remain in control of data quality and storage, while end users can flexibly manipulate data without breaching security (Fig. 5). It is important to note that, in terms of data architecture, hubs and spokes are virtual locations, such as data warehouses (hubs) and BI tools or other operational tools (spokes). In data governance, by contrast, hubs and spokes are competencies [6].
Figure 5. Data integration for data governance [6]
Hub teams have advanced competencies to understand and work with data. These teams include data engineers, data scientists, and data analysts. They build and maintain data architectures, develop data models, build pipelines, cleanse data, set rights and accountabilities, and build dashboards.
Spoke teams, such as Sales, HR, Customer Support, and Marketing, are composed of non-technical professionals like marketers, salespeople, recruiters, and accountants. Depending on the style of data governance, their competencies could range from determining metrics, to viewing dashboards created by hub teams, to actually building and managing pipelines. In a bank, for example, where data security is paramount, the competencies of spoke teams would be very limited. In an e-commerce or retail business, by contrast, their competencies may even overlap with those of the hub teams.
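How spoke competencies vary with the governance style can be sketched as a simple permission table. The competency names follow the range described above, but the specific assignments for a bank and a retailer are hypothetical and serve only to illustrate a restrictive versus a permissive style.

```python
from enum import Flag, auto


class Competency(Flag):
    DEFINE_METRICS = auto()
    VIEW_DASHBOARDS = auto()
    BUILD_PIPELINES = auto()


# Hypothetical policies: which competencies spoke teams hold per organization.
SPOKE_POLICY = {
    # Restrictive style: spokes only consume what the hub teams publish.
    "bank": Competency.VIEW_DASHBOARDS,
    # Permissive style: spoke competencies overlap with those of the hub teams.
    "retail": (
        Competency.DEFINE_METRICS
        | Competency.VIEW_DASHBOARDS
        | Competency.BUILD_PIPELINES
    ),
}


def spoke_may(organization: str, action: Competency) -> bool:
    """Return True if spoke teams in this organization hold the competency."""
    return action in SPOKE_POLICY[organization]


print(spoke_may("bank", Competency.BUILD_PIPELINES))    # False
print(spoke_may("retail", Competency.BUILD_PIPELINES))  # True
```

Keeping the policy in one table mirrors the governance idea itself: the hub defines the rules, and the spokes act within them.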
Conclusion
The hub-and-spoke model can be applied not only in physical-world fields such as transportation or supply chain development; it is also a solid foundation in the digital world for ETL integrations, data warehouse design, and data governance. The model can increase the maintainability, security, and data quality of a company's data integration. As a result, it makes data integration easier to manage and update and lowers development costs.
References
1. How To Use the Hub-and-Spoke Model To Scale Agency Services, 2024. https://agencyanalytics.com/blog/hub-and-spoke-model
2. Data Virtualization Enabling Distributed Data Architectures: Data Fabric and Data Mesh, 2023. https://cspub-ijcisim.org/index.php/ijcisim/article/view/511/491
3. Spoke–hub distribution paradigm. https://en.wikipedia.org/wiki/Spoke%E2%80%93hub_distribution_paradigm
4. Airline Economics: An Essay on International Airline Alliance, 2019. https://www.researchgate.net/figure/Hub-and-Spoke-System-and-Point-to-Point-Network_fig2_331345165
5. A Hub-and-Spoke Model for Tool Integration in Distributed Development, 2016. https://www.researchgate.net/publication/308822996_A_Hub-and-Spoke_Model_for_Tool_Integration_in_Distributed_Development
6. The Data Governance Hub and Spoke Model: Why it Works. https://solutionsreview.com/data-management/the-data-governance-hub-and-spoke-model-why-it-works/