Mathematical foundations for UML2 semantics

By Kuznetsov Mikhail (makuznetsov_4@edu.hse.ru)

Design and specification are integral to the development of any software product, and neither can be done without modeling languages. One of these is UML (the Unified Modeling Language). Designed to meet real-world engineering needs, the language raises the level of abstraction and provides a shared understanding that connects models to the physical world. This gives the model-based approach a clear advantage over the code-based one, which makes research in this area highly relevant. The first version of UML was introduced long ago, but it had a number of serious problems at its core. For Sequence Diagrams (SD), for example, the ambiguity caused by the lack of formal semantics stands out [1]. Version 2 of UML was meant to fix this and many other problems by bringing more mathematics and more formalism into the semantics. Considerable work produced a language that corrects shortcomings of the previous version, and UML2 incorporates results from a large body of research and practice. However, have all the problems been fixed? And has the semantics of the language become completely formal? This paper addresses these questions.

The key change in UML2 concerned the reworking of the abstract syntax and, in particular, the semantics of the language. A lot of work has been done to formalize the language and give it mathematical foundations. Among the changes we can highlight the Activity diagram, whose semantics now resemble those of colored Petri nets. In the previous version of UML, activity diagrams were defined as an interpretation of state machines, which some call a compromise [2]. The metamodel has also been significantly redesigned and is now cleaner and more complete. Perhaps most importantly, notation elements can now be combined with far fewer restrictions than in the previous version [3]. As for Sequence Diagrams (SD), UML 1.x suffered from ambiguous semantics, an interpretation of SD that clearly differed from that of MSC (Message Sequence Charts), and conflicts when reading a diagram [1]. The new version changes the abstract definition of SD, so that the semantics of SD are now similar to those of MSC. The theoretical basis of UML 2.0 semantics is partial order theory [1].
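To make the partial-order reading of a Sequence Diagram concrete, the following minimal sketch enumerates the event orderings (traces) permitted by a small diagram. It is an illustration under stated assumptions, not the definition given in the UML specification: the diagram (two lifelines, two messages m1 and m2 sent from the first lifeline to the second), the event names, and the function `valid_traces` are all hypothetical. The only constraints assumed are that a message is sent before it is received and that events on one lifeline occur from top to bottom.

```python
from itertools import permutations


def valid_traces(events, before):
    """Return every total ordering of `events` that respects the
    partial order given as (earlier, later) pairs in `before`."""
    traces = []
    for candidate in permutations(events):
        position = {event: i for i, event in enumerate(candidate)}
        if all(position[a] < position[b] for a, b in before):
            traces.append(candidate)
    return traces


# Hypothetical diagram: lifeline A sends m1 and then m2 to lifeline B.
events = ["send_m1", "recv_m1", "send_m2", "recv_m2"]
before = [
    ("send_m1", "recv_m1"),  # a message is sent before it is received
    ("send_m2", "recv_m2"),
    ("send_m1", "send_m2"),  # top-to-bottom order on lifeline A
    ("recv_m1", "recv_m2"),  # top-to-bottom order on lifeline B
]

traces = valid_traces(events, before)
for trace in traces:
    print(" -> ".join(trace))
```

Under these assumptions the diagram admits exactly two traces: the second message may be sent either before or after the first one is received. The diagram fixes only a partial order, and each trace is one total order compatible with it.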

Introducing mathematical foundations into the semantics, and the general move towards formalizing UML, has been beneficial. Strict formalisms and abstractions improve the verification of constructed models and enable automation such as code generation [3]. The chosen formalisms also open the door to existing analysis tools, which broadens the range of tasks the language can be applied to. Among the advantages of the new version, researchers highlight its suitability for specifications that engineers will use during development [2]. In addition, the mathematical foundations and changes in semantics make UML2 easier to scale [2], and the innovations in the language make it possible to model complex systems of various types. Some researchers and users do point to the extreme complexity and size of the language, which is discussed in the next paragraph. However, the changes in semantics were accompanied by the modularization of the language into independent sublanguages. Restrictions and semantic specifications can also be added to the language through Profiles [2]. The language can thus be tailored to a particular task, which will not solve the key problems but will make the language easier to use within the chosen domain.

An update to a modeling language as popular as UML could not escape criticism. Although important changes to the semantics were indeed introduced, some researchers argue that the language has no semantics at all, which has given rise to a separate heated debate [2]. It is noted that in real development programmers use UML while simply ignoring its semantics. In addition, despite the presence of formal semantics, there remains a whole range of interpretations, as well as duplication and synonymy [4], which inflates the vocabulary and creates ambiguity. Steve Cook, in his discussion [2], gives examples of basic notations that have not been formalized and whose meaning cannot be clearly defined. This arises because some constructions seem obvious yet are very difficult to formalize [4]; formalizing them leads to cumbersome and complex models. The lack of comprehensive formalization, together with the remaining ambiguity, makes it harder to exploit the advantages of UML2 in automated tools [3, 5] and, combined with the size of the models, complicates their management [4]. Some studies also show that the language is particularly difficult to understand and learn because of its ambiguity, size, and other problems [6]. Some even speak of language overload [7]. Researchers also stress that the most important problems of the language stem from the effective absence of semantics for a large part of its modeling constructs [2, 4].

Undoubtedly, the formalization in version 2 has only benefited UML. Important work has been done to rework the semantics of the language. The ongoing discussion is a natural consequence of progress in the right direction, as more and more researchers are drawn to this important area. The growing popularity of a language brings new requirements and criticism, which in turn drive evolutionary improvements in its semantics. In my opinion, the mathematics in UML2 semantics, despite obvious shortcomings, is the strongest basis not only for design but also for subsequent automation. Many of the problems described are solvable; the main thing is to use this powerful tool correctly. Personally, I see UML with its semantics as the basis for specializing the language with profiles for a particular domain (and likewise for the use of frameworks). Meanwhile, the size of the language and the high entry threshold can be offset by creating an intermediate, less strict and more intuitive language that can be iteratively refined with UML details.

  1. B. Henderson-Sellers, “UML - the Good, the Bad or the Ugly? Perspectives from a panel of experts,” Softw. Syst. Model., vol. 4, no. 1, pp. 4–13, 2005;
  2. H. Störrle, “Semantics and Verification of Data Flow in UML 2.0 Activities,” Electr. Notes Theor. Comput. Sci., vol. 127, no. 4, pp. 35–52, 2005;
  3. D. Diskin, “Mathematics of UML: Making the Odysseys of UML less dramatic,” in Kilov, H. and Baclawski, K. (Eds.), Practical Foundations of Business System Specifications, 2003, pp. 145–178;
  4. M. Broy, M. Crane, J. Dingel, A. Hartman, B. Rumpe, and B. Selic, “2nd UML 2 Semantics Symposium: Formal Semantics for UML,” in Models in Software Engineering, 2007, pp. 318–323;
  5. K. Siau and P.-P. Loo, “Identifying Difficulties in Learning UML,” IS Management, vol. 23, no. 3, pp. 43–51, Dec. 2008;
  6. K. Berkenkötter, “Using UML 2.0 in Real-Time Development. A Critical Review.” Accessed: Oct. 22, 2023. [Online]. Available: https://www-verimag.imag.fr/EVENTS/2003/SVERTS/PAPERS-WEB/04-Berkenkoetter-UMLRT-critic.pdf.