Formal concept lattice for data modeling

In Formal Concept Analysis (FCA), a concept lattice graphically portrays the underlying relationships between the objects and attributes of an information system [1]. The approach has numerous applications in data mining, machine learning, data analysis, and other areas of data science. In data modeling, however, it has not yet found wide application, although in principle it could accelerate and streamline the process.

Formal Concept Analysis has been developed since 1979 as a branch of applied mathematics based on a mathematization of concept and concept hierarchy. The concept lattice is its central notion: FCA rests on a set-theoretical model of concepts and conceptual hierarchies, and its mathematical foundation is lattice theory [2]. The basic definitions are as follows.

Let G be a set of objects, M a set of attributes, and I ⊆ G × M a binary relation such that gIm holds if and only if object g has attribute m. The triple K = (G, M, I) is called a formal context.

For A ⊆ G and B ⊆ M, the derivation operators are defined by A′ = {m ∈ M | gIm for all g ∈ A} and B′ = {g ∈ G | gIm for all m ∈ B}. A formal concept is a pair (A, B) with A ⊆ G, B ⊆ M, A′ = B, and B′ = A; A is called the formal extent and B the formal intent of the concept (A, B). Concepts, ordered by the relation (A₁, B₁) ≥ (A₂, B₂) ⇔ A₁ ⊇ A₂, form a complete lattice, named the concept lattice B(G, M, I). The composite operator (·)′′ is a closure operator (idempotent, monotone, extensive).
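As a minimal illustration of these definitions, the following Python sketch implements the two derivation operators and the closure operator (·)′′ for a context represented as a dictionary from objects to their attribute sets. The representation and function names are illustrative choices, not part of any standard FCA library.

```python
def intent(objs, context):
    """Derivation A': the attributes shared by every object in objs."""
    attrs = set().union(*context.values())  # the empty set derives to all of M
    for g in objs:
        attrs &= context[g]
    return attrs

def extent(attrs, context):
    """Derivation B': the objects possessing every attribute in attrs."""
    return {g for g, ms in context.items() if attrs <= ms}

def closure(objs, context):
    """The closure operator (.)'' on object sets: the extent of the intent."""
    return extent(intent(objs, context), context)
```

A pair (A, B) is then a formal concept exactly when closure(A, context) == A and intent(A, context) == B.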

For example, a formal concept lattice for geometric figures with the following attributes can be depicted: a - exactly 3 vertices, b - exactly 4 vertices, c - has a right angle, d - all sides are equal.

Figure 1. Table of correspondences between objects and attributes.

Figure 2. Formal concept lattice based on Figure 1.
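Since the correspondence table of Figure 1 is not reproduced here, the sketch below assumes a plausible object set (right triangle, equilateral triangle, square, rectangle) and enumerates all formal concepts of the resulting context by brute force, closing every subset of objects:

```python
from itertools import combinations

# Hypothetical context matching the attributes of Figure 1:
# a - exactly 3 vertices, b - exactly 4 vertices,
# c - has a right angle,  d - all sides are equal.
context = {
    "right triangle":       {"a", "c"},
    "equilateral triangle": {"a", "d"},
    "square":               {"b", "c", "d"},
    "rectangle":            {"b", "c"},
}

def intent(objs, context):
    """A': the attributes shared by every object in objs."""
    attrs = set().union(*context.values())
    for g in objs:
        attrs &= context[g]
    return attrs

def extent(attrs, context):
    """B': the objects possessing every attribute in attrs."""
    return {g for g, ms in context.items() if attrs <= ms}

# Close every subset of objects; the distinct closures are the concept extents.
concepts = set()
for r in range(len(context) + 1):
    for subset in combinations(context, r):
        a = frozenset(extent(intent(subset, context), context))
        concepts.add((a, frozenset(intent(a, context))))

for a, b in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(a), "->", sorted(b))
```

Brute-force enumeration is exponential in the number of objects; for realistic contexts, dedicated algorithms such as NextClosure are used instead.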

The theory of formal concept lattices includes further notions and results; for a general understanding of the approach, however, the basic definitions and the example above are quite sufficient.

Data modeling is the process of creating an abstract data structure to represent information and support business processes. It includes defining data types, relationships among data, and rules that the data must follow. Data modeling is a key tool for ensuring accuracy, integrity, and efficiency in data handling. It consists of three stages [3].

  1. Conceptual Modeling: At this level, the primary entities, their attributes, and the relationships between them are defined. The aim of conceptual modeling is to create a high-level map of the data that can be easily understood by its end users.
  2. Logical Modeling: At this level, more detailed data structures such as tables, fields, keys, and indexes are defined. Logical modeling also includes defining constraints on the data, such as uniqueness constraints and referential integrity; a brief sketch of these two levels follows the list.
  3. Physical Modeling: At this level of modeling, the actual data structures that will be used for storing and processing information in a database or other data storage system are defined. Physical modeling includes defining storage characteristics, such as block and segment sizes, and optimizing data access processes.
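As a rough sketch of the difference between the first two stages (with invented entities and fields, not a prescribed notation), a conceptual model might record only entities and their relationships, while the logical model refines them into tables with fields, keys, and constraints:

```python
# Conceptual level: entities and the relationships between them.
conceptual = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],  # one-to-many
}

# Logical level: the same entities refined into tables with fields,
# keys, and constraints (uniqueness, referential integrity).
logical = {
    "customer": {
        "fields": ["customer_id", "name", "email"],
        "primary_key": ["customer_id"],
        "unique": [["email"]],
    },
    "order": {
        "fields": ["order_id", "customer_id", "created_at"],
        "primary_key": ["order_id"],
        "foreign_keys": [("customer_id", "customer.customer_id")],
    },
}
```

The physical level would then fix storage details for a concrete DBMS (data types, indexes, block sizes), which fall outside this sketch.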

Several approaches to data modeling can also be distinguished; some of the most common are listed below.

  1. Hierarchical data models represent one-to-many relationships in a tree-like format. In this type of model, each record has a single root or parent, which may be linked to one or more child records.
  2. Relational models do not require a detailed understanding of the physical properties of the underlying storage. Data is organized into tables (relations), which simplifies the structure of the database.
  3. Entity-relationship (ER) models describe entities and the relationships between them, usually visualized as diagrams. The ER model itself, however, is a formal construct that does not prescribe any particular graphical notation.

Although concept lattices may seem closest to conceptual modeling, this is not the case: in conceptual modeling, one identifies entities and then defines the attributes of each of them. The object-attribute relationship is thus given from the start and does not need to be separately derived or visualized.

Formal concept lattices are mainly used to find semantically similar objects by grouping them according to shared attributes. This is very similar to identifying keys in tables and building relationships between them, which belongs to the logical stage of data modeling.

When identifying foreign keys, attention must be paid to the relationships between tables, which are easiest to depict graphically. To do this, you need to (a code sketch of the procedure follows the list):

  1. Define the tables.
  2. List the fields of each table.
  3. Identify their primary keys.
  4. Treat the tables as objects and their primary-key fields as attributes.
  5. Compile a table of correspondences between objects and attributes.
  6. In the table, mark for each object whether it contains the given attribute.
  7. Build the concept lattice from the filled-in table, grouping cells by their marks.
  8. In any object for which a marked attribute is not its primary key, that attribute is a foreign key.
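A minimal Python sketch of steps 4 through 8 is given below. The table names, fields, and primary keys are invented for illustration; only the object-attribute marking and the final foreign-key check follow the procedure above, and the lattice itself (step 7) is not needed for the check.

```python
# Hypothetical schema: table name -> (fields, primary-key fields).
tables = {
    "customer": ({"customer_id", "name"},                   {"customer_id"}),
    "product":  ({"product_id", "title", "price"},          {"product_id"}),
    "order":    ({"order_id", "customer_id", "product_id"}, {"order_id"}),
}

# Steps 4-5: objects are tables; attributes are all primary-key fields
# found anywhere in the schema.
key_attrs = set().union(*(pk for _, pk in tables.values()))

# Step 6: mark which key attributes each object (table) contains.
context = {name: fields & key_attrs for name, (fields, _) in tables.items()}

# Step 8: a marked attribute that is not part of the object's own
# primary key is a foreign key.
for name, marked in context.items():
    own_pk = tables[name][1]
    for attr in sorted(marked - own_pk):
        print(f"{name}.{attr} is a foreign key")
```

Running the sketch reports order.customer_id and order.product_id as foreign keys, exactly the relationships an ER diagram would show as edges from order to customer and product.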

The most significant drawbacks of this method are its high labor intensity and its very small advantage over standard approaches to data modeling. For the small data structures found in typical databases, the edges of the lattice slow the work down rather than speed it up, since standard notations such as ER diagrams remain perfectly clear at that scale.

The advantage is the grouping of data: in large structures, a lattice makes navigation easier, since everything needed for a decision sits close together and is presented with the greatest clarity.

Despite the convenience of formal concept lattices for clustering and classifying data, they are poorly suited to data modeling, because data modeling does not start from separate sets of objects and attributes: it identifies the attributes of each object directly, regardless of the chosen type of data modeling. For designing large data structures, however, compiling concept lattices over the keys can be quite convenient, both for compactness and for defining the relationships between tables.

1. Rudolf Wille (1992). Concept Lattices and Conceptual Knowledge Systems. https://www.sciencedirect.com/science/article/pii/0898122192901207
2. Shokoofeh Ghorbani (2015). A Note on Vague Formal Concept Lattice. https://www.sid.ir/FileServer/SE/341E20154673
3. André Ribeiro, Afonso Silva, Alberto Rodrigues da Silva (2015). Data Modeling and Data Analytics: A Survey from a Big Data Perspective. https://www.researchgate.net/publication/288872507_Data_Modeling_and_Data_Analytics_A_Survey_from_a_Big_Data_Perspective
4. Daria Ryzhova, Sergei Obiedkov. Formal Concept Lattices as Semantic Maps. https://ceur-ws.org/Vol-1886/paper_10.pdf
5. Dmitriy Ignatov (2015). Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields. https://www.researchgate.net/publication/287483929_Introduction_to_Formal_Concept_Analysis_and_Its_Applications_in_Information_Retrieval_and_Related_Fields