====== Future of programming languages: 4GL vs 5GL vs Natural Languages ======

By Maksim Kuzmin

===== Introduction =====

The concept of 4th generation programming languages was introduced in 1981 by James Martin as a reference to non-procedural, high-level languages that describe a system rather than spell out its code directly[1]. 5th generation languages are a distinct concept, referring to languages that attempt to make the computer solve a problem from a formal description of it, without the programmer developing an algorithm for the problem. Both types of languages aim to make programming more accessible to non-programmers, so they can also be compared with natural language used as a prompt for an LLM that generates code.

The goal of making computer programming more accessible to non-programmers is almost as old as computer programming itself. High-level programming languages like FORTRAN, LISP and ALGOL were thought to "virtually eliminate coding and debugging"[2] back in the 1950s. Coding and debugging obviously haven't been eliminated in the 70 years since the report on FORTRAN was written, but in this essay we will review the different ways in which 4th generation languages, 5th generation languages and LLMs using natural language for code generation try to achieve that goal.

===== Overview of 4GL, 5GL and LLM popularity =====

There are a few different types of 4th generation languages. The ones most relevant to this essay are database query languages (with SQL being the best-known example), data manipulation, analysis and reporting languages (such as R or Wolfram Language), and low-code development platforms (such as 1C). Successful 4th generation languages are domain-specific languages: it's impossible to write, say, a Linux kernel driver in SQL or Wolfram Language. Their narrow scope allows them to use higher-level abstractions than general-purpose programming languages, because they can make far more assumptions about their environment. As a result, each of these languages has a niche in which it has an advantage over general-purpose programming languages, and it thrives within that niche.

5th generation languages attempt to replicate the success that 4th generation languages enjoy within their niches, but without limiting the scope of the programs that can be written in them. They aimed to solve a wide range of problems by letting the programmer describe the constraints of a problem and having the compiler derive an efficient algorithm from those constraints. However, the very breadth of the problems 5th generation languages claimed to solve meant that their compilers could not exploit a narrow problem domain the way 4GLs do. In practice, as programs got larger, they took exponentially longer to compile and ran much less efficiently than programs written in general-purpose languages, because deriving an efficient algorithm from the constraints of a problem is a hard problem in itself, one that still requires the skills of a human programmer.
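To make the 5GL idea more concrete, here is a minimal sketch in Python (not an actual 5GL) of the "describe the constraints, let the machine find the solution" style: the program only declares what a valid answer must satisfy, and a generic search procedure does the rest. The map-coloring problem, the names and the brute-force solver are illustrative assumptions rather than anything taken from a real 5GL, and the enumeration also shows why the approach scales poorly, since the space of candidate solutions grows exponentially with the number of variables.

<code python>
from itertools import product

# A 5GL-style program only *declares* the problem: the variables, their
# domains, and the constraints a solution must satisfy. It never says how
# to search for that solution.
regions = ["A", "B", "C", "D"]
colors = ["red", "green", "blue"]
# Pairs of regions that share a border and therefore need different colors.
borders = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def satisfies(assignment):
    """Check every declared constraint against a candidate assignment."""
    return all(assignment[x] != assignment[y] for x, y in borders)

def solve():
    # The "compiler" part: a completely generic search over all candidate
    # assignments. With n regions and k colors the space has k**n elements,
    # which is why naive constraint solving blows up on large problems.
    for values in product(colors, repeat=len(regions)):
        assignment = dict(zip(regions, values))
        if satisfies(assignment):
            return assignment
    return None

print(solve())  # e.g. {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}
</code>

Real 5GL systems searched far more cleverly than this brute-force sketch, but without a restricted domain the underlying combinatorial difficulty remains.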
Unlike 4GLs and 5GLs, which have existed as concepts since the 1980s, using LLMs to process natural language and generate code is a relatively new development. LLMs have been used for code generation in production environments by many software engineers, and they have achieved good results in benchmarks, sometimes eclipsing human performance[3]. However, LLMs are limited by the data available for training, and the vast majority of publicly available code (which is what can be used in LLM training) comes in short snippets rather than in code bases describing complex systems. The benchmarks are similarly based on algorithmic tasks rather than complex problems that would require the LLM to generate large segments of code. Still, LLMs aren't a mature technology, so their capabilities for code generation may well improve in the future.

===== 4GL, 5GL and LLM limitations and potential =====

4th generation languages are mostly limited by the scope of the problems they intend to solve. A language like SQL, while technically Turing-complete[4], is impractical for domains outside database queries. This limitation is also part of the appeal of these languages: because the constraints of the language are aligned with the constraints of the domain, domain experts can apply their knowledge directly in software development. Within their domain, these languages can be used to great effect, as the example of SQL shows: SQL is so prevalent as a database query language that database management systems are commonly classified as SQL or NoSQL systems.

5th generation languages, in comparison, have failed to achieve the same influence. Most 5GL projects were terminated in the 1980s, and their original domain, AI research, switched from developing programming languages for AI to developing AI with neural networks and ML algorithms implemented in 3rd generation languages like Python and C++. Since there are no new 5GL projects, we can say that they have exhausted their potential without much influence on the further development of AI or programming languages.

LLMs, unlike programming languages, don't have a compiler written by human programmers that explicitly describes the rules for converting their input (a natural-language prompt, rather than 4GL or 5GL code) into executable code. Instead, these rules are derived from the data the LLM is given during training. That means the performance of LLMs can be improved simply by giving them more or better training data, which is easier than building a compiler for a new language from scratch with human programmers. LLMs are also a rapidly developing technology, so they have considerable potential in the domain of code generation.
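For comparison with the compiler-based approaches, here is what natural-language-driven code generation looks like in practice: a plain-English prompt is sent to a hosted model and the returned text is the program. This is a minimal sketch assuming the OpenAI Python SDK and an API key in the environment; the model name and the prompt are placeholders, and the snippet is illustrative rather than a recommended workflow.

<code python>
# A minimal sketch of natural-language code generation, assuming the OpenAI
# Python SDK (pip install openai) and an OPENAI_API_KEY set in the
# environment. The model name below is a placeholder.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a Python function median(values) that returns the median of a "
    "non-empty list of numbers. Return only the code, with no explanation."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

generated_code = response.choices[0].message.content
print(generated_code)

# Unlike a 4GL or 5GL compiler, nothing here guarantees correctness: the
# generated code still has to be reviewed and tested by a human.
</code>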
===== Conclusion =====

Overall, LLMs driven by natural language look like they could become the future of programming. They still need to overcome the lack of data describing large codebases in their training sets, but if they do, they could transform the job of software engineers into one of mostly overseeing and reviewing LLM-generated code. 4GLs are a mature technology that has had a major influence on software engineering, but precisely because they are mature and well understood, they are unlikely to disrupt the field any further. 5GLs have seen little use since their development in the 1980s, and in the 21st century they have been superseded by neural networks in AI research. Of the three, natural language is therefore the most likely future of programming languages.

===== References =====

  - Martin, James. Application Development Without Programmers. Prentice-Hall, 1981. ISBN 0-13-038943-9.
  - Backus, J. W., et al. Specifications for the IBM Mathematical FORmula TRANslating System, FORTRAN. Preliminary Report, 1954. https://www.softwarepreservation.org/projects/FORTRAN/BackusEtAl-Preliminary%20Report-1954.pdf
  - Payne, Josh. Code Generation with LLMs. Stanford University CS224G lecture slides. https://web.stanford.edu/class/cs224g/slides/Code%20Generation%20with%20LLMs.pdf
  - PostgreSQL Wiki. Cyclic Tag System. https://wiki.postgresql.org/index.php?title=Cyclic_Tag_System&oldid=15106