THE EMERGENCE OF INFORMATICS AS A NATURAL SCIENCE
Natural sciences can and should seek to answer questions about the essential elements (components, properties, functions) of natural phenomena, in order to understand them and to forecast their evolution as far as possible. Each of the traditional disciplines (e.g., physics, chemistry, biology) answers questions about one of the components or properties of natural phenomena. The primary concern of physics is energy; for chemistry, the goal is to understand matter; and for biology, it is to understand life.
In ancient times, these areas were regarded as part of philosophy. Aristotle, for example, wrote about physics (περὶ φύσεως means "about nature"), with philosophy considered to be a general reflection on nature. The disciplines later emerged from philosophy as distinct subjects, and each continues to split at an increasing pace today. For example, we may now speak about nanotechnologies as a technical offshoot of modern physics and chemistry, or about genetics as an offshoot of contemporary biology.
Disciplines also fuse with one another: biochemistry and bioinformatics are examples. Their interactions are increasingly perceived, measured, and controlled, so no scientist who wishes to be regarded as credible in understanding nature can afford to ignore them. This evolving picture illustrates a recognition of the extreme complexity of nature itself and an acknowledgment of the diverse and sometimes seemingly contradictory qualities of real scientists, including ambition and modesty. Scientists should be ambitious in exploring the complexity of natural phenomena but also modest, because no single competence can comprehend the subtleties of the interactions among all significant factors. This limitation may justify a strong interest in multidisciplinary studies supported by collective intelligence, although, as we all know, the reality is far from the theory.
The emergence of sciences over many centuries was marked, in the 19th century, by significant progress in understanding “what is energy” (physics) and “what is matter” (chemistry), and in the 20th century by progress in understanding “what is life” (biology). Another crucial property, which steers all natural phenomena (and enables us to study them), was only gradually identified, isolated, and studied in this later period: information.
The invention of the computer had a triggering effect on the construction of knowledge similar to that of the printing press centuries earlier. Indeed, pioneering scientists active from 1940 to 1960 (e.g., Turing, Wiener, Shannon) paved the way for a foundational new discipline. A well-known aphorism reminds us that computer science is no more about computers than astronomy is about telescopes. In the second half of the 20th century, informatics became recognized as the science of information, a property of nature of the same importance as energy, matter, and life, the four being in perpetual interaction in all natural phenomena. Examples of extreme complexity and significance in the human body are the brain and the immune system (see, for example, debates on the meaning of informatics: https://en.wikipedia.org/wiki/Informatics). Because computer programs (i.e., the control) are stored in the same memory as data, programs are able to treat program code as data. The invented artefact, the computer, is therefore much more than a machine: it is a meta-machine, i.e., a machine capable of learning and thus able to produce totally different machines as a function of the course of its interactions with the environment. This phenomenon is clearly visible and widely known today, with the emergence of machine learning in deep neural networks trained on big data, of large language models and, consequently, of interactive, generative artificial intelligence systems such as ChatGPT.
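The idea that a program can treat program code as data, and thereby produce different machines depending on its interactions, can be illustrated with a minimal Python sketch. The function name `make_machine` and the assumption that the observations follow a straight line are hypothetical choices made only for illustration; the sketch shows code being assembled as a string and then executed, and it is not meant as a model of machine learning.

```python
def make_machine(observations):
    """Build a new function whose source code depends on observed (x, y) pairs.

    Hypothetical helper: it assumes the observations fit y = a * x + b
    and derives a and b from the first two pairs.
    """
    (x0, y0), (x1, y1) = observations[:2]
    a = (y1 - y0) / (x1 - x0)
    b = y0 - a * x0
    # At this point the program text is ordinary data: a string.
    source = f"def machine(x):\n    return {a!r} * x + {b!r}\n"
    namespace = {}
    # The same data is now treated as code and executed.
    exec(source, namespace)
    return namespace["machine"], source


# Two different "environments" yield two different machines.
doubler, doubler_src = make_machine([(0, 0), (1, 2)])
shifter, shifter_src = make_machine([(0, 5), (1, 6)])
print(doubler_src)
print(doubler(10), shifter(10))
```

The same generating program yields a different function for each set of observations, which is the sense in which the stored-program computer can be described as a machine that produces machines.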
As colleagues at the University of Edinburgh (https://www.ed.ac.uk/informatics/about/what-is-informatics) have summarized the subject, informatics, the science of information, consists of computer science (the components of modern computers, including robots), cognitive science (how animals, individually and collectively, perceive, understand, reason, decide and act) and artificial intelligence, which links the two complementary disciplines. This vision of informatics includes:
- Technical/mathematical components, such as logic, complexity theory, algorithms, software, databases, hardware, and networks: the anatomy of artificial systems typical of computer science.
- Functional processes, such as perception, reasoning, communication in natural language, and action: the physiology of natural systems typical of cognitive science.
- The synergy between artificial and natural systems typical of artificial intelligence, for instance, in the approach called “multi-agent systems” pioneered by Norbert Wiener in his book on cybernetics (https://en.wikipedia.org/wiki/Cybernetics:_Or_Control_and_Communication_in_the_Animal_and_the_Machine).
- And, in recent years, emotions and social studies.
In other words, information and knowledge are essential elements of “agents,” whether living or artificial, engaged collectively in collaboration and competition (see, e.g., the evidence of these processes in the development of species provided by Darwin), as has long been recognized for living beings, from single cells to complex heterogeneous societies.
In this vein, the author will revisit his interactions with Prof. Krief over the last 20 years to demonstrate that the latter’s vision merits profound interest regarding the future of chemistry and natural sciences more broadly and, consequently, of related technologies.
THE ENCORE PROJECT: KNOWLEDGE CONSTRUCTION AND USE BY INTERACTIONS
Around 2003, the author met Prof. Krief, who was interested in our work and considered it somewhat different from that of other computer scientists he had encountered previously. At the time, Prof. Krief wanted to devote his experience, competence, and reputation to an enterprise that was both attractive and daunting: the EnCOrE project.
EnCOrE is an acronym for Encyclopedie Chimie Organique Electronique. The idea was to build a repository (an encyclopedia) in the domain of organic chemistry that could help scientists and students “understand and forecast” natural phenomena in organic chemistry (e.g., reactions).
As an informatician, the author was enthusiastic about the project, which could be a very concrete, compelling, and valuable testbed for constructing a series of interactive, knowledge-based systems, while recognizing that the critical bottlenecks would include both the construction of such systems and their use once they existed. By 2005, we were able to include EnCOrE in the European Union (EU) 6th Framework Programme Project E-LeGI (European Learning Grid Infrastructure, https://cordis.europa.eu/project/id/002205). The proposal convinced the EU that the interactions necessary for constructing and using EnCOrE were significant for human learning. As one of the critical application testbeds (or SEES: Service Elicitation Exploitation Scenarios), EnCOrE received part of the substantial support awarded to E-LeGI by the EU (€10 million for 23 partners) for a period of four years.
To express the needs adequately, the following paragraphs quote remarks made by Prof. Krief, founder of EnCOrE, when addressing the E-LeGI consortium during its meeting in Brazil in 2004[1,2]. The author highlights a few issues and phrases that he finds particularly valuable:
“Currently, the information is only delivered flat according to a single point of view dictated by tradition of ‘book organization’ following the ‘Johannes Gutenberg age’ (ca.1400-1468). In fact, chemists’ brains work differently, and the usual delivery message is context-oriented. There is a huge number of different contexts which are covered, and it is impossible using a book or even a lecture to describe them all (experimental-oriented, starting material-oriented, product-oriented, mechanism-oriented, stereochemical-oriented, calculation-oriented…)… Not only methods and tools needed for each context are different (flasks, molecular models, heavy calculation), but even the words used in each of these contexts are not properly defined.”
“The construction of the EnCOrE Dictionary is extremely important for our project. It will fix the language and the related ideas and will play an important role in questioning EnCOrE. Its production is an act of power. If this power is not well understood, the chemists will ignore it. For that purpose, it is extremely important that chemists accept and use it. For that purpose, it should be elaborated through a collaborative work implying discussions, contextualization, and consensus between the chemist’s community. We want to archive the discussions in order to keep the dictionary always alive by reactivating the discussions on a single word from time to time according to new needs. We believe that the times where confirmed chemists, sent by their respective governments, were gathering in palaces sponsored by IUPAC (International Union of Pure and Applied Chemistry, https://iupac.qmul.ac.uk/) to build the compendium of chemical terms in a non contextual manner, is over.”
The two fundamental messages from Prof. Krief were (1) the importance of words, language, and agreement about semantics and (2) the strong dependence of these on context. Both concerns remain at the core of modern informatics, which encouraged a joint project. Prof. Krief’s vision of the future of chemistry was, without any doubt, that of a pioneer.
A crucial point concerns the difference between “information” and “knowledge.” We were perfectly aware that it would be possible to build a repository of information, a kind of electronic library of concepts, properties of concepts, and relations among concepts and their properties, often called “ontologies.” There are many databases, particularly in chemistry; however, the goal of EnCOrE was much more ambitious. Prof. Krief explained that he wanted to identify and implement a process of construction and use of chemical knowledge, not merely chemical information.
For years, we discussed the difference between information and knowledge. We proposed a simple definition: knowledge is the information that is necessary and sufficient in the context of a decision. Prof. Krief agreed, but problems emerged at once: the “context” in which knowledge is exploited depends heavily on the individual and on their previous knowledge, goals, strategies, and tactics, both when an expert chemist is constructing the encyclopedia and when the knowledge is being exploited, whether by a student or by an expert.
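The proposed definition, and the difficulty it raises, can be made concrete with a toy Python sketch. Everything in it is an assumption introduced for illustration: the `Context` class, the `knowledge_for` function, and the placeholder repository entries are hypothetical and are not part of EnCOrE. The sketch only shows that the same repository of information yields different knowledge for different individuals facing the same decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A decision plus what its decider needs and already knows (toy model)."""
    decision: str
    required: frozenset
    already_known: frozenset = frozenset()


def knowledge_for(information, context):
    """Return (knowledge, sufficient) for one decision context.

    knowledge: the entries that are necessary for this decider.
    sufficient: True if the repository covers everything still needed.
    """
    needed = context.required - context.already_known
    knowledge = {k: v for k, v in information.items() if k in needed}
    sufficient = needed.issubset(information)
    return knowledge, sufficient


# Placeholder repository; the values stand in for real encyclopedia entries.
repository = {
    "starting material": "(entry text)",
    "product": "(entry text)",
    "mechanism": "(entry text)",
}

needs = frozenset({"starting material", "product", "mechanism"})
student = Context("plan a reaction", needs)
expert = Context("plan a reaction", needs, already_known=frozenset({"mechanism"}))
modeller = Context("run a calculation", frozenset({"product", "calculation"}))

student_k, student_ok = knowledge_for(repository, student)
expert_k, expert_ok = knowledge_for(repository, expert)
modeller_k, modeller_ok = knowledge_for(repository, modeller)
print(sorted(student_k), student_ok)
print(sorted(expert_k), expert_ok)
print(sorted(modeller_k), modeller_ok)
```

In this toy model the expert needs fewer entries than the student for the same decision, and the repository is insufficient for the third context, which hints at why a single context-free repository could not meet the goal.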
After months of work on a small subset of chemical concepts (e.g., chemical equation, chemical structure, element, functional group, named reaction, pure substance, retro-synthesis scheme, segment, reaction vessel), a small encyclopedia was built; nevertheless, many differences in points of view among the contributing senior chemists remained unresolved. The participants all progressed significantly in “learning,” however, as described below. In retrospect, this partial failure led us to revise our plans: Prof. Krief and the chemists were encouraged to investigate more deeply the tools, methods, and initiatives dedicated to human learning and the dissemination of scientific knowledge, particularly in chemistry, while the informaticians started another project (ViewpointS), which is still ongoing (see, e.g., Refs.[2–6]). This latter project is the best testimony to the influence of the experience we gathered while working with Prof. Krief on EnCOrE.
A synthetic view (2) of the interactive process is presented in Figure 1, in which the agents reach a consensus even in the presence of different points of view about the world.