Information nature of optimal classifications
Dr. David Kisets
The National Physical Laboratory of Israel,
Danziger "A" bldg., Hebrew University Givat-Ram, Jerusalem,
91904, Fax: 02-6520797, E-mail: kisets@netvision.net.il
Abstract: The paper discusses an information-based approach to investigating classification systems, aimed at establishing objective numerical conditions that make it possible to assess the degree of optimality of a system and to create new optimum systems. The principle of information cyclicity is used as the criterion in the investigation. The proposed approach is consistent with the best existing classification systems and is exemplified with some new ones. The paper focuses on a mathematically valid universal classification model.
Keywords: Classification, Optimality, Information, Statistical analysis.
1 Introduction
Mankind has created a vast number of classification systems, such as systems of elements, quality classes for products and processes, classifications of tolerances and fits, accuracy classes of measuring instruments and standards, measurement traceability chains, etc. The reliability of an artificial classification system, however named, depends on its conformity with the objective regularities of nature that have a classification character. If these regularities are regarded as optimal (owing, in principle, to the optimum purposefulness of evolution), then the measure of optimality of a classification system is how closely it meets the formal criterion underlying the regularities.
Apparently, all parameters of a classification system, their structural relations and their periodicity ought to meet the requirements of informational necessity and sufficiency in terms of information theory [1]; this is the concept of informational optimality. The recently discovered principle of information cyclicity [2] is used as the criterion for creating optimal classification systems.
The paper refers to some known classification regularities that conform to information cyclicity and discusses some recently developed classification systems of an evolutionary nature. The discussion is aimed at recognizing how closely an evolutionarily developing phenomenon, once classified, matches information optimality.
The mathematical treatment of the universal classification model is aimed at developing optimal classification systems. The author believes that this paper may be of interest to specialists in mathematics and other academic fields. The paper is also of immediate significance for metrology, linguistics and astrology, the fields from which the references and examples of optimum classification are drawn.
2 Preliminaries and major criterion
Quantitative classification regularities are built up horizontally (classification elements and subclasses) and vertically (classes, groups of classes and, theoretically, formations of greater complexity). Classification elements (e) are established in such a way that every two adjacent elements ej and ej+1 form the subclass Ej (or the class, if the classification system is not divided into subclasses). A certain number (N - 1) of subclasses forms the class (C), in which the number of elements is equal to N. The groups of classes (G) represent information series of hierarchy levels (classes) united by the assumption that each group bounds a specific completeness in classifying a certain kind of object, etc.
The best information about the quality of a classified object is ensured when the classification hierarchy of the considered kind of objects is optimized in respect of:
parameters for the formations of more complexity.
The optimization may be achieved with the principle of information cyclicity, which establishes the so-called optimum accuracy coefficient ρo = 1/2π (where π = 3.1416), initially intended for multipurpose application in qualimetry and metrology. The accuracy coefficient, having been proved to be the optimum uncertainty-to-tolerance ratio, has already been proposed for the improvement of such classification hierarchies as measurement traceability chains and classes of accuracy [2]. Here the coefficient is proposed as the universal optimal ratio between the minimum and maximum elements of a class. A relevant example is presented in section 5.
3 Optimum classification integers
For all other parameters of a classification system (numbers of elements, subclasses, classes, groups, etc.) the criterion ought to be presented as an integer, i.e. the optimum number (η) of the parameters. The information cycle expressed as an integral value η satisfies the condition of permissible information accuracy:
$|\eta - 2\pi| / 2\pi \le 1/2\pi$ (1)
In accordance with this condition, i.e. |η - 2π| ≤ 1, the optimum classification integer η can take only the value 6 or 7, i.e. the information cycle as the period of classification may involve either six or seven classification parameters. Clearly, the numbers of classification elements and subclasses in an optimal classification system are 7 and 6 respectively.
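As a quick check of condition (1), the following minimal sketch enumerates the integers that satisfy it. The function name `optimum_integers` and the search range are illustrative assumptions, not part of the original text.

```python
import math

def optimum_integers(search_limit: int = 20) -> list[int]:
    """Return the integers eta with |eta - 2*pi| / (2*pi) <= 1 / (2*pi)."""
    two_pi = 2 * math.pi
    return [eta for eta in range(1, search_limit + 1)
            if abs(eta - two_pi) / two_pi <= 1 / two_pi]

print(optimum_integers())  # [6, 7]
```

Only 6 and 7 lie within one unit of 2π ≈ 6.283, which is the statement made in the text.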
It may not be out of place to note here that the optimum classification integers correspond to many fundamental phenomena of a classification nature, e.g. the six classes of monochromatic radiation, the seven periods of the Periodic Table of Elements, the seven colors of the sunlight spectrum, the seven natural notes of musical notation, the seven series of categories in biological classification, etc. On a larger scale, for instance, the number seven is the rounded ratio of the inverse fine structure constant (137.036) to the parsec constant (19.391), which are characteristics of the microcosm and macrocosm respectively.
Theoretically, the unrolled classification hierarchy in its dimensionless universal presentation is a power series ranging from 0 to ∞. In practice, two kinds of classification exist, both taking the number 1 as the starting point: 1) the diminishing classification (ranging from 1 to 0), and 2) the increasing classification (ranging from 1 to ∞).
Irrespective of the kind of classification, two boundary classification series can be developed with the optimal classification integers, for which at any classification level i (i = 1, 2, . . .) the number of classification elements (Ni) lies in the following ranges: $N_{i1} = 6^{n-1} \div 6^{n}$ and $N_{i2} = 7^{n-1} \div 7^{n}$.
Within the boundary classifications any Ni meets the requirement of optimality. However, the difference between Ni1 and Ni2 grows as i increases. The range Ni1 ÷ Ni2 can be represented as containing an intermediate number ($\hat{N}_i$) that meets the requirements of a mathematically harmonious relation and, therefore, may be considered preferable. In so doing, the following proportion holds for each informative group of classes q:
$\hat{N}_{iq} / N_{i2} = N_{i1} / \hat{N}_{iq} = \varphi_o^{q}$, (2)
where φo = 0.618 is the fundamental mathematical constant called the golden section or golden mean, which is often used as the measure of a mathematically harmonious relation.
The optimum number noq of classes in a complete classification group is found as the rounded (R) number calculated according to the following equation:
$n_{oq} = \arg[\min |¢^{q} - \varphi_o^{q}| = 0] = R(2\pi) = 6$, (3)
where $¢ = (6^{n} / 7^{n})^{1/2}$ is the intermediation coefficient, which makes it possible to calculate $\hat{N}_i = N_{i1} / ¢ = \varphi N_{i2}$ at each classification level.
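The link between the boundary series and the golden section can be checked numerically. The sketch below assumes one reading of equations (2) and (3): at level n the intermediate number is the geometric mean of 6^n and 7^n, so the intermediation coefficient is (6/7)^(n/2), and the optimum number of classes is the rounded n at which this coefficient equals 0.618. All function and variable names are hypothetical.

```python
import math

PHI_O = 0.618  # golden section value quoted in the text

def intermediation(n: float) -> float:
    """Coefficient (6**n / 7**n) ** 0.5 at level n."""
    return (6.0 / 7.0) ** (n / 2.0)

def intermediate_number(n: int) -> float:
    """Geometric mean of the boundary values 6**n and 7**n (assumed reading)."""
    return math.sqrt(6.0 ** n * 7.0 ** n)

# Level at which the coefficient reaches the golden section:
# solve (6/7) ** (n / 2) = PHI_O for n.
n_star = 2.0 * math.log(PHI_O) / math.log(6.0 / 7.0)

print(round(n_star, 2), round(n_star))
print(round(intermediation(6), 3))
```

Under this reading the solution lies close to 2π and rounds to 6, in line with the value stated in equation (3).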
4 Universal classification scale
The optimum characteristics of the accuracy hierarchy have made it possible to develop the Universal Classification Scale (UCS), a set of dimensionless base-line numerical values arranged as system elements, subclasses, classes and system groups. The UCS is presented in Table 1 for the diminishing classification and in Table 2 for the increasing classification. Each table contains a module of relative numbers for a group of classes, which can be transformed with multiplying factors (Fm) into a classification system of any degree of complexity. Any classification system existing in practice might be identified in terms of the UCS.
Table 1. UCS for diminishing classification

Table 2. UCS for increasing classification

The 18 classes placed in the tables are conventional. Formally, the UCS might be represented with an infinite number of hierarchy levels. However, the fields of application impose restrictions. In metrology (where Table 1 is useful as a unified scale of accuracy classification) there is a fundamental physical restriction on measurement that limits the levels of such a scale in principle. The restriction is known in physics as the Heisenberg uncertainty. This smallest uncertainty is also connected with information cyclicity and limits the number of levels of the accuracy classification scale. As the Heisenberg uncertainty imposes an absolute restriction on accuracy hierarchies, a hypothesis arises about the cyclical classification nature of the intrinsic information metric in the universe; a brief discussion of the subject is given in the Appendix.
5 Practical examples
5.1 Information nature of the relative astrological scale of historical events
This section investigates the astonishing regularity in the occurrence of significant historical events discovered by E. Alan Meece [3]. The aim is to assess the degree of its informational optimality using a statistical treatment of the events and the principle of information cyclicity. Meece supplemented the chronological sequence of historical events in the development of civilization with relative estimates (g) of their importance. The scale is reproduced here as a chart covering the period from 600 BC to 2600 AD (Figure 1). For simplicity, the individual events are not named here. Each g represents the informational weight of the respective event; thus the normalized scale points can be regarded as estimates of the probability with which the events influence civilization.

Figure 1. Relative astrological scale of significant historical events.
In terms of the UCS the events occupy 13 subclasses: from C1E1 to C3E1 (see Table 2). The maximums (gmax) and minimums (gmin) of the scale points (g) may be considered as spikes in the manifestation of information against the average entropy. In terms of classification they represent the relative boundary hierarchy levels in the development of world civilizations. On the assumption of optimality in nature, one can statistically expect an optimum ratio of the boundary hierarchy levels to exist. The ultimate objective of the calculations is to prove that the ratio of the arithmetic means of the events, M(gmin)/M(gmax), is as close as possible to the optimum accuracy coefficient, provided that the ratio of the respective dispersions, σ²(gmin)/σ²(gmax), is less than ρo. In this case the normalized probability estimates P1 = M(gmax)/[M(gmin) + M(gmax)] and P2 = M(gmin)/[M(gmin) + M(gmax)] are considered as relating to approximately independent events (P1 + P2 = 1). The more frequent the events on the time scale, the lower the estimation accuracy, because the correlation between the events is ignored. However, for long time intervals, where the maximum and minimum scale points can be considered practically independent, this negative effect may be neglected. Accordingly, the informative values gmax and gmin selected from the totality of 320 scale points ought to meet the following conditions (after rounding) in order to be taken into account in the further calculations:
$g_{\max} > \frac{1}{320}\sum_{i=1}^{320} g_i \, [1 + 0.5(1/2\pi)] = 23.2$; (4)
$g_{\min} < \frac{1}{320}\sum_{i=1}^{320} g_i \, [1 - 0.5(1/2\pi)] = 19.8$. (5)
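Conditions (4) and (5) can be read as a band of half-width 0.5·ρo around the mean scale point. The sketch below computes the two limits from an assumed mean; the value 21.5 is inferred here from the quoted limits 23.2 and 19.8 and is not stated in the text, and the function names are hypothetical.

```python
import math

RHO_O = 1 / (2 * math.pi)  # optimum accuracy coefficient

def informative_thresholds(mean_g: float) -> tuple[float, float]:
    """Return (upper, lower) limits for informative maxima and minima."""
    upper = mean_g * (1 + 0.5 * RHO_O)
    lower = mean_g * (1 - 0.5 * RHO_O)
    return upper, lower

def select_informative(extrema, mean_g):
    """Split candidate extrema into informative maxima and minima."""
    upper, lower = informative_thresholds(mean_g)
    maxima = [g for g in extrema if g > upper]
    minima = [g for g in extrema if g < lower]
    return maxima, minima

upper, lower = informative_thresholds(21.5)  # assumed mean scale point
print(round(upper, 1), round(lower, 1))
```

Scale points that fall between the two limits are treated as uninformative and are left out of the subsequent statistics.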
The 25 informative maximums and 24 informative minimums, extracted from Figure 1 in accordance with these conditions, are presented in Figure 2.
Figure 2. Informative maximums gmax (36, 24, 44, 41, 40, 34, 55, 43, 28, 37, 42, 36, 41, 65, 44, 31, 46, 23, 36, 30, 33, 28, 53, 46, 29) and minimums gmin (7, 6, 5, 5, 9, 6, 0, 10, 8, 9, 7, 8, 18, 1, 10, 6, 8, 8, 7, 5, 1, 5, 6, 6) on the relative astrological scale of events.
The arithmetic means and dispersions are calculated as follows:
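$M(g_{\max}) = \frac{1}{25}\sum_{i=1}^{25} g_{\max,i} = 965/25 = 38.6$; (6)
$M(g_{\min}) = \frac{1}{24}\sum_{i=1}^{24} g_{\min,i} = 151/24 = 6.21$; (7)
$\sigma^2(g_{\max}) = \frac{1}{25}\sum_{i=1}^{25} [g_{\max,i} - M(g_{\max})]^2 = 122.1$; (8)
$\sigma^2(g_{\min}) = \frac{1}{24}\sum_{i=1}^{24} [g_{\min,i} - M(g_{\min})]^2 = 12.4$. (9)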
The ratios of statistical characteristics to be found are as follows:
M(gmin)/M(gmax) = 6.21/38.6 ≈ 1/2π (estimation error = 1%);
σ²(gmin)/σ²(gmax) = 12.6/94.0 = 0.136 < 1/2π.
The calculation results obtained prove, firstly, an explicit correspondence between information cyclicity and the scale of historical events and, secondly, the optimum classification ratio between their boundary values.
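Readers who wish to retrace the arithmetic can recompute the statistics from the two lists given in the caption of Figure 2. The sketch below is only a check under stated assumptions: it uses the lists exactly as transcribed there and the population form of the dispersion (division by the number of points), so its output need not reproduce every figure quoted above.

```python
import math
from statistics import mean, pvariance

# Informative maximums and minimums as listed in the Figure 2 caption.
g_max = [36, 24, 44, 41, 40, 34, 55, 43, 28, 37, 42, 36,
         41, 65, 44, 31, 46, 23, 36, 30, 33, 28, 53, 46, 29]
g_min = [7, 6, 5, 5, 9, 6, 0, 10, 8, 9, 7, 8,
         18, 1, 10, 6, 8, 8, 7, 5, 1, 5, 6, 6]

RHO_O = 1 / (2 * math.pi)  # optimum accuracy coefficient

mean_ratio = mean(g_min) / mean(g_max)
var_ratio = pvariance(g_min) / pvariance(g_max)  # population dispersion assumed

print(f"M(gmax) = {mean(g_max):.2f}, M(gmin) = {mean(g_min):.2f}")
print(f"mean ratio = {mean_ratio:.4f}, rho_o = {RHO_O:.4f}")
print(f"dispersion ratio = {var_ratio:.4f}, below rho_o: {var_ratio < RHO_O}")
```

Comparing the printed ratios with ρo = 1/2π shows how sensitive the agreement is to individual scale points in the two lists.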
5.2 Information nature and classification of word-use frequency
The words of a language, ordered by the frequency of their use, may be treated systematically and can obey a hierarchical structure. Since the probabilities of word use are proportional to the frequencies, information theory can be applied to prove the existence of this structure.
A modern, well-developed language is the result of a long evolution and is thus a system that can be expected to have a high degree of optimality. On this assumption, a hypothesis suggests itself: the expected natural optimality of a language closely agrees with a formal structure of optimum information classification of word use (OICW) based on the principle of information cyclicity.
If a word used in a dictionary for the explanation of other words occurs only once, then the number 1 may be accepted as the initial value and as the minimum word frequency when forming the OICW. According to the principle of information cyclicity, a complete group then comprises six levels of the number of times (Nw) the words are used in a dictionary. Therefore, Nw at each hierarchy level l (from 1 to 6) of the OICW is to be found within the following limits (to be rounded off):
$(1/\rho_o)^{l-1} \le N_w \le (1/\rho_o)^{l}$ or $(2\pi)^{l-1} \le N_w \le (2\pi)^{l}$,
which is tabulated in part I of Table 1. This way of classification leads to the full word frequency hierarchy for an ideal dictionary of literary-conversational language (LCL).
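The level limits follow directly from powers of 2π. The sketch below lists the rounded lower and upper limits of Nw for the six levels; the function name `oicw_limits` is an illustrative assumption.

```python
import math

def oicw_limits(levels: int = 6) -> list[tuple[int, int]]:
    """Rounded limits ((2*pi)**(l-1), (2*pi)**l) for levels l = 1..levels."""
    two_pi = 2 * math.pi
    return [(round(two_pi ** (l - 1)), round(two_pi ** l))
            for l in range(1, levels + 1)]

for level, (low, high) in enumerate(oicw_limits(), start=1):
    print(level, low, high)
```

The sixth level runs from 9793 to 61529, the two values referred to in the discussion of ideal and real dictionaries.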
Table 1. Optimum word list hierarchies

It should be noted that Nw is proportional to the normalized probability of using the word. When the real diagram of probabilities [Nw max ≥ (Nw max - 1) ≥ (Nw max - 2) ≥ . . . ≥ 1] is replaced by a linear diagram (ΔNw = 1), Nw can be considered approximately equal to the number of words. The estimation error of such an assumption does not exceed ±1/4π.
In terms of the UCS (see Table 2), the OICW occupies a complete group of classes (from C1E1 to C6E6). An increase of Nw beyond 61529 means that the ideal system is overfilled, and the consideration ought to proceed to a new classification group that includes colloquial, slang and professional languages, namely from the LCL to an encyclopedic dictionary. On the other hand, a decrease of Nw below 61529 is a natural characteristic of any real LCL dictionary.
Since ρo is constant, a real dictionary is always shifted, in word frequency and in the number of words, towards the lower levels. Meanwhile, the integrity of the system demands that the six-level hierarchy be preserved, which is why the maximum Nw of a real dictionary should not be less than min Nw = 9793 of the ideal one. Within the framework of the OICW and for the maximum estimation uncertainty this shift is symmetrical, and the medium values (med Nw) of the OICW can reasonably be identified with the corresponding maximum and minimum Nw of a hypothetical average real dictionary (one adequate to an “averaged user”). In this case, clearly, the number of levels remains the same (6 levels), while the first level is not fully completed (fewer than 6 components). Incidentally, this criterion seems appropriate for a well-founded practical classification of existing and developing dictionaries by their quality in terms of qualimetry.
The Collins Cobuild Essential English Dictionary (CCEED) [4] was chosen to exemplify the statements proposed above. The dictionary contains over 45000 references giving extensive coverage of current English and is recognized as one of the best dictionaries of intermediate level. In addition, the CCEED includes a list of all words used ten times or more in the dictionary explanations. The minimum and maximum NwC calculated from the CCEED wordlist, and the medium hierarchy determined from the OICW data, are presented in the above table as parts (III) and (II) respectively, where med $N_w = 0.5(1 + 2\pi)(2\pi)^{l-1}$. The CCEED data were obtained from 35056 down to 10 according to the natural wordlist, and from 22 to 1 by extrapolation.
Therefore, with an error of no more than 3%, the result obtained illustrates the coincidence of the data calculated for the two hierarchies, and thus the truth of the proposed hypothesis. The approximate equivalence of the number of words and their frequency makes it possible to develop optimal dictionaries and to classify them by quality grades in accordance with qualimetry.
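The medium value quoted above is the midpoint of the level limits (2π)^(l-1) and (2π)^l. The following sketch evaluates it for the six levels; the function name `med_nw` is a hypothetical label for the formula in the text.

```python
import math

TWO_PI = 2 * math.pi

def med_nw(level: int) -> float:
    """Medium value 0.5 * (1 + 2*pi) * (2*pi)**(level - 1) for a level."""
    return 0.5 * (1 + TWO_PI) * TWO_PI ** (level - 1)

for level in range(1, 7):
    low = TWO_PI ** (level - 1)   # lower limit of the level
    high = TWO_PI ** level        # upper limit of the level
    print(level, round(low), round(med_nw(level)), round(high))
```

For the second level the medium value is close to 23, which is near the point at which the extrapolated part of the CCEED data begins.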
6 Conclusions
Appendix: Hypothesis on the cyclical nature of information metric in the universe
Information cyclicity and the Heisenberg uncertainty make it possible to consider the information metric in the universe in terms of metrology. Any system is controllable if it is provided with measuring sections of appropriate accuracy; otherwise, the lack of the measurement information necessary for stable control may cause the collapse of the system, with a probability that increases as the highest hierarchy level of the accuracy classification is approached. Thus, hypothetically, the development and collection of information within parts of the universe may, like an accuracy hierarchy, be limited for the sake of the existence of the universe.
Apparently, there is a balance between thermodynamic and information entropy and, figuratively speaking, between non-living and living substances. The supposition is that the thermodynamic entropy in the expanded universe falls sharply, due to the big explosion, when the accumulation of information caused by life activity becomes critical for ensuring the stable control of measurement information. The critical level of an information hierarchy is characterized by
(b) the utter destruction of collected information, including the life and ordered structures,
(d) beginning the new large cycle of collecting information and increasing the thermodynamic entropy.
The large cycles recur infinitely and, apparently, this infinite balancing process precludes the thermal death of the universe. By and large, this scenario agrees with the so-called big-bang theory, supplementing it with the information approach. Besides, if local explosion and contraction cycles are possible, they may give rise to a number of parts of the universe where information is collected (that is to say, a number of civilizations). All this may dramatically change the notion of the origin of life: instead of being either unique or accidental, it becomes a natural phenomenon. In this sense, the penetrating assumption of Wicken that evolution and the origin of life are not separate problems [5] is of fundamental importance.
References
[1] C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948), 379-423.
[2] D. Kisets, Optimum traceability type hierarchies, OIML Bulletin XXXVIII(2) (1997), 30-36.
[3] G.G. Azgaldov, E.P. Raihman, On Qualimetry, Izdatelstvo Standartov, Moscow, 1973.
[4] Collins Cobuild Essential English Dictionary, William Collins Sons & Co Ltd, London and Glasgow, 1990.
[5] J.S. Wicken, Evolution, Thermodynamics and Information: Extending the Darwinian Program, Oxford University Press, 1987, p. 314.