Towards a simple mathematical theory of citation distributions

This paper is written on the assumption that the purpose of a mathematical theory of citation is to explain bibliometric regularities at the level of mathematical formalism. A mathematical formalism is proposed for the appearance of power-law distributions in social citation systems (SCSs). The principal contributions of this paper are an axiomatic characterization of citation distributions (CDs) in terms of the Ekeland variational principle and a mathematical exploration of the power-law nature of CDs. Apart from its inherent value in providing a better understanding of the mathematical underpinnings of bibliometric models, such an approach can be used to derive a citation distribution from first principles.

This task is left entirely to scientometricians: the mathematical theory of CDs investigates a mathematical substitute instead of a real process. For this mathematical substitute, the term "mathematical structure" has been introduced.
The objective of scientometrics is to bridge the gap between our insights into science and our knowledge of science (Mingers and Leydesdorff 2015). A mathematical theory of citation can appear as an attempt to understand the structures that constitute the bases of scientometric models. To "understand" here means to bring a bibliometric structure into congruence with a mathematical structure. The purpose of a mathematical theory is fulfilled if it provides a structure of thought objects that allows us to relate bibliometric data sets and to interpret the state of affairs in science by making mathematical deductions. A scientometric model attempts to create a heuristic explanation of an empirical data set. In contrast, a mathematical theory of citation is not concerned with bibliometric data per se; it strives to construct a clear and coherent framework that accurately expresses some scientometric propositions in mathematical language. In this way, opportunities emerge for applying sophisticated mathematical concepts to bibliometric phenomena. The difference between a bibliometric model and a mathematical theory of citation is more apparent than real because, although the concepts of bibliometrics can be analyzed in terms of mathematics, they cannot be eliminated in favor of the latter without losing the understanding gained by bibliometrics. In particular, a firm foundation for a mathematical theory of citation can be obtained only phenomenologically, by comparing the consequences of basic mathematical statements with bibliometric data.

Motivation
We will study the axioms on which a mathematical description of SCSs can be based. The author ventures to assert that a mathematical theory of citation amounts to a systematic reformulation of the problem of cumulative CDs on a purely mathematical basis. That is the main intent of this paper. Before we proceed with the analysis, we remark that there are no strong arguments leading from the bibliometric facts to the axioms. However, as we hope to show below, the axioms yield additional conceptual information about SCSs that is not readily available from a conventional bibliometric model.

Purpose
The purpose of the research reported in this article is to provide a simple and coherent presentation of CDs based on the Ekeland variational principle. We emphasize the elementary variational principle governing the state of an SCS, and we have also attempted to provide enough technical detail to serve as a basis for future studies.

Methodology
The paper addresses the construction of structural hypotheses about "how an SCS works" rather than statistical inference from bibliometric data. We accept that the continuous reproduction of scientific inequality is a conceptual basis for almost all SCSs (cf. Bourdieu 2004). Emphasis is placed on the role of the variational principle as a valid approach for describing the local behavior of a continuous SCS. We consider an SCS to obey the following scheme. Suppose an SCS is a sufficiently smooth "motion" to ensure the consistency and integrity of citations. In phase space, this condition is equivalent to a variational principle that produces the Euler equation for the weak form of a CD. This variational principle asserts that, for an appropriate functional, one can add a small perturbation that makes it attain a minimum.

Preliminaries
A mathematical theory of CDs cannot make sense of bibliometric models of CDs. However, it can make sense of mathematical models; therefore, it stipulates that bibliometrics must be presented in mathematical terms. We will call a function z ↦ N(z), giving the number n of scientific papers that have been cited a total of z times, a citation distribution (CD). By construction, we define the event ω as the value of z, ω := z. Under mild assumptions, it can be assumed (somewhat non-rigorously) that, with respect to the Lebesgue measure,

$$P(\zeta \in B) = \int_B N(z)\,dz$$

for any Borel set B (with N(·) suitably normalized), ζ being a corresponding random variable (RV for short) defined on a certain probability space (Ω, B, P). Because of this result, without any great error one can also view the quantity N(z) as the probability density function (PDF for short) f(z) for finding the citation process at the point z in phase space (see also Redner 1998; Gupta et al. 2005), ignoring for now the objection that z and n assume only non-negative integer values. We do not consider separately the case where supp f(·) is discrete. Although this is not mathematically rigorous, it is often useful to identify the CD N(·) with the PDF f(·), because a discrete SCS might be too complex to allow analytical results to be obtained.
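The identification of N(·) with a PDF can be made concrete with a small sketch. It assumes that citation counts are available as a plain list of non-negative integers, one per paper. The function names and the toy counts are hypothetical. Dividing N(z) by the number of papers gives a discrete stand-in for f(z), which the text then treats as continuous.

```python
from collections import Counter

def citation_distribution(citations):
    """Return N(z): the number of papers cited exactly z times, sorted by z."""
    return dict(sorted(Counter(citations).items()))

def empirical_pdf(cd):
    """Normalize N(z) by the total number of papers (a discrete stand-in for f(z))."""
    total = sum(cd.values())
    return {z: n / total for z, n in cd.items()}

if __name__ == "__main__":
    # Hypothetical toy data: citation counts of eight papers (not from the paper).
    counts = [0, 0, 1, 1, 1, 2, 5, 12]
    cd = citation_distribution(counts)
    print(cd)                    # {0: 2, 1: 3, 2: 1, 5: 1, 12: 1}
    print(empirical_pdf(cd)[1])  # 0.375
```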
All empirical CDs are different, but many have some statistical properties in common. In broad terms, power distributions of the form

$$N(z) \propto l(z)\, z^{-\lambda}, \qquad z \ge z_{\min}, \qquad (1)$$

are frequently accepted without question. In expression (1), $z_{\min}$ is a threshold value, λ > 0 is the exponent, and l(z) denotes a slowly varying function (for the precise definition, see Borovkov 2013), that is, a function such that for any fixed k > 0

$$\lim_{z \to \infty} \frac{l(kz)}{l(z)} = 1.$$

In detail, CDs are quite complicated because of the intricate citation dynamics of papers, the age distribution of references, the role of scientific journals, and similar factors. However, to a reasonable approximation, a CD can be represented (in the long-time limit of the observation period) by relation (1) (among an abundant literature, we refer to Brzezinski 2015; Egghe 2005; Radicchi and Castellano 2015; Redner 2005; Wallace et al. 2009). The systematic study of CDs' deviations from power laws is not the subject of this paper. However, the literature on this topic is growing; see, e.g., Golosovsky and Solomon (2012, 2014), Thelwall and Wilson (2014), Wang et al. (2013), and Yao et al. (2014).
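The role of the slowly varying factor can be checked numerically. The sketch below assumes, purely for illustration, l(z) = log z and a tail N(z) ∝ l(z) z^(−λ). It shows that l(kz)/l(z) approaches 1 slowly. It also shows that the local log-log slope of the tail, which equals −λ + 1/log z, tends to −λ. The choice of l and all function names are hypothetical.

```python
import math

def l(z):
    # Hypothetical slowly varying factor l(z) = log z (an assumption, not from the paper).
    return math.log(z)

def sv_ratio(k, z):
    """Ratio l(kz) / l(z); for a slowly varying l it tends to 1 as z -> infinity."""
    return l(k * z) / l(z)

def log_tail(z, lam):
    """log of the unnormalized tail N(z) = l(z) * z**(-lam), valid for z > 1."""
    return math.log(l(z)) - lam * math.log(z)

def local_exponent(z, lam, h=1e-4):
    """Central-difference estimate of d log N / d log z at z."""
    t = math.log(z)
    return (log_tail(math.exp(t + h), lam) - log_tail(math.exp(t - h), lam)) / (2 * h)

if __name__ == "__main__":
    for z in (1e2, 1e6, 1e100):
        print(z, sv_ratio(10, z))      # approximately 1.5, 1.1667, 1.01
    print(local_exponent(1e6, 2.5))    # approximately -2.5 + 1/log(1e6)
```

The slow convergence is the point here. A slowly varying factor does not change the asymptotic exponent, but it can bend the log-log plot visibly over the finite range of z seen in real data.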
Throughout the paper, we adopt the following notation: V is a real separable reflexive Banach space equipped with a norm ‖·‖, and V⋆ is its topological dual endowed with the natural norm ‖·‖⋆. The duality pairing between V and V⋆ is denoted by ⟨·, ·⟩. In the language of P(ζ ∈ B), the PDF f(·) is (almost everywhere) given by formal differentiation; as a result, a rather simple interpretation of f(·) can be given in the framework of the Sobolev spaces $H^k(\mathbb{R})$. (For the definitions and properties of Sobolev spaces, see Maz'ya 2011.) Unless specified otherwise, in the following I is an open interval in ℝ. For technical reasons, $H^k(I)$ is a good example of the space of RVs ζ such that

$$\sum_{j=0}^{k} \int_I \left|\zeta^{(j)}(z)\right|^2 dz < \infty,$$

where $\zeta^{(j)}$ denotes the j-th weak derivative. In addition to the probabilistic treatment, one can say that an SCS acting on some function ϕ(·) yields an RV ζ. In other words, an SCS allows us to bring one and only one well-defined RV ζ ∈ V into correspondence with each function ϕ(·) ∈ V⋆.

Results
Because ϕ(·) and ζ are so fundamental in this paper, it may seem strange that we have not defined them explicitly in formal mathematical terms. As with other primitive objects of a mathematical theory, the most one can do is give implicit definitions by postulating the properties that hold for ϕ(·) and ζ.
We shall now attempt to shed some light on the relation between the function ϕ(·) and the quantity ζ. Pick any ε > 0; then the Bishop–Phelps partial order on V × ℝ can be defined as follows (cf. Johnson and Lindenstrauss 2001):

$$[v_1, r_1] \preceq [v_2, r_2] \iff r_1 - r_2 + \varepsilon \|v_1 - v_2\| \le 0. \qquad (2)$$

For a nonempty closed convex subset M ⊂ V × ℝ that is bounded below, in the sense that $\inf\{r : [v, r] \in M\} > -\infty$, there is a minimal element $[v_*, v^\star_*]$ in the partial order (2), according to the classical Bishop–Phelps theorem (see, e.g., Deville and Ghoussoub 2001). In this connection, the map ϕ ↦ ζ is nothing but the Riesz–Fréchet isomorphism from V⋆ onto V. This means that we have the characterization

$$\langle \varphi, v \rangle = (\zeta, v)_V \quad \text{for all } v \in V,$$

where $(\cdot, \cdot)_V$ denotes the scalar product in V. Based on this assertion, we collect the basic properties of ϕ(·) and ζ in the following axioms:

A 1 ϕ(·) is a …, convex, and bounded below ($\inf_V \varphi > -\infty$) function from (V, ‖·‖) to $\mathbb{R}_+$ satisfying the following condition: …

A 2 Among all admissible ζ, the quantity ζ*, which actually describes a given CD, is assigned in such a way that the function ϕ(·) reaches its minimum.

Axiom A 1 hinges on a corollary (Aubin and Ekeland 2006, p. 262) of the Ekeland variational principle (Ekeland 1974). The term "Ekeland variational principle" refers here essentially to a result stating that the function ϕ(v) possesses arbitrarily small perturbations such that the perturbed function has an absolute (and even strict) minimum (Ioffe and Tikhomirov 1997). More precisely, for every v ∈ V with $\varphi(v) \le \inf_V \varphi + \varepsilon$, there exist $v_\varepsilon \in \operatorname{dom}\varphi$ and $v^\star_\varepsilon \in \partial\varphi(v_\varepsilon)$ such that

$$\varphi(v_\varepsilon) \le \varphi(v), \qquad \|v_\varepsilon - v\| \le \sqrt{\varepsilon}, \qquad \|v^\star_\varepsilon\|_\star \le \sqrt{\varepsilon}.$$

Loosely speaking, $v_\varepsilon$ is at least as good as v and is almost the same function as v. It is also a point at which the minimum of ϕ(·) is almost achieved, since its subgradient $v^\star_\varepsilon$ is almost zero. At the same time, it is important to stress that the Ekeland variational principle does not guarantee that ϕ(·) attains its minimum. The essential features of our approach include the use of Sobolev spaces.
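The Ekeland corollary can be seen at work in a one-dimensional toy case. The sketch below assumes V = ℝ and the hypothetical function ϕ(v) = e^v. This function is convex and bounded below with inf ϕ = 0, but the infimum is never attained. Starting from an ε-minimizer v, the code finds a point v_ε with ϕ(v_ε) ≤ ϕ(v), |v_ε − v| ≤ √ε and |ϕ′(v_ε)| ≤ √ε. It then checks that the perturbed function w ↦ ϕ(w) + √ε |w − v_ε| has a strict minimum at v_ε on a grid. The function, the grid search and all names are illustrative assumptions, not the paper's construction.

```python
import math

# Hypothetical example on V = R: phi(v) = exp(v) is convex and bounded below
# (inf phi = 0), but the infimum is never attained.
def phi(v):
    return math.exp(v)

def dphi(v):
    # phi is differentiable and convex, so its subdifferential at v is {phi'(v)}.
    return math.exp(v)

def ekeland_point(v, eps, n_grid=10_001):
    """Grid search in [v - sqrt(eps), v + sqrt(eps)] for a point v_eps with
    phi(v_eps) <= phi(v) and |phi'(v_eps)| <= sqrt(eps)."""
    r = math.sqrt(eps)
    for i in range(n_grid):
        w = v - r + 2.0 * r * i / (n_grid - 1)
        if phi(w) <= phi(v) and abs(dphi(w)) <= r:
            return w
    return None

def perturbed_min_at(v_eps, eps, width=1.0, n_steps=1000):
    """Check on a grid that w -> phi(w) + sqrt(eps) * |w - v_eps| is strictly
    larger than its value at v_eps for every grid point w != v_eps."""
    r = math.sqrt(eps)
    base = phi(v_eps)
    for k in range(1, n_steps + 1):
        d = width * k / n_steps
        for w in (v_eps - d, v_eps + d):
            if phi(w) + r * abs(w - v_eps) <= base:
                return False
    return True

if __name__ == "__main__":
    eps, v = 0.01, -5.0
    assert phi(v) <= 0.0 + eps   # v is an eps-minimizer, since inf phi = 0
    v_eps = ekeland_point(v, eps)
    print(v_eps, dphi(v_eps), perturbed_min_at(v_eps, eps))
```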
Consider now the Gelfand triple $V \subset L^2(I) \subset V^\star$, where V is a closed affine subspace of the Sobolev space $H^1(I) = W^{1,2}(I)$ (which is also a separable Hilbert space) and the embeddings are dense, continuous, and compact. Given a function $v_\varepsilon$, we set … From now on, we will use the transformed functions ṽ, but for convenience we drop the tilde.

The space $V \subset H^1(I)$ is endowed with the usual scalar product

$$(u, v)_V = \int_I \left(u'v' + uv\right) dz.$$
We see that the associated norm satisfies axiom A 1. Before moving on to consider ζ, however, it is tempting to generalize the definition of V slightly, because modeling a complex real SCS requires some additional constructions. The differential operator K is an isomorphism K: V → V⋆. We will work in the so-called energetic space $H_E(I)$ associated with K; hereafter we use the notation $E := V_E = H_E(I)$. To avoid overloading our presentation, we refer the reader to Zeidler (1999) for details, proofs, and explanations; the interested reader can compare this approach with the one described in Kristály et al. (2010). With the notation introduced above, the energetic space E is equipped with the scalar product

$$(u, v)_E = \langle K u, v \rangle,$$

and the induced energetic norm can be written as

$$\|u\|_E = (u, u)_E^{1/2}.$$
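A finite-difference sketch can make the energetic space more tangible. The paper does not fix K, so the code assumes the hypothetical choice K = −d²/dz² + identity on I = (0, L) with Dirichlet data. On a grid, K becomes DᵀD + I, where D is the forward-difference matrix. The discrete energetic product h·vᵀKu then coincides with the discrete H¹ product h(Du·Dv + u·v). The code also solves the homogeneous Euler equation Ku = 0 with boundary values u(0) = a and u(L) = b. The result matches c₁ cosh z + c₂ sinh z, the same cosh/sinh form the paper obtains for ζ. All names and the choice of K are assumptions.

```python
import numpy as np

def difference_matrix(n, h):
    """Forward differences (u[j+1] - u[j]) / h, j = 0..n-1, acting on the n-1
    interior values of a grid function with u[0] = u[n] = 0."""
    D = np.zeros((n, n - 1))
    for j in range(n):
        if j < n - 1:
            D[j, j] = 1.0 / h        # coefficient of u[j+1] (interior column j)
        if j > 0:
            D[j, j - 1] = -1.0 / h   # coefficient of u[j] (interior column j-1)
    return D

def energetic_matrix(n, h):
    """Hypothetical discrete K = -d^2/dz^2 + identity (D^T D is the second-difference matrix)."""
    D = difference_matrix(n, h)
    return D.T @ D + np.eye(n - 1)

def energetic_product(u, v, n, h):
    """Discrete (u, v)_E computed as h * (Du . Dv + u . v), i.e. a discrete H^1 product."""
    D = difference_matrix(n, h)
    return h * (np.dot(D @ u, D @ v) + np.dot(u, v))

def solve_euler(a, b, length=1.0, n=200):
    """Solve -u'' + u = 0 on (0, length) with u(0) = a, u(length) = b."""
    h = length / n
    K = energetic_matrix(n, h)
    rhs = np.zeros(n - 1)
    rhs[0] += a / h**2     # boundary value enters the first interior equation
    rhs[-1] += b / h**2    # boundary value enters the last interior equation
    u = np.linalg.solve(K, rhs)
    z = np.linspace(0.0, length, n + 1)[1:-1]
    return z, u

if __name__ == "__main__":
    n, h = 200, 1.0 / 200
    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(n - 1), rng.standard_normal(n - 1)
    print(energetic_product(u, v, n, h), h * v @ energetic_matrix(n, h) @ u)
    z, u_num = solve_euler(1.0, 2.0)
    c1, c2 = 1.0, (2.0 - np.cosh(1.0)) / np.sinh(1.0)
    print(np.max(np.abs(u_num - (c1 * np.cosh(z) + c2 * np.sinh(z)))))
```

With this particular K, minimizing ½(v, v)_E under fixed boundary values is the discrete Dirichlet principle. Its Euler equation −u″ + u = 0 is what produces the hyperbolic functions.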

By definition, put
That is, ϕ may be interpreted as measuring the average value of the weak derivatives. Recall from Attouch et al. (2014) that the statements listed below are equivalent.
S 1 There exists a unique ζ ∈ E such that The formula (5) reads as the "weak" Euler equation in the current setting.
S 2 ζ is obtained by

$$\zeta = \arg\min_{v \in E} \varphi(v). \qquad (6)$$

It is well known from the theory of variational problems in Sobolev spaces that any local minimizer of ϕ(v) in the E topology is also a local minimizer of ϕ(v) in the $C^1$ topology. It follows in the standard way that, for the quantity ζ, we have

$$\zeta = c_1 \cosh\eta + c_2 \sinh\eta, \qquad (7)$$

where … and $c_1$, $c_2$ are constants.

To arrive at a specific, relevant RV ζ, one has to make an assumption in addition to axioms A 1 and A 2. Now let us choose the function z ↦ η defined in Eq. (4) according to formula (1), i.e., in the form of a slowly varying function (see Borovkov 2013). Therefore, we have … We then introduce the function η(z) given by … To set up the problem, we eliminate the factor … from Eq. (7) by rescaling the quantity ζ. In an appropriate normalization, instead of the original RV ζ occurring in Eq. (7), we now have to deal with the renormalized RV ζ. Substituting expressions (11) and (12) into Eq. (8), we obtain … At least for all practical purposes, it is possible to represent relation (12) by means of a standard uniform RV U. We will treat Eq. (13) in a broader sense, as the Wakeby distribution (WD) (for more details, see Katchanov and Markova 2015; the features of the WD may be found in Hosking and Wallis 2005). Introducing the continuous shape parameters β, γ, δ, the continuous location parameter ξ, and the continuous scale parameter α, we may cast Eq. (13) into the following general WD form (Johnson et al. 2010, pp. 44–46):

$$\zeta = \xi + \frac{\alpha}{\beta}\left[1 - (1 - U)^{\beta}\right] - \frac{\gamma}{\delta}\left[1 - (1 - U)^{-\delta}\right]. \qquad (14)$$

In a special but very important case, when α = 0 or γ = 0, the WD in Eq. (14) reduces to the generalized Pareto distribution (GPD); for α = 0, for instance,

$$\zeta = \xi + \frac{\gamma}{\delta}\left[(1 - U)^{-\delta} - 1\right].$$

To furnish a concrete illustration, we collected a sample of articles and reviews published in journals that published more than 100 documents per year and were indexed in the Journal Citation Reports 2003 Science Edition (Thomson Reuters). All data were downloaded from the Web of Science (WoS, updated on August 8, 2013), with a 10-year time window. The number of citations z was counted as the total number of times a paper appears as a reference in a more recently published paper indexed in the Web of Science Core Collection. There are 31,097,160 citations among 1,062,961 papers.
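The Wakeby form can be used directly as a simulator. Its standard quantile function, in Hosking's parameterization, is x(F) = ξ + (α/β)[1 − (1 − F)^β] − (γ/δ)[1 − (1 − F)^(−δ)]. Feeding a standard uniform RV U into x(·) therefore gives Wakeby variates by inverse-transform sampling. The sketch below assumes β ≠ 0 and δ ≠ 0, since the limiting logarithmic forms are omitted, and uses hypothetical parameter values. It also checks that α = 0 gives a GPD with scale γ and Hosking shape k = −δ. For δ > 0 this is the heavy, Pareto-type upper tail.

```python
import numpy as np

def wakeby_quantile(F, xi, alpha, beta, gamma, delta):
    """Wakeby quantile function in Hosking's parameterization.
    Assumes beta != 0 and delta != 0 (the limiting logarithmic forms are omitted)."""
    tail = 1.0 - np.asarray(F, dtype=float)
    return (xi
            + alpha / beta * (1.0 - tail ** beta)
            - gamma / delta * (1.0 - tail ** (-delta)))

def gpd_quantile(F, xi, scale, k):
    """Generalized Pareto quantile x(F) = xi + scale/k * (1 - (1-F)**k), k != 0."""
    tail = 1.0 - np.asarray(F, dtype=float)
    return xi + scale / k * (1.0 - tail ** k)

def sample_wakeby(size, xi, alpha, beta, gamma, delta, seed=None):
    """Inverse-transform sampling: feed a standard uniform RV U into the quantile."""
    rng = np.random.default_rng(seed)
    U = rng.random(size)          # U ~ Uniform[0, 1)
    return wakeby_quantile(U, xi, alpha, beta, gamma, delta)

if __name__ == "__main__":
    F = np.linspace(0.0, 0.99, 100)
    # With alpha = 0 the Wakeby quantile coincides with a GPD (scale gamma, k = -delta).
    print(np.allclose(wakeby_quantile(F, 1.0, 0.0, 2.0, 1.5, 0.4),
                      gpd_quantile(F, 1.0, 1.5, -0.4)))   # True
    # Hypothetical parameters, not fitted to the WoS data.
    z = sample_wakeby(10_000, xi=0.0, alpha=2.0, beta=3.0, gamma=0.5, delta=0.3, seed=7)
    print(z.min() >= 0.0, np.median(z))
```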
The WD, the Weibull, and the lognormal distributions were fitted to these bibliometric data using the 'lmomco' R package. Goodness of fit was assessed with the Kolmogorov–Smirnov statistic D and the Anderson–Darling statistic A. The values of the test statistics are reported in Table 1 (see also Figs. 1–6). Comparing the obtained values and the goodness-of-fit statistics given in Table 1, one sees that the WD offers a higher level of accuracy than the other probability distributions considered. We conclude that the CD obtained is similar to the WD.
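The fitting step can be outlined in Python for readers who do not use R. This is not the lmomco implementation, and it does not fit the full five-parameter WD. It covers only the GPD special case, using Hosking's sample L-moments, computed from unbiased probability-weighted moments, and the L-moment estimators k = (1 − 3τ₃)/(1 + τ₃), α = (1 + k)(2 + k)λ₂, ξ = λ₁ − (2 + k)λ₂. It then computes the Kolmogorov–Smirnov statistic D. The synthetic data and all function names are hypothetical, and the sample is not the WoS sample.

```python
import numpy as np

def sample_lmoments(x):
    """Sample L-moments l1, l2 and L-skewness t3 from unbiased probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2

def fit_gpd_lmom(x):
    """L-moment estimators for the GPD x(F) = xi + alpha/k * (1 - (1-F)**k)."""
    l1, l2, t3 = sample_lmoments(x)
    k = (1 - 3 * t3) / (1 + t3)
    alpha = (1 + k) * (2 + k) * l2
    xi = l1 - (2 + k) * l2
    return xi, alpha, k

def gpd_cdf(x, xi, alpha, k):
    """GPD CDF for k != 0, clipped to [0, 1] outside the support."""
    z = np.maximum(1.0 - k * (np.asarray(x, dtype=float) - xi) / alpha, 0.0)
    with np.errstate(divide="ignore"):
        F = 1.0 - z ** (1.0 / k)
    return np.clip(F, 0.0, 1.0)

def ks_statistic(x, cdf):
    """Kolmogorov-Smirnov D between the empirical CDF of x and a fitted CDF."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    F = cdf(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

if __name__ == "__main__":
    # Hypothetical synthetic data: GPD with xi = 0, alpha = 1, k = -0.2 (heavy tail).
    rng = np.random.default_rng(42)
    U = rng.random(100_000)
    x = (1.0 / -0.2) * (1.0 - (1.0 - U) ** -0.2)
    xi, alpha, k = fit_gpd_lmom(x)
    D = ks_statistic(x, lambda t: gpd_cdf(t, xi, alpha, k))
    print(round(xi, 3), round(alpha, 3), round(k, 3), round(D, 4))
```

Because the parameters are estimated from the same sample, the usual Kolmogorov–Smirnov critical values are conservative here. D is best read as a comparative measure across candidate distributions, which matches the way Table 1 is used in the text.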

Discussion
One of the most exciting and fruitful applications of mathematical methods in the natural sciences is the variational principle. The substantive aim of the present paper is the derivation of a variational principle that makes it possible to interpret the empirical regularities of CDs as a logical necessity. Starting from the famous Ekeland variational principle, we show that the derivation of CDs given in this paper might be considered a step in this direction. Using the variational principle (6) in the energetic space E, together with empirical evidence for the existence of slowly varying functions representing the right tail of CDs, allows us to introduce the WD (and the GPD) naturally.
Let us stress that modest mathematical means, concerning some simple facts of functional analysis, yield a simple mathematical theory of CDs from which concrete CDs follow immediately as consequences. It is remarkable that a first-principles derivation of CDs (e.g., the GPD) in a bibliometric model is possible at the price of uncontrollable assumptions, which are justified a posteriori. By contrast, our derivation assumes only that Eq. (8) is relevant. This is, of course, more satisfactory. However, note that there are no proper bibliometric reasons for preferring Sobolev spaces over any other spaces. Therefore, there is also no reason to cast the vague bibliometric notions of the consistency and integrity of citations in the mathematical form of Ekeland's variational principle. One must bear in mind that our result refers to properties of a "pure mathematical structure". Like any mathematical result, Eq. (13) cannot give a completely accurate description of an empirical CD. Moreover, in the mathematical theory of CDs, "by construction", we have no direct knowledge of the statistical parameters. Thus, we can only measure the parameters that index the CDs, not compute them from the axioms.

Conclusions
In summary, the approach suggested here allows an interpretation of the Ekeland variational principle in terms of the standard uniform RV, which may be of some interest. It is shown that in a sufficiently "smooth" SCS a power-law tail of the static CD can appear. However, there are no grounds to consider this a mathematical model underlying bibliometric theory. At the same time, the present study may be instructive beyond the specific research site and can contribute to building a mathematical theory of CDs.