Biomarkers: the link between the macro- and the micro-environment
In order to causally link the micro- and macro-environments, omic technologies provide a key set of instruments in cancer research: they allow us to connect exposure and disease by finding the “right” biomarkers. Biomarkers are key to causal analysis in cancer research and play a major role in how we conceptualize cancer causation. This is well expressed in the classical “molecular epidemiology” paradigm, whose diagram connects exposure markers, early effect markers and susceptibility markers, as described in Schulte and Perera’s [1] book and further elaborated recently [16].
In 1998, the National Institutes of Health Biomarkers Definitions Working Group defined a biomarker as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.” Biomarkers are largely constructed by cross-checking data that are generated by some machines (e.g. mass spectrometers) and subsequently analyzed using other machines (computers and their algorithms). An important question therefore concerns the ontological status that we should give to biomarkers. Strictly speaking, they do not seem to be just ‘objects out there’. Schulte and Perera [1] describe biomarkers in terms of ‘events’ in the continuum from exposure to disease. But even within this continuum, a marker may represent a genuine event (e.g. direct exposure to a pollutant), may be correlated with such an event (the classical example of yellow fingers in heavy smokers), or may even predict the event without being causally associated with it (like the association between having two X chromosomes and the propensity to wear skirts). The fact that biomarkers hardly correspond to “causal” molecular entities does not imply that they cannot be measured. In fact, measuring them is what molecular epidemiology routinely does. But, as Schulte noted as early as 1993, there are multiple ways of defining and measuring biomarkers, which raises the question of their ontological status.
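The difference between a marker that is merely correlated with a causal event and a cause can be illustrated with a toy simulation of the yellow-fingers example. The sketch below is a minimal illustration under assumed, made-up probabilities; the functions `simulate` and `risk` and all the numbers are hypothetical and are not drawn from any study. Smoking raises the probability of both yellow fingers and lung cancer, while yellow fingers have no effect on cancer. Yellow fingers nevertheless look strongly predictive of cancer in the whole population, but the association essentially vanishes when only smokers are compared.

```python
import random


def simulate(n=100_000, seed=0):
    """Toy common-cause model with made-up probabilities: smoking raises the
    chance of both yellow fingers and lung cancer; yellow fingers have no
    effect on cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        yellow = rng.random() < (0.8 if smoker else 0.05)
        cancer = rng.random() < (0.15 if smoker else 0.01)  # depends on smoking only
        rows.append((smoker, yellow, cancer))
    return rows


def risk(rows, cond):
    """Proportion of cancer cases among individuals satisfying `cond`."""
    cases = [cancer for smoker, yellow, cancer in rows if cond(smoker, yellow)]
    return sum(cases) / len(cases)


rows = simulate()
# Crude comparison: yellow fingers look like a strong predictor of cancer.
crude_rr = risk(rows, lambda s, y: y) / risk(rows, lambda s, y: not y)
# Same comparison among smokers only: the marker adds essentially nothing.
smokers_rr = risk(rows, lambda s, y: s and y) / risk(rows, lambda s, y: s and not y)
print(f"crude risk ratio: {crude_rr:.1f}; among smokers: {smokers_rr:.2f}")
```

Stratifying by the common cause works here only because the toy model stipulates the causal structure; in real data that structure is precisely what is in question.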
The issue gets even more complex because molecular epidemiology is not interested in finding biomarkers per se, but in understanding the continuum of disease development from early exposures, through the search for biomarkers. Similarly, in other contributions, the technologies used to detect biomarkers (some of which are called omic technologies) are said to provide the ‘missing link’ between exposure and disease or, given the previous discussion, between the macro- and the micro-environment [17‐19].
This conceptualization of the search for biomarkers, i.e. as the reconstruction of the continuum linking exposure and disease, emphasizes processes rather than things or objects. This calls for two remarks. On the one hand, biomarkers are not entities, things to which we can attribute some causal power in the sense in which human papillomavirus (HPV) has the power to initiate cervical cancer. Instead, biomarkers are clues, indicators, markers to be detected in order to reconstruct the missing link. On the other hand, and related to the previous point, we need to say in which sense, if any, these continuous links, or processes, between exposure and disease are causal. This is all the more important because we seek to link heterogeneous levels such as the macro- and the micro-environment. In sum, our approach aims to address two main questions: first, how to understand causal production from the macro- to the micro-environment, and second, why it is important to have a coherent conceptualization of such causal links. We discuss these two issues in reverse order: spelling out the second question will provide further motivation for our approach.
Finding a coherent conceptualization of the link between the macro- and the micro-environment is important for the following reason. The macro-environment consists of the biological agents, pollutants and chemicals we are exposed to, but also of social interactions and “psycho-social factors”. The micro-environment, by contrast, consists of biochemical and molecular processes measured at different “omic levels”. How can we make the causal link between the macro- and the micro-environment plausible, beyond a “coarse-grained” difference-making relation between the two?
By and large, traditional epidemiology has done this successfully for a long time, establishing robust associations between classes of exposures and classes of diseases. With the advent of molecular epidemiology, however, these associations also relate factors at very different levels (the micro- and macro-environments). This rests on a change of the scale of measurement: environmental exposure has traditionally been assessed by measuring the levels of individual chemicals in, say, air or water. Newer, finer-grained measurements therefore initially try to restore some kind of “scale homogeneity”: measure the level of a pollutant or chemical externally, and then measure changes at the genomic, transcriptomic, proteomic or metabolomic level internally. Although ‘scale homogeneity’ is restored by making all measurements chemico-biological, the problem is not solved.
In fact, since measurements now take place at the same level, the researcher can merely establish another association or series of associations (difference-making relations), albeit at a much lower level. For instance, we might establish a robust correlation between the level of a certain chemical in the air and a biomarker of early clinical changes in a target disease (say, lung cancer). But this does not yet establish a causal link. It only provides a more precise measure connecting levels of hazard and levels of omic changes. On the one hand, to establish a causal link we still need to find the right “intermediate” biomarkers, the ones that are linked both to exposure and to disease. To be sure, this search (finding appropriate biomarkers) obviously relies upon studying associations, e.g. via omic analyses. On the other hand, we need to place this reconstructed link within a plausible network of relations (i.e. the mechanisms of carcinogenesis described in the first part of the paper), and this is precisely the kind of ‘biological thinking’ mentioned earlier. It is important to note that the linking, here, cannot be seen with the naked eye, nor even with sophisticated experimental set-ups. Instead, the scientist reconstructs the linking by putting together the pieces of the evidential puzzle, much as one solves a crossword puzzle [20]. Biological theory needs to be complemented with the results of omic analyses, which in turn require sophisticated and complex statistical analyses. It is in this sense that cancer etiology needs a plurality of evidence from which to make causal inferences. All this requires considerable empirical evidence and much interpretation of the evidence using the appropriate concepts. One such concept is information transmission, as we argue later.
A second, more important, reason why the problem is not solved is that although homogeneity in the scale of measurement is restored by using biological measurements, this makes the results harder to interpret, because the interpretation still has to identify causes at the macro level, i.e. the level of environmental exposure causing disease. We need this causal knowledge to design appropriate public health interventions. To sum up: we measure everything at the micro-level (level of pollutant, and level of metabolite) but ultimately what we want to know is how and to what extent environmental pollutants or psycho-social factors cause diseases. The problem molecular epidemiology faces is: how can we understand macro-factors causing micro-factors, or vice versa? What we have to establish is a continuous linking, not just (finer-grained) correlations at a different level of measurement. Continuous linking can be conceptualized as information transmission, as we explain next.
We mentioned earlier that causal claims about exposure and cancer involve statements about risks, i.e. difference-making: whether certain exposures are good predictors of disease, at different stages of disease development, at different stages of life, etc. At the same time, we also look for evidence about how exposure leads to disease. Typically, ‘how’ exposure leads to disease has been understood in terms of the mechanisms that produce disease, mainly through the study of biomarkers. Mechanisms provide us with information about how causes produce effects. The position that both kinds of evidence, difference-making and mechanistic, are needed is called, in philosophy, evidential pluralism, to emphasize the need for multifold (or multi-layered) evidence in order to establish causal claims [3]. A prominent example of evidential pluralism is the joint use of epidemiological evidence (difference-making) and mechanistic evidence (productive causality) in the Monographs of the International Agency for Research on Cancer [21].
The difference-making component of evidential pluralism is, in a sense, less controversial than the productive component, as even proponents of agnostic, data-driven approaches will agree that the search for robust statistical associations lies at the very heart of data-intensive science. What remains controversial is what biomarkers are marks of within a mechanistic understanding of cancer etiology. This is problematic because, as discussed before, we want to establish links between macro- and micro-factors: on the one hand, such causal relations cannot be reduced to biochemical relations and, on the other hand, they are not mere (finer-grained) statistical associations among macro-variables.
If the causal link connects factors at different scales and of different types, then the notion of productive causality (i.e. how causes and effects are linked) needs reconceptualization. But the type of linking sought may differ depending on the scientific context or on the purpose of the causal question.
There are several candidates for characterizing such links; we mention the two most prominent here. First: Wesley Salmon’s “mark transmission theory” [22‐25]. In Salmon’s view, the central question is how to distinguish between causal processes and non-causal (or pseudo-) processes. Simply put, causal processes transmit marks, while pseudo-processes do not. Consider what happens when a mark is introduced into a process: if the process is causal, the mark persists at later stages. A stock example is a moving car: a dent made in the car is carried along as the car moves, whereas a mark made on the car’s shadow is not transmitted further. This shows that the moving car is a causal process, while the moving shadow is not. However, since not every causal process is actually marked, Salmon formulated the approach in counterfactual terms: a causal process is one that could be marked and that could transmit the mark. Causal processes, in this approach, are those transmitting physical quantities, such as energy or momentum (think of colliding billiard balls). Yet this approach is tailored to physics and does not provide the conceptual tools to understand the macro–micro linking mentioned above. Second: the ‘complex-systems’ approach [26]. According to this approach, to establish causal relations one needs to identify mechanisms, in the sense of complex systems that link causes and effects. Such an approach, however, emphasizes the organization of the different components of a mechanism rather than the continuum linking exposure to disease. For instance, a mechanistic explanation sheds light on how a gene is normally methylated, and on how it becomes hypomethylated upon exposure to tobacco smoke. We can shed light on these mechanistic aspects by identifying the relevant molecular entities and activities involved, and their organization. But this is not very illuminating about the continuous link between exposure and disease, that is, the process initiated by exposure that eventually leads to disease development via several intermediate stages.
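Salmon’s marking criterion for telling the car from its shadow can be made concrete with a toy model. In this minimal sketch, the `State` class and the functions `step_car`, `shadow_of` and `mark_persists` are hypothetical illustrations, not part of Salmon’s own formulation: the car’s next state is computed from its previous state, whereas each shadow state is regenerated from the car’s current position. A mark applied to the car therefore persists, while a mark applied to the shadow is lost at the next step.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    """Snapshot of a process at one time step (hypothetical toy model)."""
    position: int
    marked: bool = False


def step_car(car: State) -> State:
    # Causal process: the next state is produced from the previous one,
    # so a local modification (the dent) is carried forward.
    return replace(car, position=car.position + 1)


def shadow_of(car: State) -> State:
    # Pseudo-process: each shadow state is generated afresh from the car,
    # not from the previous shadow state, so earlier marks are not carried over.
    return State(position=car.position)


def mark_persists(target: str, steps: int = 5, mark_at: int = 1) -> bool:
    """Mark `target` ('car' or 'shadow') once, at step `mark_at`, and report
    whether the mark is still present after the final step."""
    car = State(position=0)
    shadow = shadow_of(car)
    for t in range(steps):
        if t == mark_at:
            if target == "car":
                car = replace(car, marked=True)
            else:
                # This mark is overwritten when the shadow is regenerated below.
                shadow = replace(shadow, marked=True)
        car = step_car(car)
        shadow = shadow_of(car)
    return car.marked if target == "car" else shadow.marked


print(mark_persists("car"), mark_persists("shadow"))  # True False
```

The toy model introduces the mark by intervention, as Salmon’s criterion requires; this is the aspect that has to be relaxed in largely observational cancer research.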
The link is instead better conceptualized using the notion of “information transmission”. Note that information transmission does not coincide with transfer of biological information between macro- and micro-factors. Instead, information transmission refers to how the scientist reconstructs the linking between macro- and micro-factors, putting together all the available pieces of the evidential puzzle. In other words, information transmission is at the level of epistemology, not of ontology.
In a previous article [27] we suggested exploring the prospects of a notion of information drawn from the way scientists themselves explain the role of biomarkers; in this context, the idea of ‘picking up signals’ recurs, for instance:
From these two parallel analyses [statistical analyses], we obtained lists of putative markers of (i) the disease outcome, and (ii) exposure. These were compared in a second step in order to identify possible intersecting signals, therefore defining potential intermediate biomarkers [28].
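The two-step procedure in this quotation can be sketched schematically. The example below assumes hypothetical feature names, made-up p-values and a fixed significance threshold `alpha`; it is an illustration of the general idea, not the pipeline used in the cited study [28]. Each association analysis yields a list of putative markers, and the intersection of the two lists gives the candidate intermediate biomarkers.

```python
# Hypothetical per-feature p-values from two separate association analyses:
# feature vs. exposure, and feature vs. disease outcome (made-up numbers).
exposure_pvalues = {"feat_A": 0.001, "feat_B": 0.20, "feat_C": 0.004, "feat_D": 0.03}
outcome_pvalues = {"feat_A": 0.010, "feat_B": 0.002, "feat_C": 0.300, "feat_D": 0.02}


def putative_markers(pvalues, alpha):
    """Step 1: list the features associated at level `alpha` in one analysis."""
    return {feature for feature, p in pvalues.items() if p < alpha}


def intermediate_candidates(exposure_p, outcome_p, alpha=0.05):
    """Step 2: intersect the two lists; shared features are candidate
    intermediate biomarkers (the 'intersecting signals')."""
    return sorted(putative_markers(exposure_p, alpha) & putative_markers(outcome_p, alpha))


print(intermediate_candidates(exposure_pvalues, outcome_pvalues))  # ['feat_A', 'feat_D']
```

A real omic analysis would also correct for multiple testing (e.g. by controlling the false discovery rate), which this sketch omits. The output is still a list of associations: turning such candidates into a causal link requires placing them within a plausible mechanistic account.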
What is the signal that we have to pick up? In what sense will this give us the sought-after relation of production between exposure and disease? Our suggestion is to conceptualize the detection and tracing of signals in terms of information transmission, as sketched above. This, we submit, is a generalization of Salmon’s mark transmission theory [27].
The key difference from Salmon’s processes lies in the marking. Salmon’s approach rests on the introduction of a mark. In most cancer research, however, we look for existing marks, left by exposure and transmitted along the process leading to disease, without introducing them ourselves: cancer research is largely an observational rather than an experimental science.
This understanding of causal production as information transmission takes full advantage of a conceptualization in terms of mark transmission in processes, without being tied to the transmission of physical quantities such as energy or momentum. It also takes full advantage of a conceptualization in terms of mechanisms, because knowledge of the relevant molecular or biochemical mechanisms indicates where to look for signals, for instance by choosing appropriate omic levels for the analysis of biological specimens. In this sense we say that mechanisms are information channels: “biochemical or molecular spaces” where we look for the flow of information that we try to intercept using biomarkers [27].
Ultimately, we want to understand the whole phenomenon of carcinogenesis: all the relevant omic levels involved and how they interact, and to build reliable models of the dynamic evolution of whole systems under many different exposure conditions. The concept that captures this dynamic evolution is information transmission. The flow is in the link, and the link, as suggested, is best thought of as informational. More precisely, it is given by the scientists’ reconstruction of information transmission through the different types of analyses, i.e. by putting together all the pieces of the “evidential puzzle”.
The question remains: what exactly does information mean? In Genome-Wide Association Studies (GWAS), a clear (univocal) definition of information is at least possible, as genes are more clearly defined than most omic measurements, and substantive informational concepts make sense when applied to genes. In Exposome-Wide Association Studies (EWAS), by contrast, information is still not well defined [27]. (Omic “signals” are often only “features”, i.e. they need to be decoded after discovery.) However, the diversity and richness of informational concepts (many of which are currently being developed and discussed) is not a weakness of an informational approach but a virtue. This is captured, for instance, by philosophical accounts, especially those developing qualitative notions of information. One such account is semantic information, namely what the observer (here, the scientist) can process by looking at the data, omic analyses, biological theory, etc. It is in this sense that information transmission cannot be reduced to biological information, although biological information is certainly part of it.
One advantage of information transmission is that it offers a structure for thinking about how heterogeneous factors, such as micro- and macro-level, biological and social ones, are linked; this is a pressing issue in light of the results of omic studies, and also for the design of public health policies.