In my work as a scientist, one of the tasks I can take on is to act as a
reviewer. This is important but almost entirely unrewarded: there is no
pay, and no funding agency cares very much whether you do it or not.
However, the review process is the most important place where scientific
discussions take place, because – unfortunately –
this is virtually the only criterion for what constitutes a scientific finding
today: has it been published in a peer-reviewed journal? I therefore do
spend some time doing this. And I fight a lot of the time with papers
and authors who want to publish mathematical modelling works without any
comparison with experimental data. I strongly believe that such works
are not science, and that they should not be published. Today, I
submitted a review reply on such a paper, and since the reply is not at
all specific to the paper in question – it could have
been written to any paper with the same problem – I also post it here.
My principle for the next decade of my life, which I just entered,
is “going public, going deep”, and publishing this here on the blog is
part of following that new principle.
Here is the review reply that I wrote:
“Thank you for your comments.
I do recognize the fact that you and others have published similar
papers in the past, where models have been developed and presented with
no comparison with data. There is nothing I can do about that. However,
that fact does not transform such works into science. Modern science is,
in my very firm opinion, the truth-seeking tool that was established by
Galileo some four hundred years ago: it builds on i) the mathematical
formalization of mechanistic hypotheses of the system that you study,
and then ii) usage of *data* to distinguish between those hypotheses.
The hypothesis that is best able to describe data – in the
first round estimation data, and in the second round independent
validation data, based on predictions and *then* experiments – is the
superior hypothesis. It is this formula for truth-seeking that
distinguishes the science that started with Galileo from the
church-driven epistemology that ruled science before him (note that the
prior Aristotelian science worldview also involved mathematics, and
data, but not in the same hypothesis-testing manner). If Galileo’s
formula is not followed, it is not science. That a paper has been
published in a scientific journal does not make that work science. That
it is not science does mean, however, that it *should not* have been
published. The
only exception to that principle exists within the field of
mathematics, which has other criteria for its judgements of a paper:
e.g. that what is presented should be i) previously unproven and ii)
should hold true for a large family of equations/examples. Another type
of paper in mathematics can be that of a new method, e.g. for
optimization, that is proven to be superior to existing methods.
Unfortunately, this conception of what science is was lost in the field
of modelling of biological systems during a large part of the 20th
century. During this time, it was called mathematical biology,
complexity theory, etc. This was, to a large extent, rectified during
the beginning of the 21st century, with the conception of systems
biology. However, unfortunately, much old-school data-free modelling is
still done. This has to stop! It is giving, and has been giving, the
field of modelling in biology a bad reputation, with the impression that
it has nothing to do with reality or biology – and rightly so, such
modelling has nothing to do with reality! At least not in any way that
has been demonstrated by science.
Two further clarifications and responses to your reply are in order:
i) You say that your model is based on data. That is true: the model
structure is based on data. Your manuscript does
therefore function as a review of existing biology. But that is
something different from publishing an original research paper with
novel results, and it is fundamentally different from
the kind of comparison between simulations of the *entire model
structure* and data that I am referring to above. It is a bit like
saying that the Ptolemaic worldview (with the earth in the middle) is
based on data because it includes the sun, the planets, and the earth;
which are observed in experiments. The question is not if they are
present. The question is which way of connecting them in relation to
each other is the correct one. To go beyond what can be said with
biology alone – i.e. to do mathematical modelling – requires that one
puts the structure together using competing hypotheses (e.g. one with
the sun in the middle, and one with the earth in the middle), and then
sees which of the two corresponding models produces simulations
that best agree with data (existing data and future data). That is how
science has functioned since Galileo, and that is how it should still
function today.
ii) You say that a model component
in your model – that has not been validated in any fashion whatsoever –
produces a prediction that a specific component is important; you then
also point to some papers that claim the same thing. That could, on
the surface, seem like a comparison with data. However, with the model
structure that you have put together – with the most well-known and most
often considered main players in the beta cell etiology – you could
identify any component in your model as the most important one, and then
find many papers that claim that that component is the most important
one. That is, unfortunately, how biology is allowed to work today, with
many co-existing hypotheses, that are allowed to continue to co-exist,
where each lab focusing on one of the components is allowed to point to
limited results as to why their particular component is the most
important one, and without forcing anyone to challenge these claims with
respect to each other; without finding out what the big picture looks
like. That is where systems biology can and should come in and make a
difference: by putting up alternative hypotheses regarding which
component(s) are the most important, and then letting data, systems-level
data, judge which of the hypotheses is the most compelling one. This is
how systems biology has worked in many/most of the papers that are cited
in the review paper that I gave you. That way requires a model that
produces simulations (time-curves, typically) that agree with
estimation data, and with validation data.
In summary, for me to
judge this or any paper as publishable, you need to produce (at least)
these two things: i) at least one curve, e.g. a simulation of a variable
as it progresses in time, that agrees with corresponding data; ii) a
prediction that is validated by another dataset, not used for estimating
the parameters in the model. In fact, beyond that, you should also
demonstrate that your model is superior to other models, i.e. that it
can describe all data that one of the currently most important and
realistic models can, and more data besides.
In other words, there are many models published for beta cells,
including for their etiology. These models can describe a lot of data,
in the above manner. Why not take one of those models, find a feature or
dataset that it cannot explain (there are many), and then go ahead
and improve the model to make it able to explain those data (while still
retaining the ability to explain all old data)? If you then also show
that this is not due to overfitting w.r.t. all data, i.e. if you then
show that your new model also can describe some validation data, not
used for model fitting, then you will have contributed an
improvement that follows the tradition of science. Then, and only then,
will I judge your – or any scientist’s – paper as publishable.”
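
For readers who want to see the estimation/validation procedure from the
reply in concrete form, here is a minimal sketch. Everything in it is a
made-up assumption for illustration only: the synthetic "true system",
the two toy hypotheses (an exponential decay and a straight line), the
noise level, and all function names. It has nothing to do with beta
cells or with the paper under review. It only shows the pattern: fit
each competing hypothesis to estimation data, then let independent
validation data judge between them.

```python
import numpy as np

# Hypothetical illustration only: all data below are synthetic.
rng = np.random.default_rng(0)


def true_system(t):
    # Stand-in for "reality"; in real work this is unknown to the modeller.
    return 5.0 * np.exp(-0.8 * t)


# Estimation data (used for parameter fitting) and validation data
# (a later time window, never used for fitting). 2% multiplicative noise.
t_est = np.linspace(0.0, 2.0, 9)
y_est = true_system(t_est) * (1 + 0.02 * rng.standard_normal(t_est.size))
t_val = np.linspace(3.0, 5.0, 5)
y_val = true_system(t_val) * (1 + 0.02 * rng.standard_normal(t_val.size))


def fit_exponential(t, y):
    # Hypothesis H1: y = a * exp(-k * t), fitted by linear regression on log(y).
    slope, intercept = np.polyfit(t, np.log(y), 1)
    a, k = np.exp(intercept), -slope
    return lambda tt: a * np.exp(-k * tt)


def fit_linear(t, y):
    # Hypothesis H2: y = b + m * t, fitted by ordinary least squares.
    slope, intercept = np.polyfit(t, y, 1)
    return lambda tt: intercept + slope * tt


def sse(model, t, y):
    # Sum of squared residuals between simulation and data.
    return float(np.sum((model(t) - y) ** 2))


hypotheses = {"H1 exponential": fit_exponential, "H2 linear": fit_linear}
results = {}
for name, fit in hypotheses.items():
    model = fit(t_est, y_est)  # parameters come from estimation data only
    results[name] = {
        "estimation": sse(model, t_est, y_est),
        "validation": sse(model, t_val, y_val),  # prediction vs. new data
    }

# The hypothesis with the smallest validation error is preferred.
best = min(results, key=lambda n: results[n]["validation"])
for name, r in results.items():
    print(f"{name}: estimation SSE={r['estimation']:.4f}, "
          f"validation SSE={r['validation']:.4f}")
print("Preferred by validation data:", best)
```

In this toy setting the straight line gives a tolerable fit on the
estimation window but fails badly on the validation window, which is
exactly the kind of failure that a paper without any comparison with
data can never reveal. A real study would of course use mechanistic
models, proper statistical tests, and experimentally measured data.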