This multicentre study is the first to use multilevel images to report intra- and inter-observer variability in embryo evaluation. Multilevel images allow embryologists to assess embryo quality in much the same way as an examination under an inverted microscope.
The results showed good to excellent intra-observer agreement for the evaluation of the position of the pronuclei on day 1, the number of blastomeres on day 2 and day 3, and the clinical decision. These results confirm those of our monocentre study [6] and those of Arce et al. in a multicentre trial using 2D images [9]. In contrast to our current observations, these two studies [6, 9] also reported good to excellent agreement for other characteristics (degree of fragmentation and size of blastomeres on day 2 and/or day 3). This may be due to differences in study design (a monocentre study [6] and the use of 2D images [9]).
Good to excellent inter-observer agreement was found for the evaluation of the position of the pronuclei on day 1, the number of blastomeres on day 2 and day 3, and the decision on the final destiny of each embryo. This confirms the results reported in our monocentre study [6] and those published by Arce et al. [9]. In contrast, other investigators (Bendus et al. [5]; Castilla et al. [8]) reported moderate to excellent agreement for embryo grading on day 3. However, only supernumerary embryos were used by Bendus et al. [5]. In addition, different scoring systems were used by the centres included in these studies [5, 8]. Moreover, agreement was measured on an embryo score (optimal, moderate or poor, based on a combination of individual characteristics), whereas in our study individual embryo characteristics were evaluated. In our opinion, using supernumerary embryos [5], or selecting embryos for the determination of intra- and inter-observer variability on the basis of the embryo score [8], is not fully representative of the routine embryo population. Therefore, in our study, embryos from routine practice were evaluated so that the dataset would be representative of daily practice. Regarding the decision-making process, good agreement was found in our study. However, other investigators (de Assin et al. [7]; Castilla et al. [8]) reported moderate agreement in the clinical decision on the final destiny of each embryo. This may be due to differences in study design. In our study, embryologists were asked to decide for each embryo whether it would be transferred, cryopreserved or discarded. In the studies of de Assin et al. [7] and Castilla et al. [8], two embryos had to be selected for transfer from a batch of embryos per patient.
Moderate to poor inter-observer agreement was found for the evaluation of the size of the pronuclei, the degree of fragmentation on day 2 and day 3, and the symmetry of blastomeres on day 2 and day 3, which is in line with the results of our monocentre study [6] and the studies of Arce et al. and Bendus et al. [5, 9].