Background
There has been a long debate on whether blastocyst stage (Day 5) embryo transfer should be routinely recommended to improve the implantation rate [1‐3]. The introduction and improvement of advanced cell culture techniques have favored blastocyst stage embryo transfer over cleavage stage (Day 3) embryo transfer for in vitro fertilization (IVF). Many published studies have suggested that Day 5 transfer provides higher implantation, pregnancy, and live birth rates [4‐6]. However, others have argued that patients undergoing blastocyst culture are expected to have fewer transferable embryos available and fewer embryos cryopreserved [7, 8]. Although several studies, including randomized controlled trials (RCTs), have already been conducted, many were underpowered, leaving the margin of benefit between Day 3 and Day 5 transfers unclear.
Observational studies without randomization serve as an alternative study design that can recruit more participants and thereby increase statistical power. However, the larger sample size of observational studies comes at the price of potential bias. For example, Day 3 and Day 5 transfers are usually prescribed for patients with different clinical characteristics [9]. Without proper randomization, a difference in live birth rate or other associated outcomes is not necessarily attributable to the day of transfer. This phenomenon is a typical case of confounding by indication [10]. The indication for receiving a certain treatment may affect the outcome of interest and thus exert an undue influence on the association between the treatment and the outcome. Therefore, a powerful causal inference approach is imperative to determine the effectiveness of cleavage stage embryo transfer versus blastocyst stage embryo transfer.
While RCTs are considered the gold standard for estimating the causal effects of treatments or exposures on outcomes, they are largely limited by ethical considerations and other practical constraints. More importantly, these constraints make it difficult to distinguish between a null effect and a promising finding limited by the low statistical power of a small sample. To combine the advantages of RCTs and observational studies, here we adopt a propensity score matching approach. Propensity score matching enables us to artificially construct an RCT from an observational study under certain assumptions [11]. The propensity score was first introduced as the probability of receiving a particular treatment conditional on observed baseline variables [12]. Over the past few decades, different propensity score methods have been developed to remove confounding effects under the potential outcome framework [13‐15]. Propensity score matching entails forming matched pairs of treated and untreated subjects that share similar propensity score values. Once the matched sample has been formed, the causal effect of the treatment on an outcome can be assessed with standard statistical procedures. To answer our scientific question with this causal method, we treated the dichotomous embryo transfer day (Day 3 vs. Day 5) as the treatment, and implantation rate, clinical pregnancy rate, and odds of live birth as the outcomes.
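The pairing step described above can be illustrated with a minimal sketch. It assumes that propensity scores have already been estimated, and it uses greedy 1:1 nearest-neighbor matching without replacement within a caliper. The function name, the caliper of 0.05, and the example scores are hypothetical; the matching algorithm actually used in this study may differ.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated, control: dicts mapping a subject id to an estimated propensity score.
    Returns (treated_id, control_id) pairs whose scores differ by at most `caliper`.
    Controls are used at most once (matching without replacement).
    """
    available = dict(control)  # controls that have not been matched yet
    pairs = []
    # Process treated subjects in order of increasing propensity score.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score scale.
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
        # Otherwise the treated subject stays unmatched and is dropped.
    return pairs


# Hypothetical propensity scores (probability of receiving Day 5 transfer).
day5 = {"a": 0.62, "b": 0.35, "c": 0.91}
day3 = {"u": 0.33, "v": 0.60, "w": 0.45, "x": 0.10}

pairs = greedy_match(day5, day3)
print(pairs)  # [('b', 'u'), ('a', 'v')]
```

In this toy example, subject "c" has no Day 3 counterpart within the caliper and is excluded from the matched sample, which is how matching restricts the analysis to comparable patients.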
The main purpose of this study was therefore to identify the causal effect of embryo transfer day on multiple major clinical outcomes in IVF. Additionally, we examined whether the causal effect of embryo transfer day on these outcomes is modified by age and anti-Mullerian hormone (AMH) levels.
Discussion
The principal finding of this study was that, among women who shared similar baseline characteristics, there was evidence of a small but significant difference in implantation rate favoring Day 5 embryo transfer, but no difference in the probability of clinical pregnancy or live birth.
Implantation rate, defined as the number of gestational sacs per embryo transferred, is often considered the earliest primary endpoint in the IVF process and may affect the success of delivery. Our results showed that the blastocyst stage group had a higher implantation rate than the cleavage stage group (relative risk 1.14). This finding supports previous studies reporting that Day 5 transfers have purported advantages based on morphological features [20, 21]. As a consequence of self-selection, only the most viable embryos are expected to develop into 64-cell blastocysts in vitro, eliminating those with chromosomal abnormalities at an early stage [22, 23]. Meanwhile, studies have also shown that the predictive value of morphological criteria for embryos to be transferred on Day 3 is limited [24‐26]. In addition, it is recognized that premature exposure of early stage embryos to the uterine environment may induce homeostatic stress, reducing the potential for successful implantation [27]. This finding is also consistent with numerous trials reporting higher implantation rates for blastocyst transfer [28‐30].
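To make the implantation measure concrete, the following sketch computes an implantation rate and an unadjusted relative risk from raw counts. The counts are invented for illustration and are not data from this study. The calculation also ignores the matched design and any adjustment used in the actual analysis.

```python
def implantation_rate(gestational_sacs, embryos_transferred):
    """Implantation rate: gestational sacs per embryo transferred."""
    return gestational_sacs / embryos_transferred


def relative_risk(rate_exposed, rate_reference):
    """Ratio of the rate in the exposed group to the rate in the reference group."""
    return rate_exposed / rate_reference


# Hypothetical counts, NOT taken from this study.
rate_day5 = implantation_rate(57, 100)  # blastocyst stage group
rate_day3 = implantation_rate(50, 100)  # cleavage stage group

rr = relative_risk(rate_day5, rate_day3)
print(round(rr, 2))  # 1.14
```

A relative risk of 1.14 therefore corresponds to an implantation rate that is 14% higher, in relative terms, in the Day 5 group than in the Day 3 group.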
Although blastocyst transfer is an effective procedure for increasing the implantation rate, our results showed that transferring embryos on Day 5 rather than Day 3 did not change the overall probabilities of pregnancy and live birth. One potential cause of this phenomenon may be post-treatment confounding, similar to the effect of a lack of blinding in an RCT [31]. Blinding refers to keeping trial participants, healthcare providers, and assessors unaware of the assigned interventions. However, even though the goal of our method is to mimic an RCT in the context of an observational study, none of these groups were blinded in our study, which may leave room for bias due to residual confounding. For example, clinicians may act on their attitudes toward the allocation by paying extra attention to certain patients or by making different management decisions. Likewise, patients may be more likely to comply with clinical care after learning that they are about to be, or already have been, assigned to Day 3 embryo transfer. Our propensity score was constructed from variables measured before implantation. However, numerous post-implantation factors may still affect the live birth rate. Even though blinding may reduce bias, the length of culture and the different procedures required for each day of embryo transfer make it impossible to blind technicians and clinicians to the patients' group.
A recent paper reported that serum AMH level is positively correlated with ovarian responsiveness, embryo developmental competence, and cumulative live birth rate [32]. Therefore, we further explored the effect of embryo transfer day on live birth and other associated outcomes by stratifying on patients' age and AMH level. Interestingly, we identified a trend that is consistent with our main finding: the p values for the effect of embryo transfer day on IVF outcomes increase as the gestational age increases from 14 days (implantation rate) to week 4 (clinical pregnancy). For instance, in the group of patients aged under 35, the p value for embryo transfer day increased from 0.23 to 0.48 as the outcome progressed from implantation rate to clinical pregnancy rate; likewise, in the group of patients aged between 37 and 38, the p value increased from 0.19 to 0.33. Although this trend was present in both our main results and our stratified results, there was no evidence of a significant difference in the latter.
A recently published systematic review performed a meta-analysis of 27 RCTs to determine whether blastocyst transfers improve live birth and other associated IVF outcomes compared with cleavage stage transfers [33]. Its results on clinical pregnancy rates were very similar to ours: neither study found a difference between Day 3 and Day 5 transfer. Yet the review did report evidence of a significant difference in live birth rate between the two groups, favoring Day 5 transfer. It is important to note that, although the two sets of results on live birth differ, the interventions compared in the two studies are not the same. The review compared Day 2 and Day 3 transfers with Day 5 and Day 6 transfers, whereas our study compared only Day 3 transfers with Day 5 transfers.
Our finding of no evidence of increased odds of live birth with Day 5 embryo transfer is consistent with a recent cohort study conducted in the UK [34]. Although the two studies share a similar design, their patient populations differ. A growing body of literature has indicated racial and ethnic disparities in assisted reproductive technology outcomes. Current evidence suggests that Asian women are predisposed to worse IVF outcomes, which may result from fundamental biological or genetic differences [35, 36]. For example, studies have shown that the distribution of FSH receptor allelic variants differs between Asians and Caucasians [37]. Alternatively, behavioral and environmental differences may also have contributed to the lower pregnancy rate in Asian women [38, 39]. To our knowledge, our study is the largest cohort study in an Asian population to assess the differences between blastocyst stage transfer and cleavage stage transfer.
The goal of propensity score matching is to mimic an RCT from an observational study. The validity of this approach depends on a few assumptions. First, the propensity of receiving Day 5 transfer must be properly modeled. Physicians' tendency to prefer a certain transfer protocol is unknown. Moreover, reducing the potentially multi-dimensional confounding effect to a one-dimensional propensity score (PS) is itself a strong assumption. The assumption states that the unknown tendency is fully captured by the logistic model (model (1) in Supplement). It requires that all relevant variables are included and that the functional form of the variables (linear, quadratic, categorical, cross-product between variables, etc.) is correctly specified. However, limitations may arise in practice because we excluded certain highly correlated variables, such as endometriosis and ovarian cancer, in order to adjust for confounding effects and produce a homogeneous dataset. This may threaten the statistical generalizability of this study. Second, tolerating imperfect matches must not result in residual confounding. Numerically, we may not find a perfect match with an identical PS, and systematically imperfect matches may lead to confounding. We address this issue by additionally adjusting for the propensity score in the “Post-matching analyses” section.
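A commonly used diagnostic for the second assumption is the standardized mean difference (SMD) of each baseline covariate between the matched groups. Values near zero suggest that the imperfect matches did not leave a systematic imbalance. The sketch below is illustrative only. The function name and the ages are hypothetical, and this check is not necessarily the one performed in this study.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(x_treated, x_control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical ages (years) in a matched sample, NOT data from this study.
age_day5 = [30, 32, 34]
age_day3 = [31, 33, 35]

smd = standardized_mean_difference(age_day5, age_day3)
print(smd)  # -0.5
```

An SMD of this size would indicate a noticeable residual imbalance in age. In that situation, adjusting for the covariate or for the propensity score in the outcome model is one way to account for the difference.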
Because an RCT requires carefully selected participants to ensure internal validity, its generalizability to populations that do not satisfy its inclusion criteria is of concern. A similar issue applies to the PS matching approach. As shown in Fig. 2, the propensity scores of the Day 3 and Day 5 transfer groups were quite different before matching, indicating that some patients were very likely to receive Day 3 transfer and unlikely to receive Day 5 transfer, and vice versa. After matching, we retained only those patients whose tendency to receive either transfer was not that high. This reflects clinical equipoise, a common basis for clinical trials. Focusing on patients whose tendency to receive either transfer is equivocal is therefore a consequence of mimicking an RCT. However, the equipoise principle also brings the difficulty of generalizing the results to patients not included in our matched data. For example, the null effect of Day 3 versus Day 5 transfer may not apply to patients who have strong clinical indications for Day 3 or Day 5 transfer. We stress that the results of our matched analyses need to be interpreted cautiously.