Background
Angelman syndrome (AS) is a neurodevelopmental disorder first described in 1965 by Harry Angelman, with a birth incidence of approximately 1:20,000 [1]. AS is caused by the functional loss of the maternal allele encoding an E3 ubiquitin-protein ligase (UBE3A) [2]. Loss of functional UBE3A results in the core phenotypes of severe intellectual disability, motor coordination deficits, absence of speech, and abnormal EEG, as well as in high comorbidity of sleep abnormalities, epilepsy, and phenotypes related to autism spectrum disorder [3].
Currently, only symptomatic treatments are available for AS, primarily aimed at reducing seizures and improving sleep [4]. The development of targeted treatments for AS relies heavily on the ability to test the efficacy of treatments in mouse models of the disorder. The success of such translational studies depends on three critical factors [5]: (1) high construct validity, (2) high face validity, and (3) robustness of the behavioral phenotypes. First, the construct validity (shared underlying etiology between mouse models and patients) of the AS mouse model is very good, since AS mouse models recapitulate the patient genetics by carrying a mutated
Ube3a gene specifically on the maternal allele. However, it should be noted that the majority of AS patients carry a large deletion (15q11-q13) that also encompasses genes other than the UBE3A gene and that may contribute to a more severe phenotype [6]. Second, with respect to face validity (i.e., similarity of phenotypes between patients and the mouse model), the AS mouse model captures many key neurological features of the disorder well (e.g., epilepsy, motor deficits, abnormal EEG), as well as some of the behavioral abnormalities (abnormal sleep patterns, increased anxiety, repetitive behavior) [7‐12]. Third, robustness of the behavioral phenotypes is important for identifying novel treatments, as it allows experiments to be sufficiently powered to detect the effect of a treatment while minimizing the risk of a type I error, in which a drug is declared effective when it is not. Robustness, as well as face validity, also takes into account the sensitivity to genetic background and the extent to which a phenotype is also observed in independently derived mouse models. Notably, almost all behavioral testing described in the literature has been performed using the original
Ube3atm1Alb mouse strain generated in the Beaudet lab [7‐9]; hence, it is unknown to what extent the reported phenotypes are actually specific to this mouse line.
We previously developed a series of behavioral paradigms in the domains of motor performance, anxiety, repetitive behavior, and seizure susceptibility, for testing the effect of Ube3a gene reinstatement in the inducible Ube3amSTOP/p+ (Ube3atm1Yelg) mice [13]. Here, we used these paradigms in a highly standardized way to assess phenotypes in the independently derived Ube3atm1Alb and Ube3amE113X/p+ (Ube3atm2Yelg) maternal knockout strains. We combined data of eight independent experiments across five experimenters involving 111 Ube3atm1Alb and 120 wild-type littermate control mice. Using a meta-analysis, we determined the statistical power of the different behavioral tests and the effect of putative confounding factors, such as sex. We further assessed the robustness of these phenotypes by comparing
Ube3a mutants in different genetic backgrounds. Finally, we employed this behavioral test battery to reassess the efficacy of minocycline and levodopa in the AS mouse model. Minocycline is a tetracycline derivative and matrix metalloproteinase-9 (MMP9) inhibitor that possesses antibiotic as well as neuroprotective activity [14, 15]. Its antibiotic properties against both gram-positive and gram-negative bacteria are related to its ability to bind to the bacterial 30S ribosomal subunit, thereby inhibiting protein synthesis [14].
Levodopa is the precursor of dopamine and was shown to be effective in treating Parkinsonism in two adults with Angelman syndrome [16]. Moreover, it is able to reduce CaMK2 phosphorylation [17], which was shown to be increased in a mouse model for Angelman syndrome [18, 19]. Minocycline and levodopa were previously tested in the AS mouse model, and based on the favorable outcome of these preclinical experiments, three clinical trials were performed [20‐22]. Unfortunately, neither drug produced a significant improvement in AS patients.
Discussion
Robust behavioral phenotypes with high construct and face validity in mouse models of disease are critical for the identification of novel treatments and the successful translation of these therapies to clinical trials. These preclinical studies may give us important information about the therapeutic dose, optimal age of treatment, and the best outcome measures to be used in a clinical trial. Given the high failure rate of clinical trials aimed at improving cognitive function [44], it is critical that the preclinical data are robust (reproducible results across different mutant lines and different experimenters) and that the animal studies have high construct and face validity.
In this study, we investigated the robustness of a number of behavioral phenotypes, which we previously described using the inducible Ube3amSTOP/p+ (Ube3atm1Yelg) mice [13]. These phenotypes were assessed in two independently derived Ube3a lines: the commonly used Ube3atm1Alb line [7] and the recently generated Ube3amE113X/p+ (Ube3atm2Yelg) line [13]. Recently, we have tested two additional novel Ube3a lines in this test battery with the same results: the Ube3atm1.1Bdph line (MGI:5882092) and a novel (unpublished) Ube3a line (Ube3aem1Yelg). Thus, taken together, a total of five independently derived Ube3a lines show phenotypes on all the behavioral tests of the test battery described in this study. In all cases, we used heterozygous Ube3a mice in which the mutation was located on the maternally inherited Ube3a allele. Therefore, we conclude that construct validity is very high. However, since the majority of individuals with AS carry a large chromosomal deletion of the AS critical region (15q11-q13) that also encompasses genes other than Ube3a and that may contribute to a more severe phenotype [6], it would be of interest to test a mouse model of AS with a large maternal deletion [11] in our behavioral test battery.
In terms of face validity, we used behavioral paradigms that assess domains of motor performance, anxiety, repetitive behavior, and seizure susceptibility, which are all relevant clinical phenotypes of AS. Nevertheless, the clinical translational value of some of our tests (e.g., open field, marble burying, nest building, and forced swim tests) may be limited. Although many of our tests involve a strong motor component, we think it is unlikely that the phenotypes observed in the open field, marble burying, nest building, and forced swim tests are solely related to deficits in the domain of motor functioning. Most notably, we have shown that the critical period for rescuing these phenotypes is distinctly different from that for rescuing the rotarod deficit [13] (and unpublished data). For instance, we found that gene reactivation in 3-week-old mice fully rescues the rotarod phenotype, but none of the other phenotypes [13]. It is also notable that both WT and mutant mice behave significantly differently when tested for a second time in the open field and marble burying tests, whereas no significant changes were observed in rotarod performance. This further indicates that the deficits in the open field and marble burying tests reflect deficits in domains other than motor performance.
An important clinical feature of AS that is lacking in our behavioral test battery is a paradigm that assesses cognitive function. Despite profound cognitive impairments in individuals with AS, learning deficits in the AS mouse model are rather mild. We and others have reported learning deficits in AS mice by using the Morris water maze [8, 18, 45]. However, this paradigm is very labor intensive and hence less suitable for drug testing. Moreover, we found that a large number of mice are needed to detect significant differences and that results varied strongly among experimenters (data not shown). A good learning paradigm that is highly suitable for drug testing is fear conditioning, in which animals are subjected to a single training session in which they are trained to associate a context (training chamber) or cue (tone) with a foot shock. However, we have not been able to get consistent results across experiments and experimenters (data not shown), and varying results are published in the literature, with some studies showing a specific deficit in context conditioning [7, 46] and others a specific deficit in cued conditioning [8] or both [47‐49]. Notably, the two studies that investigated the behavioral deficits of Ube3a mice across strains in great detail showed no context conditioning deficit in Ube3a mice in the F1 hybrid 129-C57BL/6J background and C57BL/6J background, and either normal [9] or impaired [8] cued fear conditioning in Ube3a mice in the C57BL/6J background. Collectively, these studies indicate that this phenotype is rather weak, and hence results obtained with these tests should be interpreted with care.
By combining the data of eight independent experiments performed by five different experimenters, we were able to perform a meta-analysis of 111 Ube3am−/p+ (Ube3atm1Alb) and 120 WT littermate mice in the F1 hybrid 129S2-C57BL/6J background and determine the robustness of the phenotypes. In all eight experiments, we replicated Ube3a phenotypes observed on the rotarod test, open field test, marble burying test, nest building test, and the forced swim test. Deficits of Ube3a mice in rotarod performance, open field behavior, and marble burying have been reported by many other investigators, and hence, our results confirm the robustness of these tests. Impaired nest building behavior and impaired performance in the forced swim test of Ube3a mice have not yet been reported by other laboratories, but our study shows that these deficits are also very robust. In fact, a power analysis showed that these tests are among the most robust tests of the behavioral test battery. The open field paradigm was found to have the weakest power.
Our meta-analysis further shows that there is no major effect of sex on the behavioral phenotypes, which is in line with the general notion that such differences are also not present in AS patients. We did, however, find that female wild-type and mutant mice outperformed male wild-type and mutant mice on the rotarod. Improved performance of female mice on the rotarod has also been reported previously [50] and emphasizes the need to use well-matched groups when Ube3a mice of both sexes are tested on the rotarod. Given that male mice are heavier than female mice, we investigated whether the impaired performance of Ube3a mice on the rotarod can be attributed to the increased weight of these mutants. However, we found no correlation between weight of the animal and performance on the rotarod. This observation is in line with other studies [50‐52] and indicates that the reduced performance of Ube3a mice on the rotarod represents a bona fide impairment in motor performance.
Besides the reproducibility of the observed phenotypes and the high face and construct validity, there are two additional features that make the behavioral test battery for Ube3a mice highly useful for drug testing. We show that with the exception of the epilepsy test, all behavioral experiments can be performed with a single cohort of mice, which greatly reduces costs as well as the number of mice needed. In addition, we found that with the exception of the marble burying task, the behavioral test battery can be performed twice with the same cohort while maintaining a phenotype. This makes it possible to test the efficacy of a drug using a within-subject design.
We confirmed previous findings that the audiogenic seizure test is a very powerful tool for investigating seizure susceptibility in Ube3a mice [7, 13, 18]. With this study, this phenotype is now confirmed in three independently derived lines: the commonly used Ube3atm1Alb line [7], the Ube3amSTOP/p+ (Ube3atm1Yelg) line [13], and the recently generated Ube3amE113X/p+ (Ube3atm2Yelg) line [23]. Since nearly all Ube3a mice show this phenotype compared to less than 10% of wild-type animals, this test has very high power. Moreover, we showed that the phenotype is readily reversible with the anti-epileptic drug levetiracetam and that the test is highly suitable for dose finding. The only disadvantage of the audiogenic seizure test is that it cannot be performed on the same animals as used in the behavioral test battery, since the sensitivity to audiogenic seizures is exclusively observed in Ube3a mice in the 129S2 genetic background.
We also observed an effect of genetic background on the tests of the behavioral test battery. Ube3a mice in the C57BL/6J background showed a significant phenotype in the rotarod, nest building, and marble burying tests, but no effect of genotype was observed in the open field test. A significant effect of genotype was found in the forced swim test, but remarkably, this was in the opposite direction. In contrast, Ube3a mice in the 129S2 genetic background showed a significant deficit only in the forced swim test (in the same direction as F1 hybrid mice) and no phenotype on any of the other tests of the behavioral battery. This confirms previous reports that many of the Ube3a phenotypes are very sensitive to genetic background and not present in 129 lines [8, 9]. There are, however, several common findings as well as a few discrepancies between these studies and our study. With respect to the rotarod [8, 9] and marble burying phenotypes [9], our finding that only Ube3a-C57BL/6J and Ube3a-F1 hybrid mice show a phenotype is in full agreement with these studies (Huang et al. only tested Ube3a-C57BL/6J in the marble burying test). With respect to the open field test (distance traveled), the other two studies also found no phenotype in Ube3a-129 mice, but in contrast to our findings, they both found a phenotype in Ube3a-C57BL/6J mice. One major difference between their experimental design and ours is the length of time the mice spent in the open field. Indeed, when we left the Ube3a-C57BL/6J mice in the open field for 30 min (instead of the 10 min we used), we found a nearly significant phenotype in Ube3a-C57BL/6J mice (p = 0.06; data not shown). With respect to percentage of time spent in the inner zone of the open field (which is another measure of anxiety), the other two studies showed no significant effect of genotype in any of the genetic backgrounds. Our meta-analysis did, however, reveal a significant difference between genotypes in F1 hybrid mice (WT 1.1% versus mutant 0.7% time in inner zone; p < 0.01), which further indicates that Ube3a-mutant mice are more anxious. However, we note that the observed difference was small and that a significant effect was observed in only four of the eight individual experiments. Hence, this measure is not very robust.
Taking all studies into consideration, it is clear that Ube3a mice in the F1 hybrid 129S2-C57BL/6J background show the most robust phenotypes, with the notable exception of the audiogenic seizure susceptibility test, in which a phenotype is seen strictly in Ube3a-129S2 mice. The question arises whether the observed differences between Ube3a mice in different genetic backgrounds have any translational significance. The lack of phenotypes of Ube3a-129S2 mice in most tests could simply reflect the passive/hypoactive phenotype of these mice, resulting in a floor effect. However, it could also be that the AS phenotype is sensitive to genetic background and that the differences that are observed between individuals with AS are in part caused by genetic modifiers, rather than the nature of the mutation. Detailed studies of individuals with recurrent or similar mutations could provide more insight into that question [53].
To test the translational value of the behavioral test battery, we decided to re-evaluate the two drugs that were previously tested in clinical trials involving individuals with AS: minocycline (trial register NCT01531582 [20] and NCT02056665 [22]) and levodopa (trial register NCT01281475 [21]). Both drugs were previously shown to rescue the rotarod impairment of Ube3a mice (see NCT01531582 for minocycline, and [21] for levodopa). In addition, minocycline rescued the hippocampal LTP deficit of Ube3a mice [20], whereas levodopa rescued the increased phosphorylation of CaMK2 observed in Ube3a mice [21]. We tested the effect of both drugs on all tests of our behavioral test battery, using the same drug administration protocols as used for the original studies. In addition, we also tested the effect of minocycline when administered from birth, as previously published for the Fragile X mouse model [26]. However, in line with the clinical trials, we did not observe any efficacy of these drugs when tested on Ube3a mice. Our finding that minocycline and levodopa are unable to improve performance on the rotarod is at odds with the aforementioned preclinical studies. Failure of replication could be due to differences in strains or procedures, although there is full agreement between our labs with respect to performance of Ube3a mice on the rotarod and the effects of different genetic backgrounds on this performance [9]. We think it is more likely that the rotarod experiments used for the preclinical studies were underpowered, as our analysis showed that 14 mice per group are needed for a well-powered rotarod study using two groups. In the levodopa study, the authors used 6 different treatment groups and only 6 mice per group [21]. Such small sample sizes make the test underpowered and also very vulnerable to the sex differences that we describe here. Since the details of the rotarod experiments of the minocycline treatment were not provided (NCT01531582), we cannot comment on these discrepancies.
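The relation between group size and power discussed above can be illustrated with a minimal sketch based on the standard normal approximation for a two-sided comparison of two group means. This is not the power analysis performed in this study: the function names (mice_per_group, approximate_power) and the standardized effect sizes passed to them are hypothetical and serve only to show how the required group size depends on effect size and how little power a group of 6 animals provides under such assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1), used for the z quantiles.
_STD_NORMAL = NormalDist()


def mice_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    effect_size_d is a standardized mean difference (Cohen's d); the
    values used below are hypothetical, not estimates from this study.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    # Normal-approximation formula: n = 2 * ((z_alpha + z_power) / d)^2
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


def approximate_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power for a given group size (same approximation).

    The negligible contribution of the opposite tail is ignored.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size_d * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    # Hypothetical effect sizes, chosen for illustration only.
    for d in (0.8, 1.0, 1.2):
        print(f"d = {d}: {mice_per_group(d)} mice per group, "
              f"power with 6 per group = {approximate_power(d, 6):.2f}")
```

Under this approximation, a hypothetical effect size of d = 1.0 requires 16 mice per group for 80% power at alpha = 0.05, whereas 6 mice per group yield a power of roughly 0.41. A t-distribution-based calculation would give slightly larger group sizes, so the sketch should be read as a lower bound rather than a replacement for a formal power analysis.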