To the editor:

Gene Expression Omnibus (GEO)1 is a public repository for gene expression data. Although the amount of data in GEO has grown exponentially, the number of publications citing GEO has grown only linearly. A major obstacle to data reuse is mapping the probes in GEO datasets to established gene identifiers, because these mappings can change as annotations for the underlying sequences change2. Microarray results therefore need to be reevaluated with the latest probe annotations. Several previous efforts have reannotated microarray probe identifiers3,4, but only for a few platforms and species.

We built a fully automated system, Array Information Library Universal Navigator (AILUN), that periodically reannotates all types of microarray platforms in GEO by relating every probe identifier to Entrez Gene identifiers. First, we collected all gene identifiers from Entrez Gene and UniGene and built a universal gene identifier table (UGIT). We then matched each column of every GEO platform against UGIT to find the best-matching column and its type of external identifier, and annotated each probe identifier with Entrez Gene identifiers (Supplementary Methods and Supplementary Fig. 1 online).
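The column-matching step can be pictured with a small sketch. It is an illustration under stated assumptions, not AILUN's implementation. The following are all hypothetical: the toy UGIT dictionary, the helper names `best_matching_column` and `annotate`, the "fraction of non-empty values found in UGIT" criterion, and the majority vote used to pick the identifier type. The actual procedure is given in the Supplementary Methods.

```python
from collections import Counter

# Toy stand-in for UGIT (hypothetical, two genes only): each external
# identifier maps to (identifier type, set of Entrez Gene IDs).
UGIT = {
    "NM_000546": ("RefSeq", {"7157"}),
    "TP53": ("Symbol", {"7157"}),
    "NM_004333": ("RefSeq", {"673"}),
    "BRAF": ("Symbol", {"673"}),
}

def best_matching_column(platform_rows, ugit):
    """Return (column, identifier type, hit rate) for the platform column
    whose non-empty values most often appear in the identifier table."""
    best = (None, None, 0.0)
    for col in platform_rows[0]:
        values = [row[col] for row in platform_rows if row[col]]
        if not values:
            continue
        hit_types = [ugit[v][0] for v in values if v in ugit]
        rate = len(hit_types) / len(values)
        if rate > best[2]:
            # Assumed rule: majority vote over the types of matched identifiers.
            id_type = Counter(hit_types).most_common(1)[0][0]
            best = (col, id_type, rate)
    return best

def annotate(platform_rows, probe_col, ugit):
    """Map each probe ID to a sorted list of Entrez Gene IDs ([] if unmapped)."""
    col, id_type, _ = best_matching_column(platform_rows, ugit)
    annotation = {}
    for row in platform_rows:
        entry = ugit.get(row[col]) if col is not None else None
        # Accept only identifiers of the type chosen for this column.
        genes = sorted(entry[1]) if entry and entry[0] == id_type else []
        annotation[row[probe_col]] = genes
    return annotation

# Hypothetical three-probe platform table with an accession column.
platform = [
    {"ID": "probe_1", "GB_ACC": "NM_000546", "Description": "tumor suppressor"},
    {"ID": "probe_2", "GB_ACC": "NM_004333", "Description": "kinase"},
    {"ID": "probe_3", "GB_ACC": "", "Description": "control"},
]
print(best_matching_column(platform, UGIT))  # ('GB_ACC', 'RefSeq', 1.0)
print(annotate(platform, "ID", UGIT))
# {'probe_1': ['7157'], 'probe_2': ['673'], 'probe_3': []}
```

Empty cells are excluded from the hit rate, so sparsely filled accession columns are not penalized. A real system would also need to handle cells that list several identifiers.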

UGIT contained 75 million gene identifiers of 90 types for 3,585 species. AILUN reannotated 66% of gene expression platforms, allowing reuse of 77% of samples across 79 species. Its platform annotation coverage was five times greater than that of GEO (Table 1), and for probes annotated by both AILUN and GEO, the annotations were 94% identical. To validate annotation accuracy, we compared the annotations of the Affymetrix U133A 2.0 array from AILUN, GEO and NetAffx5. We used Brainarray3, which is based on probe-sequence matching, as the gold standard. AILUN performed as well as NetAffx, with 97% precision and 97% recall. It outperformed GEO, which had 98% precision but only 86% recall (Supplementary Tables 1–3 and Supplementary Discussion online).
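Precision and recall against a gold standard can be illustrated with a pair-level count. Here, precision = TP / (predicted pairs) and recall = TP / (gold-standard pairs). This sketch assumes each annotation is treated as a set of (probe, Entrez Gene ID) pairs. The letter does not state the counting unit; its evaluation protocol is described in the Supplementary Discussion. The function name and example data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Pair-level precision and recall of probe annotations.

    predicted, gold: dicts mapping probe ID -> set of Entrez Gene IDs.
    Each (probe, gene) pair counts once; the gold standard defines truth.
    """
    pred_pairs = {(p, g) for p, genes in predicted.items() for g in genes}
    gold_pairs = {(p, g) for p, genes in gold.items() for g in genes}
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall

# Hypothetical annotations: the predictor leaves probe p3 unannotated.
gold = {"p1": {"7157"}, "p2": {"673"}, "p3": {"1956"}}
predicted = {"p1": {"7157"}, "p2": {"673"}}
print(precision_recall(predicted, gold))  # (1.0, 0.6666666666666666)
```

In this toy case every prediction is correct, so precision is perfect, but the missing annotation lowers recall. This is the same kind of trade-off that separates high-precision, lower-recall annotations from more complete ones.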

Table 1 Performance comparison

The server (http://ailun.stanford.edu) offers four functions to help users reannotate platforms. 'Platform annotation' adds the latest annotations to any uploaded result file. 'Cross-species mapping' maps platform annotations to other species. 'Platform comparison' compares any two platforms to find corresponding probes mapping to the same gene. 'Gene search' finds deposited platforms and samples in GEO for any list of genes.
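Once two platforms have been reannotated, the 'Platform comparison' function can be thought of as a join on Entrez Gene identifiers. The sketch below assumes each platform's annotation is a dictionary from probe ID to Entrez Gene IDs. The function names and data are hypothetical and do not describe the server's actual code.

```python
from collections import defaultdict

def genes_to_probes(annotation):
    """Invert a probe -> genes annotation into gene -> set of probes."""
    index = defaultdict(set)
    for probe, genes in annotation.items():
        for gene in genes:
            index[gene].add(probe)
    return index

def compare_platforms(annot_a, annot_b):
    """For each Entrez Gene ID measured on both platforms, list the
    corresponding probes on platform A and on platform B."""
    index_a = genes_to_probes(annot_a)
    index_b = genes_to_probes(annot_b)
    shared = sorted(index_a.keys() & index_b.keys())
    return {g: (sorted(index_a[g]), sorted(index_b[g])) for g in shared}

# Hypothetical reannotated platforms (probe ID -> Entrez Gene IDs).
platform_a = {"a1": ["7157"], "a2": ["673"], "a3": ["7157"]}
platform_b = {"b1": ["7157"], "b2": ["1956"]}
print(compare_platforms(platform_a, platform_b))
# {'7157': (['a1', 'a3'], ['b1'])}
```

Indexing by gene rather than by probe naturally handles many-to-many cases, such as two probes on one platform that target the same gene.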

Note: Supplementary information is available on the Nature Methods website.