Open Access | 1 September 2008

The NIF LinkOut Broker: A Web Resource to Facilitate Federated Data Integration using NCBI Identifiers

Authors: Luis Marenco, Giorgio A. Ascoli, Maryann E. Martone, Gordon M. Shepherd, Perry L. Miller

Published in: Neuroinformatics | Issue 3/2008

Abstract

This paper describes the NIF LinkOut Broker (NLB) that has been built as part of the Neuroscience Information Framework (NIF) project. The NLB is designed to coordinate the assembly of links to neuroscience information items (e.g., experimental data, knowledge bases, and software tools) that are (1) accessible via the Web, and (2) related to entries in the National Center for Biotechnology Information’s (NCBI’s) Entrez system. The NLB collects these links from each resource and passes them to the NCBI which incorporates them into its Entrez LinkOut service. In this way, an Entrez user looking at a specific Entrez entry can LinkOut directly to related neuroscience information. The information stored in the NLB can also be utilized in other ways. A second approach, which is operational on a pilot basis, is for the NLB Web server to create dynamically its own Web page of LinkOut links for each NCBI identifier in the NLB database. This approach can allow other resources (in addition to the NCBI Entrez) to LinkOut to related neuroscience information. The paper describes the current NLB system and discusses certain design issues that arose during its implementation.

Introduction

This paper describes a LinkOut broker that has been built as part of the multi-institutional Neuroscience Information Framework (NIF) project (Gardner et al. 2008), an NIH Neuroscience Blueprint initiative. The goal of the NIF LinkOut Broker (NLB) is to coordinate the assembly of neuroscience information accessible via the Web (e.g., experimental data, knowledge bases, and software tools) that can be linked to entries in the National Center for Biotechnology Information’s (NCBI’s) Entrez system. Each link is associated with an NCBI identifier (ID) which uniquely identifies the Entrez object (e.g., publication, gene, protein) to which it relates.

The NCBI’s collection of health sciences databases in Entrez represents a great resource for the biosciences. Their LinkOut service (Schott 2004; NCBI 2007) allows a user to link directly from NCBI entries to information outside of Entrez. For this information to be accessible from Entrez, each external resource must provide the NCBI with a list of links to items within that resource that relate to entries (e.g., publications, genes, etc.) in Entrez. When a user browsing Entrez finds an entry of interest (e.g., a PubMed publication) for which external information is available, the Entrez LinkOut service presents links that allow the user to access that information directly. Externally linked information may include full text articles, datasets, high quality images, tools, etc.

While working on the problem of data interoperation within the neurosciences, members of the Interoperability Subcommittee of the Society for Neuroscience Neuroinformatics Committee foresaw the value of leveraging this type of LinkOut service using a brokered approach. This approach has been implemented in the NLB. In the brokered approach, individual neuroscience resources do not need to interact directly with the NCBI’s LinkOut personnel. Rather, the NLB system collects links from a set of neuroscience resources, consolidates those links, and submits them to Entrez. This approach provides the following advantages.

Coordination by personnel familiar with the field. In the absence of the broker, the NCBI must verify the authenticity and added value of each new resource that wishes to provide external links. NIF project members are familiar with a range of neuroscience resources, and can therefore help identify useful resources to include and help integrate the information in those resources into the LinkOut approach.
Better organized maintenance of the data relationships. Having a single curated database of links for neuroscience data facilitates the collection and maintenance of this information and its inter-relationships. It simplifies both the process of passing links to the NCBI and the task of maintaining all the links in an organized fashion.

The NLB collects links to information in various neuroscience resources as described above, and stores those links in a database. Each link stored in the NLB database is associated with an NCBI ID, indicating the specific Entrez entry to which it relates. This NLB database can be utilized in several different ways.

(1) One approach is simply to forward information from the NLB database to the NCBI, so that it can be incorporated into the Entrez LinkOut service, thereby allowing Entrez users to LinkOut directly to each item. This approach is currently operational.

(2) A second approach, which is currently operational on a pilot basis, is for the NLB Web server, upon request, to create dynamically its own Web page of LinkOut links for each NCBI ID in its database. Linking out from the NCBI would involve first linking to a Web page of neuroscience links constructed by the NLB. From there the user could link to each resource as desired. This approach involves one extra level of linking compared to the direct NCBI LinkOut described in (1) above. One potential advantage of this approach is that the NLB team would be responsible for organizing and maintaining the collection of neuroscience links, which is quite dynamic, reflecting the rapid growth and evolution of the field.

If the NCBI utilized this approach, for example, it would only need to maintain a list of NCBI IDs for which neuroscience links were stored in the NLB. It would not need to maintain a list of the links themselves.

Both of the above approaches could also be used to link from other resources (i.e., not just from the NCBI) to related neuroscience information. For example, two SenseLab (Shepherd et al. 1997; Miller et al. 2001) databases (NeuronDB and ModelDB) could be linked, in the context of data they display to their users, to related information in other neuroscience databases using the LinkOut approach. For this purpose, it would be particularly helpful for NLB to provide a dynamic page of links for each relevant NCBI ID, as outlined in option (2) above. Indeed, this general approach could help integrate a great deal of neuroscience data in a centrally organized fashion.
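The association between NCBI IDs and links described in this section can be pictured as a simple keyed lookup. The following Python sketch is an illustration only, under stated assumptions: the names `Link` and `LinkIndex` and the example.org URLs are hypothetical, and the schema of the actual NLB database is not given in this paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    resource: str  # name of the neuroscience resource providing the link
    url: str       # address of the linked item (placeholder values below)
    label: str     # short human-readable description


class LinkIndex:
    """Hypothetical in-memory stand-in for the NLB link database."""

    def __init__(self):
        # Key: (Entrez database name, Entrez object ID) -> list of Link
        self._links = defaultdict(list)

    def add(self, entrez_db, entrez_id, link):
        self._links[(entrez_db, str(entrez_id))].append(link)

    def links_for(self, entrez_db, entrez_id):
        # .get() avoids creating empty entries for unknown IDs
        return list(self._links.get((entrez_db, str(entrez_id)), []))

    def ids(self):
        # All NCBI IDs for which at least one link is stored
        return sorted(self._links)


index = LinkIndex()
index.add("PubMed", 11731556,
          Link("ModelDB", "http://example.org/model/1", "computational model"))
index.add("PubMed", 11731556,
          Link("NeuronDB", "http://example.org/neuron/2", "neuronal property data"))
```

In a structure of this kind, a site that links through the broker would only need the list returned by `index.ids()`, not the links themselves, which matches the division of labor outlined for option (2).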

NLB System Overview

The NIF initiative as a whole is derived in part from an earlier project, sponsored by the Society for Neuroscience, to create a searchable database of neuroscience databases, the Neuroscience Database Gateway (NDG; http://ndg.sfn.org). Using the NDG as a test bed, an initial version of the LinkOut Broker was developed. The current NLB builds directly on this previous work.

Figure 1 illustrates schematically the various ways in which the NLB can be used (these capabilities are described in more detail in the next section of the paper).

Data are collected from a set of federated resources and stored in the NLB repository using ndg.disco messages encoded in XML (as described below). The appropriate set of NIF databases to be included is discovered by querying the NIF Database Registry catalog using a specific BrainML (BrainML.org 2008) query (http://soma.med.yale.edu:8080/lb/nifcat.do). Once these resources have been identified, the NLB retrieves the LinkOut data stored at each site.

All data collected as described above is submitted to the Entrez LinkOut system on a regular basis via FTP, encoded in an XML format specified by Entrez. Entrez in turn uses this information to create Web pages with hyperlinks to the resources containing related data for each relevant Entrez entry.
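As a rough illustration of the export step, the sketch below serializes a few links into an XML document. The paper does not reproduce the Entrez format, so the element names used here (`LinkSet`, `Link`, `ObjectSelector`, and so on) reflect a general reading of the NCBI LinkOut resource-file layout and should be checked against the current LinkOut DTD. The function name `build_linkset`, the provider ID of 0, and the example.org URL are placeholders.

```python
import xml.etree.ElementTree as ET


def build_linkset(provider_id, links):
    """Serialize (entrez_db, entrez_id, url) tuples into a LinkSet document.

    Assumption: one <Link> per tuple, with the full target URL in <Base>.
    Real LinkOut files usually also carry a DOCTYPE and URL rules.
    """
    linkset = ET.Element("LinkSet")
    for n, (entrez_db, entrez_id, url) in enumerate(links, start=1):
        link = ET.SubElement(linkset, "Link")
        ET.SubElement(link, "LinkId").text = str(n)
        ET.SubElement(link, "ProviderId").text = str(provider_id)
        selector = ET.SubElement(link, "ObjectSelector")
        ET.SubElement(selector, "Database").text = entrez_db
        object_list = ET.SubElement(selector, "ObjectList")
        ET.SubElement(object_list, "ObjId").text = str(entrez_id)
        object_url = ET.SubElement(link, "ObjectUrl")
        ET.SubElement(object_url, "Base").text = url
    return ET.tostring(linkset, encoding="unicode")


xml_text = build_linkset(0, [("PubMed", "11731556", "http://example.org/model/1")])
```

The resulting string could then be written to a file and transferred by FTP, as described above; that transfer step is omitted here.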

The NLB information can also be accessed from other Web resources via NLB Gateway pages, which are created dynamically for each NCBI ID as described below.

In addition, a pilot search interface allows any user to access the NLB server directly to search for links on topics of interest.

The underlying architecture of the NLB system includes the following components.

The core system contains a Web server application, a database, and Web services. The server contains (a) import modules to retrieve data from NIF resources, (b) an export module to send data to NCBI, (c) an NLB gateway interface, (d) an NLB search interface, and (e) an administrative interface. Web server code and Web services are implemented in the Java language and run on a Tomcat Web server. The system uses a MySQL database as a backend.
External servers include the NIF Registry (used to discover the NIF resources supporting LinkOut) and the Entrez LinkOut servers (FTP machines used to upload the NLB data).
A set of communication protocols (described later) links these components operationally.

The current NLB is available at http://soma.med.yale.edu:8080/lb.

Using the NLB

This section describes operational aspects of the NLB by explaining how three different types of users interact with the system: regular (information seeking) users, resource developers, and the NLB administrator.

Information Seeking Users

Regular (information seeking) users may utilize the NLB in three ways.

One method involves using the brokered data sent via the NLB to NCBI Entrez. For example (see Fig. 2), a user interested in the data related to a PubMed publication first locates that article within Entrez (e.g., “Dichotomy of action-potential backpropagation in CA1 pyramidal neuron dendrites,” PubMed ID 11731556). From the main Entrez page of that article, the user follows the LinkOut hyperlink to an Entrez page that displays external LinkOut resources for that article. Among these hyperlinks is a set of links grouped under the title “Neuroscience Information Framework.” These include a link to a computational model in ModelDB, a link to neuronal property data in NeuronDB, and several links to neuronal reconstructions in neuromorpho.org.

The second method involves connecting to a Web page at the NLB gateway from a neuroscience resource that has implemented the ability to link to the NLB. Each NCBI ID links to a different, dynamically generated page. This approach is illustrated in Fig. 3 and is explained in detail in the next section.

The third method involves using the NLB’s search interface to find NLB links related to a specified topic. For example, Fig. 4 shows the results of a search for “low-threshold calcium currents” that returns one link to neuronal property data in NeuronDB, one link to a computational model in ModelDB, and several links to neuronal reconstructions in neuromorpho.org.
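The dynamically generated gateway page used in the second method can be sketched as a function that turns the links stored for one NCBI ID into a small HTML page. This is a minimal sketch and not the NLB implementation (which the paper describes as Java running on Tomcat): the function name `render_gateway_page`, the page layout, and the example.org URL are assumptions made for illustration.

```python
from html import escape


def render_gateway_page(entrez_db, entrez_id, links):
    """Build an HTML page listing links for one NCBI ID.

    links: list of (resource, url, label) tuples.
    All values are HTML-escaped before being placed in the page.
    """
    title = escape(f"Neuroscience links for {entrez_db} {entrez_id}")
    items = "\n".join(
        f'<li>{escape(resource)}: <a href="{escape(url)}">{escape(label)}</a></li>'
        for resource, url, label in links
    )
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ul>\n{items}\n</ul></body></html>"
    )


page = render_gateway_page(
    "PubMed", "11731556",
    [("ModelDB", "http://example.org/model?a=1&b=2", "computational model")],
)
```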

Resource Developers

Resource developers (the personnel in charge of maintaining each resource) may incorporate NLB’s functionality into their applications by implementing a simple protocol that we have named “ndg.disco”. This format provides explicit information about the data links in their Web resources. The current format specification is defined at http://ndg.sfn.org/interop/protocols/disco/versions/v2/disco.xsd. Figure 5 shows an example of an ndg.disco file, and also shows how that information is incorporated into a resource to allow use of NLB. The main ndg.disco file is called “disco.xml” and is stored in the root directory of the resource.

Developers of resources who are interested in sharing their Entrez-related data via the NLB also need to implement a Web feed (implemented as a file or Web script) for their LinkOut data. Some examples of LinkOut Web feeds are http://ccdb.ucsd.edu/disco_entrez_objects.xml (file) and http://senselab.med.yale.edu/NeuronDB/disco_entrez_objects.asp (Web script).
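To give a feel for what consuming such a Web feed might involve, the sketch below parses a small XML document into link records. The feed layout shown here is invented for illustration: the element name `object` and the attributes `db`, `id`, `url`, and `title` are assumptions, as are the example.org URLs. The real element names are defined by the ndg.disco schema cited above.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content; not the actual ndg.disco layout.
SAMPLE_FEED = """<entrez_objects>
  <object db="PubMed" id="11731556" url="http://example.org/model/1" title="Example model"/>
  <object db="PubMed" id="11731556" url="http://example.org/neuron/2" title="Example neuron data"/>
</entrez_objects>"""


def parse_feed(xml_text):
    """Return one dict per <object> element found in the feed."""
    root = ET.fromstring(xml_text)
    records = []
    for obj in root.iter("object"):
        records.append({
            "db": obj.get("db"),
            "id": obj.get("id"),
            "url": obj.get("url"),
            "title": obj.get("title", ""),
        })
    return records


records = parse_feed(SAMPLE_FEED)
```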

The ndg.disco protocols represent an example of automated resource registration and interoperation protocols which are described in more detail in a companion paper (Gupta et al. 2008) in this issue.

The amount of effort needed to implement LinkOut interoperability depends on the complexity of extracting the LinkOut data from the resource and converting it to the ndg.disco XML format. Resource developers interested in allowing Entrez users to link to their resources using NLB should contact NIF staff for guidance.

Another way that developers connect to NLB is using the NLB gateway. This procedure requires that a neuroscience resource has implemented the ability to link to the NLB. Figure 3 illustrates this capability for a PubMed ID. To display this page, the gateway uses the following Web link format: http://soma.med.yale.edu:8080/lb/gateway.do?id=PubMed|11731556|

Another example of using the gateway uses the accession AF129005 for the “Mus musculus VNO olfactory cluster” (http://soma.med.yale.edu:8080/lb/gateway.do?id=Nucleotide||AF129005[pacc]). The resulting gateway page shows links to three olfactory receptors in the Olfactory Receptor Database.

Linking to the gateway interface in this fashion is not restricted to users of NIF resources. Any application developer can incorporate linkage to NLB neuroscience information by composing a URL from the following template.

http://soma.med.yale.edu:8080/lb/gateway.do?id={entrez_db}|{entrez_object_id}|{entrez_query}

Replace {entrez_db} with any of the commonly known Entrez database names (e.g., “PubMed”, “Nucleotide”). The list is available on the Entrez site.
Replace either {entrez_object_id} with the object ID assigned in the referenced Entrez database, or {entrez_query} with an Entrez query string. If both {entrez_object_id} and {entrez_query} are present, the system ignores {entrez_query}. (Details on how to construct an Entrez query can be found at the Entrez site.)
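The template rules above can be expressed as a small helper. This is a sketch, not part of the NLB: the function name `build_gateway_url` is hypothetical, and the URL is assembled exactly as in the examples in this paper, without percent-encoding (some HTTP clients may require characters such as `|` and `[` to be encoded).

```python
GATEWAY_BASE = "http://soma.med.yale.edu:8080/lb/gateway.do"


def build_gateway_url(entrez_db, object_id="", query=""):
    """Fill in the gateway template {entrez_db}|{entrez_object_id}|{entrez_query}."""
    if not entrez_db:
        raise ValueError("entrez_db is required")
    if not object_id and not query:
        raise ValueError("provide either object_id or query")
    if object_id:
        # The text states that the query is ignored when an ID is present,
        # so it is dropped here rather than sent.
        query = ""
    return f"{GATEWAY_BASE}?id={entrez_db}|{object_id}|{query}"


pubmed_url = build_gateway_url("PubMed", object_id="11731556")
nucleotide_url = build_gateway_url("Nucleotide", query="AF129005[pacc]")
```

The two calls reproduce the PubMed and Nucleotide example links given earlier in this section.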

The NLB Administrator

The NLB also provides a Web-based management console that is available in read-only mode to the public (see Fig. 6). An authenticated administrator (a user with an administrator account that allows read/write access) can use this interface to update and coordinate the NLB contents.

In the main administrative page, clicking the update button first queries the NIF Database Registry to locate neuroscience resources that contain NLB links. The NLB then uses this list to query each of these resources independently and extract their Entrez LinkOut information (stored locally at each resource). This information is then stored in the NLB database and can be viewed from the administrative interface. In addition, the interface generates LinkOut data for each resource, which can be uploaded via FTP to NCBI’s LinkOut service. This process is currently performed manually upon request of the database owner. In future NLB versions, the LinkOut import process could be performed automatically, e.g., every 24 h. The upload process to NCBI could also be performed automatically whenever the data have changed since the previously uploaded version.
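The update procedure, together with the idea of uploading only when data have changed, can be sketched as follows. This is an illustrative outline under stated assumptions, not the NLB code: `update_cycle`, `digest`, and the callables passed in are hypothetical stand-ins for the registry query and the per-resource feed retrieval, and change detection by hashing is one possible technique that the paper does not specify.

```python
import hashlib


def digest(links):
    """Order-independent fingerprint of a resource's link set."""
    h = hashlib.sha256()
    for entrez_db, entrez_id, url in sorted(links):
        h.update(f"{entrez_db}|{entrez_id}|{url}\n".encode("utf-8"))
    return h.hexdigest()


def update_cycle(discover_resources, fetch_links, store, uploaded_digests):
    """Run one import pass and report which resources need a new upload.

    discover_resources(): returns resource names (stands in for the registry query)
    fetch_links(name):    returns (entrez_db, entrez_id, url) tuples for one resource
    store:                dict receiving the imported links per resource
    uploaded_digests:     dict of fingerprints recorded at the last upload
    """
    changed = []
    for name in discover_resources():
        links = list(fetch_links(name))
        store[name] = links
        if uploaded_digests.get(name) != digest(links):
            changed.append(name)
    return changed


# Toy data: IDs and URLs are placeholders.
feeds = {"A": [("PubMed", "1", "http://example.org/a/1")], "B": []}
store, uploaded = {}, {}
first = update_cycle(lambda: ["A", "B"], feeds.__getitem__, store, uploaded)
for name in first:  # record fingerprints once the upload has succeeded
    uploaded[name] = digest(store[name])
second = update_cycle(lambda: ["A", "B"], feeds.__getitem__, store, uploaded)
```

Recording the fingerprint only after a successful upload keeps a failed transfer from being mistaken for an up-to-date one.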

Current Status

At the time of this publication the NLB is providing roughly 22,000 neuroscience links to Entrez users. These are distributed in the following databases:

ORDB has 15,561 links (Crasto et al. 2002; http://senselab.med.yale.edu/ORDB),
NeuroMorpho.Org has 4,697 links (Ascoli et al. 2007; http://neuromorpho.org/neuroMorpho),
Brain Architecture Management System has 326 links (Bota et al. 2005; http://brancusi.usc.edu/bkms),
Cell Centered Database (CCDB) has 7 links (Martone et al. 2008; http://ccdb.ucsd.edu/CCDBWebSite),
Internet Brain Volume Database has 352 links (http://www.cma.mgh.harvard.edu/ibvd),
ModelDB has 401 links (Hines et al. 2004; http://senselab.med.yale.edu/ModelDB), and
NeuronDB has 490 links (http://senselab.med.yale.edu/NeuronDB).
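As a quick consistency check, the per-resource counts listed above can be summed and compared with the stated total of roughly 22,000. The dictionary below simply restates the figures from the list; nothing else is assumed.

```python
link_counts = {
    "ORDB": 15561,
    "NeuroMorpho.Org": 4697,
    "Brain Architecture Management System": 326,
    "Cell Centered Database": 7,
    "Internet Brain Volume Database": 352,
    "ModelDB": 401,
    "NeuronDB": 490,
}

total = sum(link_counts.values())
largest = max(link_counts, key=link_counts.get)
share = link_counts[largest] / total

print(f"total links: {total}")
print(f"largest contributor: {largest} ({share:.0%})")
```

The sum is 21,834, consistent with the rounded figure, and ORDB accounts for about 71% of the links.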

Discussion

It has become a common practice to create links to other databases using their unique identifiers. It is also common to see in many bioscience databases an Entrez ID for publications, genes, proteins, etc. While links to PubMed are rather easy to maintain (since URL paths are relatively stable and forwarding scripts are created in PubMed to redirect old referencing URLs to new ones), links to objects in many other databases, particularly research databases, are considerably more likely to change over time.

Whenever a database changes its URL referencing scheme, all sites that have links to that database need to update their scripts. In addition, unless the resource creates a redirecting script, all the referencing sites will have their links to that database broken.

Using the Web-based NLB approach described in this paper, one can have a single referencing scheme. Any application need only include links to the NLB gateway, passing in just an NCBI ID. Only the NLB needs to keep track of any changes made by participating databases. In addition, the ndg.disco approach allows the local database developers to change their local ndg.disco file to reflect any changes made to their database. The NLB can import the new ndg.disco file, and ideally update its URL links automatically.

Another potential advantage of the NLB approach is that it could facilitate the use of back-up URLs. For example, if a resource’s Web site went down, one could readily instruct the NLB gateway to direct users to a back-up Web server for that resource.
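The back-up idea could work along the following lines. This is a hypothetical sketch: the paper does not describe how the gateway would choose between servers, and the function name `resolve_url`, the `is_up` availability check, and the example.org addresses are all assumptions.

```python
def resolve_url(primary, backups, is_up):
    """Return the first reachable address, trying the primary server first.

    is_up: callable taking a URL and returning True if it is reachable.
    Returns None when no server is available.
    """
    for url in [primary, *backups]:
        if is_up(url):
            return url
    return None


# Simulated availability: only the mirror responds.
reachable = {"http://mirror.example.org/item/1"}
chosen = resolve_url(
    "http://primary.example.org/item/1",
    ["http://mirror.example.org/item/1"],
    lambda url: url in reachable,
)
```

Because referring sites link only to the gateway, a fallback of this kind would require no change on their side.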

It is also worth emphasizing that the NLB approach need not be limited to NCBI IDs. This approach has also recently been used to semantically annotate life science data using Life Science Identifiers (LSIDs) (Martin et al. 2005). (Indeed, NCBI IDs and LSIDs are examples of Globally Unique Identifiers (GUIDs), a mechanism used to uniquely identify digital information (Clark et al. 2004)).

In summary, the NLB approach can be applied flexibly in several ways. We believe that it provides an organized paradigm for linking neuroscience information in a fashion that could be of great service to the neuroscientist user. The approach was originally developed to allow LinkOut from Entrez to neuroscience databases, but the same technique could be used to allow LinkOut in other bioscience domains. Using a dynamic Web-based NLB to provide a set of neuroscience links for each relevant NCBI ID provides an efficient approach to helping interlink a community of interrelated neuroscience resources.

The LinkOut broker application is freely accessible to the public at http://soma.med.yale.edu:8080/lb. The source code is also freely available. Please contact the first author.

Acknowledgments

This project has been funded in whole or in part through the NIH Blueprint for Neuroscience Research with Federal funds from the National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN271200577531C. This research was also supported by NIH grants P01 DC04732 and R01 DA021253, by volunteer consultant-collaborators and friends, and by the Society for Neuroscience.

We would like to especially acknowledge the work done implementing the automated resource registration and interoperability protocols, including ndg.disco, by Mihail Bota at UCLA for the Brain Architecture Management System and by David Kennedy at the Massachusetts General Hospital for the Internet Brain Volume Database.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

Ascoli, G. A., Donohue, D. E., & Halavi, M. (2007). NeuroMorpho.Org: a central resource for neuronal morphologies. The Journal of Neuroscience, 27, 9247–9251. doi:10.1523/JNEUROSCI.2055-07.2007.

Bota, M., Dong, H. W., & Swanson, L. W. (2005). Brain architecture management system. Neuroinformatics, 3, 15–48. doi:10.1385/NI:3:1:015.

BrainML.org. (2008). Brain Markup Language. http://brainml.org

Clark, T., Martin, S., & Liefeld, T. (2004). Globally distributed object identification for biological knowledgebases. Briefings in Bioinformatics, 5, 59–70. doi:10.1093/bib/5.1.59.

Crasto, C., Marenco, L., Miller, P., & Shepherd, G. (2002). Olfactory Receptor Database: a metadata-driven automated population from sources of gene and protein sequences. Nucleic Acids Research, 30, 354–360. doi:10.1093/nar/30.1.354.

Gardner, D., Akil, H., Ascoli, G. A., Bowden, D. M., Bug, W., Donohue, D. E., et al. (2008). The Neuroscience Information Framework: a data and knowledge environment for neuroscience. Neuroinformatics, this issue.

Gupta, A., Bug, W., Marenco, L., Qian, X., Condit, C., Rangarajan, A., et al. (2008). Federated access to heterogeneous information resources in the Neuroscience Information Framework (NIF). Neuroinformatics, this issue.

Hines, M. L., Morse, T., Migliore, M., Carnevale, N. T., & Shepherd, G. M. (2004). ModelDB: a database to support computational neuroscience. Journal of Computational Neuroscience, 17, 7–11. doi:10.1023/B:JCNS.0000023869.22017.2e.

Martin, S., Hohman, M. M., & Liefeld, T. (2005). The impact of Life Science Identifier on informatics data. Drug Discovery Today, 10, 1566–1572. doi:10.1016/S1359-6446(05)03651-2.

Martone, M. E., Tran, J., Wong, W. W., Sargis, J., Fong, L., Larson, S., et al. (2008). The cell centered database project: an update on building community resources for managing and sharing 3D imaging data. Journal of Structural Biology, 161, 220–231. doi:10.1016/j.jsb.2007.10.003.

Miller, P. L., Nadkarni, P., Singer, M., Marenco, L., Hines, M., & Shepherd, G. (2001). Integration of multidisciplinary sensory data: a pilot model of the human brain project approach. Journal of the American Medical Informatics Association, 8, 34–48.

NCBI. (2007). Entrez LinkOut Service. http://www.ncbi.nlm.nih.gov/projects/linkout

Schott, M. J. (2004). PubMed enhancements: fulfilling the promise of a great product. Medical Reference Services Quarterly, 23, 1–11. doi:10.1300/J115v23n04_01.

Shepherd, G. M., Healy, M. D., Singer, M. S., Peterson, B. E., Mirsky, J. S., Wright, L., et al. (1997). Senselab: a project in multidisciplinary, multilevel sensory integration. In E. H. Koslow, & F. M. Huerta (Eds.), Neuroinformatics: An Overview of the Human Brain Project (pp. 21–56). New York: Erlbaum.

Title: The NIF LinkOut Broker: A Web Resource to Facilitate Federated Data Integration using NCBI Identifiers
Authors: Luis Marenco, Giorgio A. Ascoli, Maryann E. Martone, Gordon M. Shepherd, Perry L. Miller
Publication date: 1 September 2008
Publisher: Humana Press Inc
Published in: Neuroinformatics, Issue 3/2008
Print ISSN: 1539-2791
Electronic ISSN: 1559-0089
DOI: https://doi.org/10.1007/s12021-008-9025-y