Introduction

Discrimination of tissue conditions, pathologies and critical structures from healthy surrounding tissue during surgery can be challenging given the fact that different body tissues appear similar to the human eye. While conventional intraoperative imaging is limited by mimicking the human eye, hyperspectral imaging (HSI) removes this arbitrary restriction of recording only red, green and blue (RGB) colors. HSI works by assigning each pixel of a conventional two-dimensional digital image a third dimension of spectral information. The spectral information contains the wavelength-specific reflectance intensity of every pixel. This results in a three-dimensional datacube with two spatial dimensions (x, y) and a third spectral dimension (λ). HSI has found application in diverse fields such as geology and maritime studies, agriculture, food industry, automated waste sorting1,2 and has recently been used during a NASA space mission on Mars.

Over the last few years, there have been extensive efforts to implement HSI technology in healthcare. Examples of potential future clinical applications comprise the quantitative evaluation of tissue oxygenation and blood perfusion3,4, inflammation and sepsis5, edema6 or malignancy7 as well as computer-assisted decision-making and automated organ identification8. These have the potential to support future developments such as intraoperative cognitive assistance systems or even automatization of robotic surgery. Despite the promising research, clinical translation of HSI-based automatic tissue differentiation has not yet been achieved. This may be attributed to a current lack of robustness and generalizability, which are the most important requirements for clinical application. In this regard, several open research questions remain. Specifically, variability of HSI measurements may result from the inherent differences between multiple tissue types under observation (desired effect), but also from inter-subject variability or variability in image acquisition conditions (both undesired). We are not aware of any prior work that has systematically investigated this important topic. We therefore aim to provide a thorough understanding of hyperspectral organ data, illustrate the potential of HSI-based analyses and present solid baseline data that further studies can build upon.

Results

For automatic tissue characterization based on HSI data, the following two properties are highly desirable: First, spectra corresponding to different organs should differ substantially from each other. Second, spectra of the same organ should be relatively constant across image acquisition conditions and individuals. With this in mind and given the gap in the literature pointed out in the section above, the contribution of this work is threefold:

  • Spectral fingerprints: We present the first comprehensive analysis of spectral tissue properties for a wide range of physiological organs and tissue types in a pig model. Based on 9059 images of 46 pigs and 17,777 annotations, we generate specific spectral fingerprints for a total of 20 organs.

  • Variance analysis: We show that the greatest part of spectral variance can be explained by organ differences.

  • Machine learning-based organ and tissue classification with HSI: We demonstrate that a neural network can distinguish between organ classes with high accuracy (> 95%), suggesting that HSI has high potential for intraoperative organ and tissue discrimination.

Different organs feature unique spectral fingerprints

This project provides insight into the spectral reflectance of 20 porcine organs based on a total of 9059 images from 46 animals (Fig. 1). Our data show that different organs feature characteristic spectra, referred to here as organ “fingerprints”. As seen in the gray pig-specific reflectance curves in Fig. 1, variation in the spectral measurements may result not only from the organ, but also from the individuals and/or the specific measurement conditions. A key aim of this work was therefore to quantify the effect of the different sources of variation.

Figure 1

Tissue atlas comprising spectral fingerprints of 20 organs and specific tissue types. Stomach (A = 39; n = 849), jejunum (A = 44; n = 1546), colon (A = 39; n = 1330), liver (A = 41; n = 1454), gallbladder (A = 28; n = 526), pancreas (A = 31; n = 530), kidney (A = 42; n = 568), spleen (A = 41; n = 1353), bladder (A = 32; n = 779), omentum (A = 23; n = 570), lung (A = 19; n = 652), heart (A = 19; n = 629), cartilage (A = 15; n = 586), bone (A = 14; n = 537), skin (A = 43; n = 2158), muscle (A = 15; n = 560), peritoneum (A = 28; n = 2042), vena cava (A = 15; n = 353), kidney with Gerota’s fascia (A = 18; n = 393), bile fluid (A = 13; n = 362). A indicates the number of animals; n indicates the number of measurements in total. Graphs depict mean reflectance (ℓ1-normalized on pixel-level) of individual pigs (gray) as well as overall mean (blue) ± 1 standard deviation (SD) (black) with wavelengths from 500 to 1000 nm on the x-axis and reflectance in arbitrary units on the y-axis.

Spectral similarity between organs is heterogeneous

In order to illustrate HSI variability resulting from individuals and measurement conditions, t-distributed Stochastic Neighbor Embedding (t-SNE)9 was applied to our ℓ1-normalized data (Fig. 2). It shows that while certain tissue types such as spleen and liver form highly isolated clusters, other organs such as stomach, pancreas and jejunum have a tendency to overlap, indicating lower distinguishability.

Figure 2

Visualization of spectral similarity with t-distributed Stochastic Neighbor Embedding (t-SNE) as a non-linear dimensionality reduction tool on the ℓ1-normalized data; one point represents the median spectrum within one region of interest (ROI) of one organ in one image of one pig. It can be seen that organs such as spleen and liver form isolated clusters, while other organs such as jejunum overlap with the rest.

Organ is the most influential factor on the reflectance spectrum

To quantify the effect of different sources of variation, we applied linear mixed models on a highly standardized subset of data obtained from 11 pigs (P36–P46 as illustrated in Supplementary Fig. 1). The analysis was performed at first for all organs (Fig. 3) and subsequently stratified by organ (Fig. 4). In the analysis for all organs, at each wavelength the proportion of explained variation10 in observed reflectance was decomposed into the components “organ”, “pig”, “angle”, “image” and “repetition”, where “angle” describes the proportion of variation explained by the angle between the organ surface and the camera optical axis, “image” describes the proportion of variation explained by different measurements taken from different organ positions in the same pig or variations in the annotated areas, and “repetition” describes the proportion of variation explained by multiple recordings of the same image under identical measurement conditions.

Figure 3

Sources of variation of hyperspectral data. (Proportion of) variability in reflectance explained by each factor using linear mixed models. Factors include “organ”, “pig”, “angle”, “image” and “repetition”. For each recorded wavelength, an independent linear mixed model was fitted with fixed effects for the factors “organ” and “angle” as well as random effects for “pig” and “image”. Variation across repetitions was given by the residual variation. The greater the proportion of variability for “organ”, the more reflectance can be seen as organ-characteristic. Shaded areas depict 95% (pointwise) confidence intervals based on parametric bootstrapping. The numbers represent the median across wavelengths.
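The model structure described in this caption can be written out as a rough sketch. The notation below is an assumption made for illustration (indices o, p, i, a, r for organ, pig, image, angle and repetition); the exact measure of explained variation follows ref. 10 and is not reproduced here.

```latex
% Hedged sketch of the per-wavelength linear mixed model; notation is assumed.
\begin{align}
y_{opiar}(\lambda) &= \mu(\lambda)
  + \underbrace{\alpha_{o}(\lambda) + \beta_{a}(\lambda)}_{\text{fixed: organ, angle}}
  + \underbrace{u_{p}(\lambda) + v_{opi}(\lambda)}_{\text{random: pig, image}}
  + \varepsilon_{opiar}(\lambda), \\
u_{p} &\sim \mathcal{N}\!\left(0, \sigma^2_{\text{pig}}(\lambda)\right), \quad
v_{opi} \sim \mathcal{N}\!\left(0, \sigma^2_{\text{image}}(\lambda)\right), \quad
\varepsilon_{opiar} \sim \mathcal{N}\!\left(0, \sigma^2_{\text{repetition}}(\lambda)\right)
\end{align}
```

Under this reading, one model is fitted independently at each wavelength λ, and the residual term ε carries the variation across repetitions. The proportion attributed to each factor is then that factor's share of the total variation at that wavelength.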

Figure 4

Sources of variation of hyperspectral data stratified by organ. Explained variation analysis stratified by organ using linear mixed models. For each organ and wavelength, independent linear mixed models were fitted with fixed effects for “angle” and random effects for “pig” and “image”. Variation across repetitions is given by the residual explained standard deviation. Shaded areas depict 95% (pointwise) confidence intervals based on parametric bootstrapping. The numbers on each subplot represent the median across wavelengths.

Our analysis was performed for ℓ1-normalized median spectra. Our results suggest that the main influencing factor on HSI data variation across wavelengths was the factor “organ” with an average proportion of explained variability of 83.4%. The factor “image” explained 13.8% of the variation on average while the other factors only explained negligible variation with 2.3% for “pig”, 0.1% for “angle” and 0.2% for “repetition”. This suggests that HSI data is characteristic of organs much more than of the subjects under observation or other influencing factors. The percentage to which variance in reflectance was explained by the components varied slightly through different parts of the recorded electromagnetic spectrum. Variability explained by “organ” decreased for wavelengths below 900 nm while “pig” and “image” played an increasing role.

When stratifying by organ, the variance in reflectance was decomposed into the same components, except “organ”. According to Fig. 4, “angle” and “repetition” explain a negligible portion of the variance in all organs, except for spleen where some explained variation for angle can be observed for some wavelengths. For “pig” and “image”, differences between organs are present. For organs where all lines are relatively close to zero (e.g. stomach), there is little heterogeneity in reflectance between different images and pigs, thus these organs show the most pronounced organ-characteristic spectral signatures. On the other hand, organs with greater levels of explained variance for the components “pig” and “image” (e.g. gallbladder) consequently had less organ-characteristic spectral signatures. Organ classes with the highest cumulative levels of variance curves explained by factors other than “organ” and therefore the least organ-characteristic spectral signatures across observations were spleen and gallbladder (Supplementary Text 3 and Supplementary Table 1).

For some organs, such as the heart, reflectance varied strongly between pigs (value for “pig” comparatively high), but relatively little within a pig (value for “image” comparatively low). Thus, reflectances measured for hearts were heterogeneous across individual pigs. On the other hand, for other organs, heterogeneity within pigs (i.e. between images of the same pig) was somewhat larger (value for “image” high) than between pigs (value for “pig” low), e.g. kidney with Gerota’s fascia. Thus, reflectance measured for kidney with Gerota’s fascia tends to be homogeneous across individual pigs, but a single image of kidney with Gerota’s fascia may be unreliable due to the heterogeneity within one pig.

Machine learning can leverage spectral information to classify tissue with high accuracy

A deep learning-based approach was used to classify the annotations of 20 organ classes from the spectra presented above with an average accuracy of 95.4% ± 3.6% across pigs on a hold-out test set. Misclassifications are organ annotations that have not been assigned to the correct organ class, but to one of the other 19 classes. They occurred for only 486 out of 9895 annotations in the test set (Fig. 5). While 16 out of the 20 organ classes were classified with an average sensitivity of ≥ 90% across all test pigs, the smallest average sensitivity across test pigs was obtained for the organ classes gallbladder (74.0%) and heart (73.9%), which were on average across pigs most often confused with bladder and kidney, respectively. Across all organ classes, the average sensitivity was 93.0% ± 6.3%, the average specificity was 99.8% ± 0.2% and the average F1 score was 92.3% ± 6.5%.
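For readers less familiar with these metrics, the following minimal sketch shows how sensitivity, specificity and F1 score can be derived for one class from a confusion matrix in a one-vs-rest fashion. The function name and the toy counts are made up for illustration, and the sketch does not claim to reproduce how the values above were aggregated across pigs.

```python
def one_vs_rest_metrics(confusion, cls):
    """Sensitivity, specificity and F1 for class index cls.

    confusion[p][t] counts annotations predicted as class p and labeled
    as class t (rows = prediction, columns = reference label).
    """
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[p][cls] for p in range(n)) - tp  # missed members of cls
    fp = sum(confusion[cls][t] for t in range(n)) - tp  # wrongly assigned to cls
    tn = sum(map(sum, confusion)) - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, f1


# Toy 3-class example with made-up counts (not data from this study).
toy = [[8, 1, 0],
       [2, 9, 0],
       [0, 0, 10]]
sens, spec, f1 = one_vs_rest_metrics(toy, 0)
```

The sketch assumes that every class occurs at least once in the reference labels; otherwise the sensitivity denominator would be zero and would need separate handling.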

Figure 5

Results of deep learning-based organ classification. (a) confusion matrix which was generated for a hold-out test set comprising 9895 annotations from 5293 images of 8 pigs that were not part of the training data. Confusion matrices were calculated and column-wise normalized (i.e. divided by the column sum) per pig based on the absolute number of (mis-)classified annotations. These normalized confusion matrices were averaged across pigs while ignoring non-existent entries (e.g. due to missing organs for one pig). Each value in the matrix thus depicts the average fraction of annotations which were labeled as the column class and predicted as the row class. Numbers in brackets depict the standard deviation across pigs. Zero values are not shown in the confusion matrix in order to improve visibility. Since multiple organs can appear on the same image, the number of annotations exceeds the number of images. (b) Exemplary image with multiple organ annotations by an expert. (c) Organs classified through deep learning.
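The aggregation described in this caption (column-wise normalization per pig, then averaging across pigs while ignoring non-existent entries) could look roughly like the sketch below. Function names, the use of `None` for missing entries and the toy counts are assumptions for illustration only.

```python
def normalize_columns(confusion):
    """Divide each column by its sum; columns without data become None."""
    n = len(confusion)
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]
    return [[confusion[r][c] / col_sums[c] if col_sums[c] else None
             for c in range(n)] for r in range(n)]


def average_across_pigs(matrices):
    """Entry-wise mean over pigs, skipping entries that are None."""
    n = len(matrices[0])
    averaged = [[None] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            values = [m[r][c] for m in matrices if m[r][c] is not None]
            if values:
                averaged[r][c] = sum(values) / len(values)
    return averaged


# Two hypothetical pigs, two classes; pig B has no annotation of class 1.
pig_a = [[9, 1], [1, 3]]
pig_b = [[4, 0], [1, 0]]
mean_matrix = average_across_pigs(
    [normalize_columns(pig_a), normalize_columns(pig_b)])
```

In this toy example the second column of the averaged matrix is taken from pig A alone, because pig B contributes no annotations for that class.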

Discussion

Visual discrimination and evaluation of biological tissue is not trivial, as different tissues and body structures often appear similar to the human eye. Because conventional optical imaging during surgery only differentiates red, green and blue by mimicking human vision, its intraoperative benefit is sometimes limited. HSI, not being subject to this restriction and encompassing significantly more information, is an exceptional imaging modality with great potential for tissue identification and evaluation. Although its current use in medicine is on a constant rise, the full potential of this imaging modality has not been exploited. This may be attributed to open research questions concerning robustness and generalizability of HSI data.

Structural properties of tissue cause differences in spectral characteristics that might be significant enough for use in proper organ differentiation and other clinical applications. However, existing literature on spectral measurements has mainly focused on specific biological pigments such as hemoglobin, porphyrin and melanin11,12, and has hardly addressed the complexity of spectral characteristics across various tissues and organs. Still, there are copious publications on spectral characteristics of organs and they arguably formed the basis for the current application of HSI in surgery. However, current literature has limitations featuring measurements with lower wavelength resolution13,14, ex-vivo material15 or measurements with incompatible and incomparable technology that does not allow for comparison between different studies. Other publications are very detailed, but provide less intuitive data focusing on optical scattering instead of reflectance or absorbance16. Despite the great benefit that such literature sources have provided to our community16, until today, there is no systematic database or investigation of reflectance spectra for a variety of physiological organs in a larger cohort—neither for humans nor for animals.

Categorical requirements for such a spectral medical HSI database serving as a reference work are precision, uniformity and comparability of the measuring device, which have been demanded by the HSI community in previous years17. While in former decades HSI could not be found in medicine, there have been extensive efforts to implement this technology in healthcare in recent years. However, most of the initially developed HSI systems were self-made prototypes and homebuilt solutions from various institutions all over the world, varying in spectral resolution and range as well as utilized detectors and optical components17,18,19,20,21,22,23,24,25,26,27. While highly interesting insights for various medical applications were obtained with these provisional solutions, they were lacking standardization and comparability, as other research groups would have to get the individual parts and build these measuring devices on their own17. In addition, there was large variability in, for example, spectral ranges, illumination, optical components such as filters and spatial resolution. These aspects rendered sustainable large clinical trials and systematic multicenter research impossible. A great variety of devices could be observed in terms of spectral resolution, detectors, dispersive devices and spectral regions covered by different devices reaching from 200 nm up to 2500 nm17.

The HSI camera system used in this project is the first commercially available and medically certified system meeting most of the aforementioned demands; however, only a limited wavelength range can be recorded (500–1000 nm) and the visible range is not fully covered. While previous and less standardized HSI systems were efficient for the investigation of specific and isolated research questions, reproducibility and generalizability of commercially available systems noticeably promoted an increase in research efforts regarding HSI. An indicator of these increased research efforts can be seen in the rise of the number of research projects over the last few years including animal studies with rats28 and pigs29,30,31,32, conference papers33,34, narrative reviews35,36,37 and other publications38,39. With this new system and its advantages, special focus has again been put on early clinical trials with explorative character1,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55. However, there are novel possibilities that have not been exploited yet. These primarily include spectral characterization of biological tissue and the complementation of a large medical HSI database with machine learning and deep learning. Some studies have already spectrally characterized single aspects of biological tissue such as the differences between specific cancer entities and their related physiological tissue56. However, these studies were most often conducted with non-reproducible setups or sometimes done in-vitro without knowledge of tissue perfusion, which might be acceptable for specific bradytrophic tissues, but leads to limitations in applicability to the majority of typically well-perfused organs57. Moreover, most of the existing studies so far only highlight specific medical aspects and do not sufficiently broaden the general understanding of spectral tissue characteristics.

The principles of spectral tissue differentiation have already successfully been proven, however only in laparoscopic surgery with sparse multispectral information and, most importantly, in fewer organ classes58,59. The question driving the present study was whether these spectral differences would be strong enough to be detected by an HSI system and subsequently consistent enough to characterize organs and make organ differentiation feasible.

For the very first time, HSI was applied with the aim of (1) systematically characterizing spectral properties of different tissue types in a porcine model, (2) analyzing to which extent these spectra are influenced by organ or tissue type compared to undesired effects such as inter-subject variability and variations in image acquisition conditions and (3) demonstrating that automatic machine learning-based tissue classification even with an unusually high number of classes can be achieved with high accuracy. A total number of 20 different porcine organs were recorded with HSI. The resulting database comprises 9059 recordings with 17,777 annotated organ regions.

Spectral fingerprints of these organs were extracted in Fig. 1 and t-SNE was chosen to visually assess the distinguishability of the respective HSI spectra (Fig. 2). While Euclidean distances have to be interpreted cautiously in two-dimensional illustrations from high-dimensional data, clustering and overlap give a good hint at the differentiability of the underlying spectra. It was now essential to evaluate to what extent differences in reflectance could be attributed to the organ or alternatively to the individual pig or noise from other defined and undefined factors, as this would determine the general utility of HSI data.

Linear mixed models showed that the largest proportion of the spectral reflectance variability was attributed to the factor “organ” instead of “pig”, “angle”, “image” and “repetition”. This suggests that contributions from inter-individual differences and image acquisition conditions were dominated by organ differences. While image acquisition conditions such as illumination were highly standardized, artificial over-standardisation was consciously avoided in order to still comply with conditions in a real operating room. Of the other factors accounting for spectral reflectance variability, “image” was the most relevant one, which indicates that different regions on the same organ have spectral differences. Possible explanations for this finding are inhomogeneous distribution of connective tissues, blood vessels and fibrosis within each organ, different levels of contained blood volumes due to tension on the tissue surface or peristalsis. This insight for the influencing factor “image” is highly relevant when considering possible real-life intraoperative applications and trials, as it implies, depending on the depth of the analysis, the necessity to record different areas of the organ under investigation, as we did.

A machine learning algorithm had an average accuracy of over 95% in an independent test set for identifying organ classes on pre-annotated regions, making this work a solid proof of concept for automatic tissue classification with machine learning based on HSI data. It should be noted, however, that automatic semantic scene annotation might still present another challenge. Moreover, we only considered the median spectrum of one pre-annotated region in one recording, thus ignoring texture and context information, which may further improve organ identification. Notably, excellent classification results could already be achieved despite limiting the neural network input to organ reflectance without texture information. Organs with similar cellular composition such as stomach and jejunum showed similar reflectance spectra, but could still be differentiated well. Misclassifications mainly occurred between bladder and gallbladder, kidney and heart or vena cava and bone.

Besides the investigation of physiological organs, the systematic investigation of pathological states is of equal importance and needs to include tissue ischemia, stasis, inflammation and malignancy. These unphysiological organ states cannot be purposely induced in patients for ethical reasons, which necessitates the use of a large animal model with human-like features and known spectral tissue properties. This was the reason for choosing a porcine model for the present study, which provides a baseline for future analysis of pathological tissue spectra.

For proper interpretation of the results of this work, certain limitations inherent to HSI technology have to be taken into account. One limitation is the relatively low temporal resolution of current HSI systems with only one recording every 30 s and around seven seconds of recording time each. While more compact and faster devices are under development60, currently this limitation narrows down possible fields of application. However, it does not undermine the validity of the data presented in this work. In fields of application that require a higher temporal resolution, but not necessarily fine-grained wavelength resolution, multispectral imaging (MSI) offers a solution61,62. MSI enables near video-rate imaging and can most probably be substantially refined when taking insights from HSI research into consideration.

Another limitation of HSI is the generally short and wavelength-dependent penetration depth of light in biological tissue. Increasing penetration depths between 700 and 1000 nm had to be taken into account when measuring tissue with a thickness of less than several millimeters such as the omentum. Therefore, it was ensured by visual inspection that the omentum was only measured at sites with sufficient thickness. Photoacoustic tomography, a technology that is able to penetrate more deeply into biological tissue, might help to yield additional information when used complementary to HSI63.

Further limitations arise due to the spatial resolution of only 640 × 480 pixels (width × height). Organs with smaller surface areas, e.g. the gallbladder, were harder to annotate than others, since fewer pixels were available. Besides the technological limitations, the presented tissue identification relied on pre-annotated regions of interest (ROIs). Semantic organ segmentation will have to be addressed in future studies.

This work is the first to systematically investigate spectral properties and relations of organs within a large cohort of organ classes and individuals. By using a highly standardized approach, we were able to extract the spectral fingerprints for each organ and investigate factors influencing the spectral properties. We were able to provide evidence that the tissue types and not the individual animal or the recording conditions were the most influential factor for the reflectance spectrum, which is of utmost importance when trying to assess the possible value of HSI for medical applications. This study can be seen as a reference work paving the way for further spectral organ evaluation (e.g. pathological tissue states), which requires precise knowledge of spectral characteristics of physiological tissue. Possible future applications based on these results include augmentation of computer-assisted decision-making, intraoperative cognitive assistance systems or even automatization of robotic surgery. It can be expected that our main finding of organ-dependent reflectance patterns will be confirmed in human data. To firmly establish HSI in clinical medicine, a translation of this study to human data will be essential.

Methods

Animal anaesthesia and surgical procedure

This animal study was approved by the Committee on Animal Experimentation of the regional council of Baden-Württemberg in Karlsruhe, Germany (G-161/18 and G-262/19). All animals used in the experimental laboratory were managed according to German laws for animal use and care, and according to the directives of the European Community Council (2010/63/EU) and ARRIVE guidelines64. Regular pigs (Sus scrofa domesticus) with a mean weight of 35 kg were chosen as model organism4,65,66,67,68. Data from 46 pigs was included in the analyses.

As per institutional standard and protocol, pigs were starved 24 h prior to surgery with free access to water. Body weight adapted pharmacological calculations are generalized for a 40 kg pig. Initial sedation was performed with weight adapted intramuscular injection of the neuroleptic azaperone (Stresnil® 40 mg/ml by Elanco®) with 6 mg/kg (≈ 6 ml = 240 mg) 15 min prior to further manipulation to decrease stress. Next, analgosedation was established by weight adapted intramuscular injection of a combination of short-acting benzodiazepine midazolam (Midazolam-hameln® 5 mg/ml by hameln pharma plus gmbh®) with 0.75 mg/kg (≈ 6 ml = 30 mg) and ketamine (Ketamin 10%® by Heinrich Fromme®) with 10 mg/kg (≈ 4 ml = 400 mg).
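The generalized figures quoted above follow from simple weight-based arithmetic, which the following sketch makes explicit. It is only meant to reproduce the numbers stated in the text for a 40 kg animal; the function name is hypothetical, and the ketamine concentration of 100 mg/ml is derived from the 10% solution named above.

```python
def injection(weight_kg, dose_mg_per_kg, concentration_mg_per_ml):
    """Return (total dose in mg, injection volume in ml)."""
    dose_mg = weight_kg * dose_mg_per_kg
    return dose_mg, dose_mg / concentration_mg_per_ml


WEIGHT_KG = 40  # reference weight used for the generalized figures

azaperone = injection(WEIGHT_KG, 6, 40)     # 40 mg/ml solution
midazolam = injection(WEIGHT_KG, 0.75, 5)   # 5 mg/ml solution
ketamine = injection(WEIGHT_KG, 10, 100)    # 10% solution = 100 mg/ml
```

The three tuples correspond to the bracketed values in the paragraph above (240 mg in 6 ml, 30 mg in 6 ml and 400 mg in 4 ml).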

After transportation to the experimental operating room, two 18 G i.v. catheters were established in the ear veins and prevented from clotting with crystalloid infusion at 300 ml/h (Sterofundin ISO® by B. Braun®). Intubation was performed conventionally or via tracheotomy in case of reduced laryngeal visibility. Medication used during intubation in case of excessive sputum production or general backup medication included i.v. atropine and propofol 1%. After intubation, pressure-controlled ventilation was established and a minimum alveolar concentration of 1.0 achieved under sevoflurane®. Intraoperative anesthesia was achieved through balanced narcosis with sevoflurane® and the combination of i.v. 0.2 mg/kg/h midazolam (≈ 1.5 ml/h = 7.5 mg/h) and 8.75 mg/kg/h ketamine (≈ 3.5 ml/h = 350 mg/h) at a rate of 5 ml/h. No relaxant agents were applied.

Monitoring included pulse oximetry, capnometry and invasive blood pressure measurement via the femoral artery in order to prevent measuring false data resulting from impaired circulation. Body temperature was monitored and maintained with electrically controlled heat blankets.

Midline laparotomy was performed to access the abdominal cavity. Ligaments around the liver and the hepato-gastric ligament were dissected and visceral organs mobilized, including the removal of the coverage of the kidneys while carefully sparing vessels. Scissors, electrocautery and bipolar vessel-sealing devices were used. A suprapubic catheter was inserted into the bladder. After surgery, pigs were euthanized with a rapid i.v. application of 50 ml of potassium chloride solution. Death was pronounced upon an end-expiratory CO2 partial pressure below 8 mmHg.

Hyperspectral imaging

The hyperspectral datacubes were acquired with the TIVITA® Tissue system (Diaspective Vision GmbH, Pepelow, Germany), which is a push-broom scanning imaging system and the first commercially available hyperspectral camera for medicine. It provides a high spectral resolution in the visible as well as near-infrared (NIR) range from 500 to 995 nm in 5 nm steps resulting in 100 spectral bands. Its field of view contains 640 × 480 pixels with a spatial resolution of approximately 0.45 mm/pixel (Fig. 6). The distance of the camera to the specimen is controlled via a red-and-green light targeting system to about 50 cm. Six halogen lamps directly integrated into the camera system provide a uniform illumination. Recording takes around seven seconds.
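As a rough illustration of the datacube dimensions described above, the following sketch builds the wavelength axis and looks up the band closest to a given wavelength. The axis order (height, width, bands) and the helper name `band_index` are assumptions for illustration and do not describe the camera's actual file format.

```python
# Wavelength axis as described in the text: 500 to 995 nm in 5 nm steps.
WAVELENGTHS_NM = list(range(500, 1000, 5))

# Assumed cube layout for illustration: (height, width, bands).
CUBE_SHAPE = (480, 640, len(WAVELENGTHS_NM))


def band_index(wavelength_nm):
    """Return the index of the spectral band closest to wavelength_nm."""
    if not WAVELENGTHS_NM[0] <= wavelength_nm <= WAVELENGTHS_NM[-1]:
        raise ValueError("wavelength outside the recorded range")
    return min(range(len(WAVELENGTHS_NM)),
               key=lambda i: abs(WAVELENGTHS_NM[i] - wavelength_nm))
```

With this axis, `len(WAVELENGTHS_NM)` is 100, matching the number of spectral bands stated above.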

Figure 6

Hyperspectral camera system. (a) Visualization of a three-dimensional hyperspectral datacube with x and y as spatial dimensions and z as hyperspectral dimension. The recorded reflectance information content of one pixel is visualized as an example. (b) TIVITA® Tissue camera system.

Image acquisition, annotation and preprocessing

Images were recorded with a distance of 50 ± 5 cm between camera and organs. In order to prevent distortions of the measured reflectance spectra due to stray light, the tissue recordings were made while lights in the operating room were switched off and curtains were closed. While the majority of pig recordings was done in a generic approach in order to accurately represent intraoperative reality, recordings for the mixed model analysis were done with a highly standardized protocol for a subset of 11 pigs (between P36 and P46 as indicated in Supplementary Text 1), with 8 to 9 pigs per organ. This standardized protocol includes 3 repetitions of exactly the same surgical scene (“repetition” effect) from each of 3 different angles (“angle” effect: perpendicular to the tissue surface, 25° from one side and 25° from the opposite side) for each of 4 different organ positions/situs/situations (“image” effect), resulting in a total of 36 recordings for each of the 20 organs. Recordings for bile fluid were performed by soaking bile fluid onto 5 stacked surgical compresses, ensuring that there is no influence from the background. For a more extensive overview of the dataset and a schematic recording protocol for the standardized subset please refer to Supplementary Fig. 1 and Supplementary Fig. 2.
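The structure of the standardized protocol can be enumerated in a few lines. The sketch below lists the recordings for one organ; encoding the angles as signed degrees and numbering positions and repetitions from 1 are assumptions made purely for illustration.

```python
from itertools import product

REPETITIONS = (1, 2, 3)
ANGLES_DEG = (-25, 0, 25)   # 0 = perpendicular to the tissue surface
POSITIONS = (1, 2, 3, 4)    # organ positions / situs

# One tuple per recording: (position, angle, repetition).
recordings = list(product(POSITIONS, ANGLES_DEG, REPETITIONS))
```

The list contains 4 × 3 × 3 = 36 entries, which matches the number of recordings per organ stated above.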

All of the 9059 recorded images were manually annotated with 20 different organ classes resulting in 17,777 organ annotations (as several organs could be contained within one image). More details on the annotation strategy can be found in Supplementary Text 4. Annotations were done by one medical expert and then verified by two other medical experts. In case of improper annotation, the annotation was redone collectively for that specific recording.

Prior to analysis, spectral information was ℓ1-normalized at the pixel-level for increased uniformity. All analyses were based on median spectra that were computed across all pixels contained in an annotation.
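The two preprocessing steps can be sketched as follows, assuming each pixel spectrum is a plain list of reflectance values. The function names and the toy values are hypothetical; this is a minimal illustration rather than the pipeline used in the study.

```python
from statistics import median


def l1_normalize(spectrum):
    """Scale one pixel spectrum so that its absolute values sum to 1."""
    total = sum(abs(v) for v in spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / total for v in spectrum]


def median_spectrum(pixel_spectra):
    """Band-wise median over the l1-normalized spectra of all pixels."""
    normalized = [l1_normalize(s) for s in pixel_spectra]
    # zip(*...) regroups the values band by band.
    return [median(band) for band in zip(*normalized)]


# Three hypothetical pixels with four bands each (toy values).
pixels = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [1.0, 3.0, 0.0, 0.0]]
spectrum = median_spectrum(pixels)
```

In the toy example the first two pixels differ only in brightness and become identical after normalization, so the outlying third pixel does not affect the median. Note that a band-wise median of normalized spectra does not necessarily sum to 1 itself.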

Python 3 was used for data organization, annotation, information extraction and analysis. Numerical data were stored in Excel. GraphPad Prism 8.4.1 and Python were used for statistical testing and visualization. Affinity Designer 1.10.5 was used for figure design.

t-SNE

t-distributed Stochastic Neighbor Embedding (t-SNE)9 is a machine learning method commonly used to reduce the number of dimensions of high-dimensional data and was used to visualize the characteristic reflectance spectra of each pig organ. This non‐linear dimensionality reduction tool has already proven valuable for the analysis of HSI and mass spectrometry data69 and was chosen for visualization as it has shown particular promise for biological samples in the past70,71. The algorithm aims at modelling manifolds of high‐dimensional data, and produces low‐dimensional embeddings that are optimized for preserving the local neighbourhood structure of the high-dimensional manifold9. In comparison to linear methods like PCA72 and LDA73, t-SNE preserves more relevant structures of datasets that have non-linear features. For these reasons, t-SNE was used for dimensionality reduction.

Before optimizing the parameters of t-SNE, the entire dataset comprising 46 pigs (9059 images with 17,777 annotations) was prepared in the following manner: one characteristic reflectance spectrum was obtained for each annotation by calculating the median spectrum from the pixel-level ℓ1-normalized spectra of all pixels in the annotation. Consequently, each data point represents the reflectance of one organ in one image of one pig. The two-dimensional visualization of the reflectance spectra of the complete dataset was optimized by performing a random search over the following parameters:

  • Parameter 1: The early exaggeration, which controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. 50 random integer values were sampled in the range [5; 100].

  • Parameter 2: The learning rate, which is used in the optimization process. 100 random integer values were sampled in the range [10; 1000].

  • Parameter 3: The perplexity, which is related to the number of nearest neighbors for each data point to be considered in the optimization. 50 equidistant integer values spanning the range [2; 100] were used.

The early exaggeration was the first parameter optimized by visual inspection of the two-dimensional representation of the dataset. The learning rate was then optimized in the same manner while keeping the early exaggeration constant. Subsequently, the perplexity was optimized by keeping the other two parameters constant. The optimal values for each of the parameters were 34 for the early exaggeration, 92 for the learning rate and 30 for the perplexity.
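The candidate values for this search can be generated as in the following sketch. The text does not name the t-SNE implementation, so the use of scikit-learn's `TSNE` inside the hypothetical helper `embed` is an assumption, as are the random seed and the variable names. The visual inspection step cannot be automated and is therefore not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter 1: 50 random integers in [5, 100] (upper bound of integers() is exclusive)
early_exaggeration_candidates = rng.integers(5, 101, size=50)

# Parameter 2: 100 random integers in [10, 1000]
learning_rate_candidates = rng.integers(10, 1001, size=100)

# Parameter 3: 50 equidistant integers in [2, 100], i.e. 2, 4, ..., 100
perplexity_candidates = np.arange(2, 101, 2)

def embed(spectra, early_exaggeration=34, learning_rate=92, perplexity=30):
    """Two-dimensional t-SNE embedding of median spectra (n_annotations x channels).

    Defaults are the optimal values reported in the text. scikit-learn is an
    assumed implementation and is imported lazily so the sampling code above
    runs without it.
    """
    from sklearn.manifold import TSNE
    model = TSNE(
        n_components=2,
        early_exaggeration=early_exaggeration,
        learning_rate=learning_rate,
        perplexity=perplexity,
        random_state=0,
    )
    return model.fit_transform(spectra)
```

In the sequential scheme described above, `embed` would be called once per candidate of one parameter while the other two are held fixed, and the resulting scatter plots compared by eye.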

Linear mixed models

Independent linear mixed models were used for an explained variation analysis in order to evaluate the effect of the influencing factors on changes in the spectrum. The (proportion of) explained variance was obtained using the empirical decomposition of the explained variation in the variance components form of the mixed model10.

For the first approach, for each wavelength, an independent linear mixed model was fitted with fixed effects for “organ” and “angle” as well as random effects for “pig” and “image”. More precisely, for each wavelength the following model was fitted (suppressing the wavelength index):

$$reflectance_{ijk} = \alpha + organ_{ijk}^{T} \beta + angle_{ijk}^{T} \theta + \delta_{i} + \gamma_{ij} + \varepsilon_{ijk}$$

for repetition k = 1,…,3 of image j = 1,…,ni of pig i = 1,…,11 (with ni the number of images of pig i, ranging from 84 to 228, and \(\sum\nolimits_{i = 1}^{11} {n_{i} } = 1944\)). \(\alpha\) is an intercept, \(organ_{ijk}^{T}\) is a row vector of length 19 indicating the organ of observation ijk (with arbitrary reference category “stomach”) and \(\beta\) is a vector of corresponding fixed organ effects. Similarly, \(\theta\) are fixed effects for angle (“25° from one side” and “25° from the opposite side”, with reference category “perpendicular to the tissue surface”). \(\delta_{i} \sim N(0,\sigma_{\delta }^{2} )\) and \(\gamma_{ij} \sim N(0,\sigma_{\gamma }^{2} )\) are random pig and image effects, respectively, assumed to be independently normally distributed with between-pig variation \(\sigma_{\delta }^{2}\) and between-image variation \(\sigma_{\gamma }^{2}\). Residuals \(\varepsilon_{ijk} \sim N(0,\sigma_{\varepsilon }^{2} )\) capture the variability between repeated recordings of the same image.

The proportion of variability in reflectance explained by each factor was derived as in10. “Repetition” depicts the residual variability, which is here the within image variability (i.e. across replications). 95% pointwise confidence intervals based on parametric bootstrapping with 500 replications indicate the uncertainty in estimates.
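To make the structure of the model concrete, the following sketch simulates reflectance values for a single wavelength from the equation above and computes the share of each term in the summed empirical variances. It is an illustration only: it is not the estimator of ref. 10, it uses the known simulated effects rather than fitted ones, the design is simplified to a fully balanced one (every pig contributes every organ), and all effect sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pigs, n_organs, n_angles, n_positions, n_reps = 11, 20, 3, 4, 3
shape = (n_pigs, n_organs, n_angles, n_positions, n_reps)

# Hypothetical effect sizes for one wavelength
alpha = 0.01
beta = np.concatenate([[0.0], rng.normal(0, 0.004, n_organs - 1)])  # reference organ first
theta = np.array([0.0, 0.0002, -0.0002])                           # perpendicular is reference
sd_pig, sd_image, sd_rep = 0.001, 0.0008, 0.0002

# Each term is broadcast to the full (pig, organ, angle, position, repetition) grid.
# An "image" is one (pig, organ, angle, position) combination; repetitions share it.
components = {
    "organ": np.broadcast_to(beta[None, :, None, None, None], shape),
    "angle": np.broadcast_to(theta[None, None, :, None, None], shape),
    "pig": np.broadcast_to(rng.normal(0, sd_pig, (n_pigs, 1, 1, 1, 1)), shape),
    "image": np.broadcast_to(
        rng.normal(0, sd_image, (n_pigs, n_organs, n_angles, n_positions, 1)), shape
    ),
    "repetition": rng.normal(0, sd_rep, shape),
}
reflectance = alpha + sum(components.values())

# Share of each term in the summed empirical variances of the components
variances = {name: float(np.var(term)) for name, term in components.items()}
total = sum(variances.values())
proportions = {name: value / total for name, value in variances.items()}
for name, share in proportions.items():
    print(f"{name:>10}: {share:.3f}")
```

Repeating such a computation independently per wavelength yields one curve per factor across the spectrum, which is the form in which the explained variation is reported.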

For the second approach with stratification by organ, independent linear mixed models were fitted for each organ and wavelength with fixed effects for “angle” as well as random effects for “pig” and “image”, i.e. for each organ and wavelength the same model as given above was fitted excluding covariate “organ”. The explained standard deviation of each factor was depicted10. “Repetition” depicts the residual explained standard deviation, which is here the within image variability (i.e. across replications). 95% pointwise confidence intervals based on parametric bootstrapping with 500 replications indicate the uncertainty in estimates. All linear mixed model analyses were based on image-wise organ-specific median reflectance spectra that were obtained by calculating the median spectrum of all pixel spectra within one annotation.

Machine learning

Prior to training the deep learning network, we systematically split the dataset comprising 46 pigs (9059 images with 17,777 annotations) into a training dataset consisting of 38 pigs (3766 images with 7882 annotations) and a disjoint test set consisting of 8 pigs (5293 images with 9895 annotations) as indicated in Supplementary Fig. 1. These 8 test pigs were randomly selected from the 11 standardized pigs (P36–P46) with the only criterion that every organ class is represented by at least one standardized pig in the test set as well as in the training dataset. This criterion could no longer be fulfilled when selecting more than 8 standardized pigs.

The hold-out test set was used only after the network architecture and all hyperparameters had been fixed. Leave-one-pig-out cross-validation was performed on the training dataset and the predictions on the left-out pig were aggregated for all 38 folds (46 minus 8) to yield the validation accuracy. The hyperparameters of the neural network were optimized in an extensive grid search such that the validation accuracy was maximized. Once the optimal hyperparameters were determined, we evaluated the classification performance on the hold-out test set by ensembling the predictions from all 38 networks (one for each fold) via computing the mean logits vector (the input values to the softmax function, see below) followed by the argmax operation to retrieve the final label for each annotation.
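The ensembling step can be sketched as follows. The array layout, the function name and the toy logits are assumptions made for illustration; the operation itself (mean of the logits over all 38 fold networks, then argmax) follows the description above.

```python
import numpy as np

def ensemble_predict(fold_logits):
    """Average logits over folds, then take the argmax per annotation.

    fold_logits has shape (n_folds, n_annotations, n_classes).
    """
    mean_logits = fold_logits.mean(axis=0)
    return mean_logits.argmax(axis=1)

# Toy logits (hypothetical): 38 fold networks, 5 annotations, 20 organ classes
rng = np.random.default_rng(0)
true_labels = np.array([0, 3, 7, 19, 3])
fold_logits = rng.normal(size=(38, 5, 20))
fold_logits[:, np.arange(5), true_labels] += 5.0  # make the true class stand out

predictions = ensemble_predict(fold_logits)
print(predictions)
```

Averaging logits rather than predicted labels lets confident networks outweigh uncertain ones, which a majority vote over labels would not do.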

The deep learning-based classification was performed on the median spectra computed from the ℓ1-normalized spectra of all pixels in the annotation masks resulting in 100-dimensional input feature vectors. The deep learning architecture was composed of 3 convolutional layers (64 filters in the first, 32 in the second, and 16 in the third layer) followed by 2 fully connected layers (100 neurons in the first and 50 in the second layer). The activations of all five layers were batch normalized and a final linear layer was used to calculate the class logits. Each of the convolutional layers convolved the spectral domain with a kernel size of 5 and was followed by an average pooling layer with a kernel size of 2. The two fully connected layers zeroed out their activations with a dropout probability of \(p\). All non-linear layers used the Exponential Linear Unit (ELU)74 as activation function.

We chose this architecture as it provides a simple yet effective way to analyze the spectral information. The convolution operation acts on the local structure of the spectra and we used a relatively small kernel size and stacked 3 layers to increase the receptive field while being computationally efficient75. The two fully connected layers make a final decision based on the global context. The advantage of this approach is that it combines local and global information aggregation while still being computationally efficient since the entire network only uses 34,300 trainable weights.
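The stated parameter count can be checked with simple arithmetic. The sketch below assumes details the text does not specify: a single input channel, stride 1 and no padding in the convolutions, bias terms in every layer, a learnable scale and shift per batch-normalized channel or neuron, and 20 output classes. Under these assumptions the total comes to 34,326, which is consistent with the roughly 34,300 trainable weights reported.

```python
def conv_out(length, kernel=5):
    """Output length of an unpadded, stride-1 convolution."""
    return length - kernel + 1

def pool_out(length, kernel=2):
    """Output length of average pooling with stride equal to the kernel size."""
    return length // kernel

length, in_channels = 100, 1  # 100-dimensional median spectrum, one input channel
params = 0

# Three convolutional layers, each followed by batch norm and average pooling
for out_channels in (64, 32, 16):
    params += in_channels * out_channels * 5 + out_channels  # kernel weights + biases
    params += 2 * out_channels                               # batch norm scale and shift
    length = pool_out(conv_out(length))                      # 100 -> 48 -> 22 -> 9
    in_channels = out_channels

# Two fully connected layers on the flattened feature map
features = in_channels * length  # 16 channels x 9 positions = 144
for units in (100, 50):
    params += features * units + units  # weights + biases
    params += 2 * units                 # batch norm scale and shift
    features = units

# Final linear layer producing one logit per organ class
params += features * 20 + 20

print(length, params)
```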

The softmax function was used to provide the a posteriori probability for each class. We used the Adam optimizer (β1 = 0.9, β2 = 0.999)76 with an exponential learning rate decay (decay rate of \(\gamma\) and initial learning rate of \(\eta\)) and the multiclass cross-entropy loss function. To address class imbalance, we included an optional weighting of the loss function according to the number of training images per class and sampled instances for the batches either randomly or with oversampling such that each organ class had the same probability of being sampled. Both design choices were investigated in the hyperparameter grid search.
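The loss-related components can be written out in NumPy as a minimal sketch. Two points are assumptions rather than statements from the text: the class weights are taken to be inverse class frequencies (the text only says the weight depends on the number of training images per class), and the learning rate is taken to decay once per epoch. The default values of `eta` and `gamma` are the best-performing ones reported for the grid search.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def inverse_frequency_weights(class_counts):
    """Assumed weighting scheme: inverse class frequency, rescaled to mean 1."""
    weights = 1.0 / np.asarray(class_counts, dtype=float)
    return weights / weights.mean()

def weighted_cross_entropy(logits, labels, class_weights):
    """Multiclass cross-entropy with one weight per class."""
    probs = softmax(logits)
    nll = -np.log(probs[np.arange(len(labels)), labels])
    sample_weights = class_weights[labels]
    return float((sample_weights * nll).sum() / sample_weights.sum())

def learning_rate(epoch, eta=1e-4, gamma=0.9):
    """Exponential decay: eta * gamma ** epoch (per-epoch decay is an assumption)."""
    return eta * gamma ** epoch

# Toy example (hypothetical): 4 annotations, 3 classes with unequal counts
logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]])
labels = np.array([0, 1, 2, 1])
weights = inverse_frequency_weights([100, 10, 1000])
loss = weighted_cross_entropy(logits, labels, weights)
print(loss, learning_rate(0), learning_rate(2))
```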

We trained on 10,000,000 samples per epoch for 10 epochs with a batch size of \(N\). In an extensive grid search, we determined the best-performing hyperparameters: dropout probability \(p^{*} = 0.2\) (\(p \in \{ 0.1, 0.2\}\)), learning rate \(\eta^{*} = 0.0001\) (\(\eta \in \{ 0.001, 0.0001\}\)), decay rate \(\gamma^{*} = 0.9\) (\(\gamma \in \{ 0.75, 0.9, 1.0\}\)), batch size \(N^{*} = 20,000\) (\(N \in \{ 20,000, 40,000\}\)), a weighted loss function and no oversampling.
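The size of this search space and the resulting number of optimization steps follow from simple enumeration. The sketch below assumes a full factorial grid over the listed values and the two binary design choices, which the text does not state explicitly; the dictionary keys are illustrative names.

```python
from itertools import product

search_space = {
    "dropout": [0.1, 0.2],
    "learning_rate": [0.001, 0.0001],
    "decay_rate": [0.75, 0.9, 1.0],
    "batch_size": [20_000, 40_000],
    "weighted_loss": [True, False],
    "oversampling": [True, False],
}

# Every combination of the candidate values (assumed full factorial design)
grid = [dict(zip(search_space, values)) for values in product(*search_space.values())]

# Best-performing configuration reported in the text
best = {
    "dropout": 0.2,
    "learning_rate": 0.0001,
    "decay_rate": 0.9,
    "batch_size": 20_000,
    "weighted_loss": True,
    "oversampling": False,
}

samples_per_epoch, epochs = 10_000_000, 10
steps_per_epoch = samples_per_epoch // best["batch_size"]

print(len(grid), steps_per_epoch, steps_per_epoch * epochs)
```

With leave-one-pig-out cross-validation over 38 pigs, each of these configurations corresponds to 38 trained networks.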

We evaluated the performance of our machine learning model by computing the following metric values: a micro-average, in which all annotations contribute equally to the obtained metric value, was computed for the accuracy. In order to balance inequalities in the number of recordings across organ classes (cf. Supplementary Fig. 1), macro-averaged metric values were additionally reported. More specifically, for the computation of average sensitivity, specificity and F1 score, metric values were first computed independently for each organ class and then averaged.
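The difference between the two averaging schemes can be illustrated with a short sketch. The function name and the toy labels are assumptions; the metric definitions are the standard one-vs-rest ones, and the toy example contains every class at least once so that no denominator is zero.

```python
import numpy as np

def evaluate(y_true, y_pred, n_classes):
    """Micro-averaged accuracy plus macro-averaged sensitivity, specificity and F1."""
    accuracy = float(np.mean(y_true == y_pred))  # every annotation counts equally
    sensitivity, specificity, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        tn = np.sum((y_true != c) & (y_pred != c))
        sensitivity.append(tp / (tp + fn))
        specificity.append(tn / (tn + fp))
        f1.append(2 * tp / (2 * tp + fp + fn))
    # Macro average: compute per class first, then average, so every class counts equally
    return {
        "accuracy": accuracy,
        "sensitivity": float(np.mean(sensitivity)),
        "specificity": float(np.mean(specificity)),
        "f1": float(np.mean(f1)),
    }

# Toy example (hypothetical) with three imbalanced classes
y_true = np.array([0, 0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2])
metrics = evaluate(y_true, y_pred, n_classes=3)
print(metrics)
```

In this toy example the single misclassified annotation lowers the accuracy by one seventh, whereas its effect on the macro-averaged sensitivity depends only on the size of the class it belongs to.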