Introduction

As the senior population grows, the number of patients with knee joint arthritis, such as osteoarthritis (OA), is increasing. In OA, the muscles weaken or the cartilage wears down, and consequently some bones become deformed or irritated [1]. As a result, the femur, tibia, and patella move differently from those of a healthy knee. Motion analysis of these bones may therefore provide useful information on the status of a patient's knee joint.

X-ray fluoroscopy has been used for observing knee joint movement. It provides dynamic images, but each image is only two-dimensional (2D). It is therefore difficult for physicians to grasp the three-dimensional (3D) motion of the knee joint exactly, and a 3D distance between two arbitrary points on the knee joint cannot be measured from such images.

A 2D/3D registration method that uses both fluoroscopy images and CT or MR images has been studied for acquiring knee joint motion [2–15]. This method was applied successfully to obtain the 3D motion of a knee joint with an implant attached by total knee arthroplasty [2–6]. Such imaging was possible because the implant, with its very high absorption coefficient, cast a clear shadow in fluoroscopy. This is not the case for the natural bone of a patient, for which very careful registration is required.

There have been several studies acquiring and analyzing the motion of a natural human knee joint. A few groups tried to acquire and analyze the 3D motion of a natural knee joint by using single-plane fluoroscopy and CT or MR images [7–10]. However, the out-of-plane translational errors were very large in those studies. The authors attributed the large errors to the difficulty of extracting the natural human knee joint in the fluoroscopy images, unlike the case of a knee joint with a metallic implant. Asano et al. [11] analyzed the motion of the human knee joint from bi-plane X-ray images and CT images. Their study showed that the errors were reduced when information from two directions was used. However, the motion analyzed was not a real motion; the analysis was performed only at several angles at which the knee joint was held still.

The dynamic 3D motion of the human femur and tibia has been acquired by the 2D/3D registration method with fluoroscopy and CT or MR images [12, 13]. On the other hand, the 3D position of the human patella has been analyzed only statically, at several angles [14, 15]. No study has acquired the dynamic 3D motion of the human femur, tibia, and patella together. We suppose that the relative displacement and rotation between the bones of a human knee joint in motion are not necessarily the same as those of stationary bones, because muscle activity may affect them. In this paper, we acquire the dynamic 3D motion of the human knee joint, including the patella, by 2D/3D registration. Because acquiring the dynamic 3D motion of the knee joint requires a large amount of calculation time, we implemented the 2D/3D registration method on a graphics processing unit (GPU) with the Compute Unified Device Architecture (CUDA). Prior to the experiment with real human knee joints, the accuracy of the 2D/3D registration method was evaluated in an experiment on a pig knee joint to which several fiducial markers had been attached.

Method

Outline

An outline of the 2D/3D registration method is illustrated in Fig. 1. Initially, both bi-plane dynamic fluoroscopy and static 3D CT images are acquired. Next, bone regions are segmented from the CT image and made ready to be moved according to rotation and translation parameters. Virtual fluoroscopy images called digitally reconstructed radiographs (DRRs) are generated by projecting the 3D CT image in the same geometry as that of the real fluoroscopy system. A cost function defined by the similarity between the DRR and the real fluoroscopy image is calculated.

Fig. 1 Flow chart of the 2D/3D registration method

The segmented bones in the CT data are rotated and translated until the cost function reaches its optimum. In this study, we used the Powell method to optimize the rotation and translation parameters of each bone [16]. The resulting parameter sequences provide the 3D knee joint movement.
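
For concreteness, the following self-contained Python sketch shows a Powell search over the six pose parameters of one bone. In the real method, the cost would render DRRs for the candidate pose and return the negative gradient correlation (see "Correlation calculation"); here a toy quadratic cost with a known optimum stands in so that the sketch runs on its own.

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in cost: in the real method this would generate DRRs for the
# candidate pose and return the negative gradient correlation (GC)
# against the two fluoroscopy frames.  A toy quadratic with a known
# optimum substitutes for that image-based cost here.
TRUE_POSE = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])  # (tx, ty, tz, rx, ry, rz)

def cost(pose):
    return np.sum((pose - TRUE_POSE) ** 2)

# Powell's method is derivative-free, which suits image-similarity costs.
# A natural starting point is the pose estimated in the previous frame.
result = minimize(cost, x0=np.zeros(6), method="Powell")
print(result.x)  # converges to TRUE_POSE
```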

Preprocessing

Calibration of projection system

In the bi-plane X-ray fluoroscopy system, a cubic object with a metallic ball on each vertex was imaged for calibration, as shown in Fig. 2. The relationship between the world coordinates of the object space and the image coordinates of the detector is given by the perspective projection matrix C_p as:

$$ s_{p} \begin{bmatrix} u_{p} \\ v_{p} \\ 1 \end{bmatrix} = C_{p} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} c_{p,11} & c_{p,12} & c_{p,13} & c_{p,14} \\ c_{p,21} & c_{p,22} & c_{p,23} & c_{p,24} \\ c_{p,31} & c_{p,32} & c_{p,33} & c_{p,34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \quad p = 1, 2. $$
(1)
Fig. 2 Overview of the calibration. A cubic object with metallic balls is placed in the bi-plane fluoroscopy system, and its images are captured. The perspective projection matrix C_p is calculated from the relationship between the world coordinates of the object space (X, Y, Z) and the image coordinates of the detectors (u_p, v_p). The suffix p denotes the fluoroscopy detector number

Here, (X, Y, Z) are the world coordinates of an object point, and (u_p, v_p) are its image coordinates in fluoroscopy. The left-hand side of Eq. (1) is a homogeneous-coordinate representation of (u_p, v_p). In the perspective projection, the world coordinates of an object point are not mapped linearly to the corresponding image coordinates; by introducing a new parameter s_p, given by the third row of Eq. (1), the perspective projection is represented in a matrix form that is convenient for computation. The suffix p denotes the fluoroscopy detector number. The matrix C_p is determined from the pairs of world coordinates and image coordinates of the metallic balls by a least-squares method.
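
For illustration, C_p can be fitted by the standard direct linear transformation (DLT). The Python sketch below shows one common formulation; it is an assumption about the implementation, not necessarily the authors' exact procedure.

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 perspective projection matrix C_p of Eq. (1)
    by a least-squares (DLT) fit.

    world_pts : (n, 3) world coordinates (X, Y, Z) of the metallic balls
    image_pts : (n, 2) measured image coordinates (u, v)
    Needs n >= 6 non-coplanar points; the cube with a ball on each
    vertex provides 8.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Eliminating the scale s_p from Eq. (1) gives two linear
        # equations per point in the 12 unknown matrix entries.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    A = np.asarray(rows)
    # The least-squares solution of the homogeneous system is the right
    # singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)
```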

Bone region extraction

The bone regions, namely the femur, tibia, and patella, were extracted and segmented from the 3D CT image of the knee joint, as shown in Fig. 3. Here, the fibula was extracted as a part of the tibia. A region-growing method was used for the bone region extraction [17]. A seed point for each bone region was given manually, and the lower and upper thresholds were set empirically to 1250 and 2700 HU. The other parts of the CT volume, such as ligaments and other soft tissue, were eliminated. A rectangular parallelepiped enclosing each of the femur, tibia, and patella was defined, and a coordinate system was then defined for each bone, with its origin automatically set at the center of the corresponding rectangular parallelepiped.
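
The segmentation step can be illustrated with a minimal Python sketch of 6-connected region growing using the thresholds above; this is an illustration of the technique, not the implementation used in the study.

```python
import numpy as np
from collections import deque

def region_grow(ct, seed, lo=1250, hi=2700):
    """Grow one bone region from a manually given seed point.

    ct   : 3D array of CT values
    seed : (z, y, x) index assumed to lie inside the target bone
    lo/hi: lower/upper thresholds from the text (set empirically)
    Uses 6-connectivity; returns a boolean mask of the bone region.
    """
    mask = np.zeros(ct.shape, dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            n = (z + dz, y + dy, x + dx)
            if (0 <= n[0] < ct.shape[0] and 0 <= n[1] < ct.shape[1]
                    and 0 <= n[2] < ct.shape[2]
                    and not mask[n] and lo <= ct[n] <= hi):
                mask[n] = True
                queue.append(n)
    return mask

def bone_origin(mask):
    """Origin of the bone coordinate system: the center of the
    bounding box (rectangular parallelepiped) of the bone region."""
    idx = np.nonzero(mask)
    return tuple((i.min() + i.max()) / 2.0 for i in idx)
```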

Fig. 3 Results of the bone region extraction. The femur, tibia, and patella are extracted one by one from the 3D CT image. For each bone, a coordinate system is defined with its origin at the center of the enclosing rectangular parallelepiped

DRR generation

In the DRR generation, we calculated the projection by using a ray-sum method [18, 19], the most basic projection technique. The DRRs were generated by casting a virtual X-ray through the CT volume to each pixel of the virtual detector and integrating the CT values (HU) along the ray. In practice, the projection is performed as follows. For each pixel in the projection plane, the CT data are integrated along the ray connecting the X-ray source and the center of the pixel. The contribution at each equally spaced sample point on the ray is calculated by trilinear interpolation from the CT values of the eight voxels surrounding the sample point, as shown in Fig. 4 [20].
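
A compact Python sketch of this ray-sum projection with trilinear interpolation is given below. For simplicity, the source position and detector pixel centers are assumed to be expressed in voxel coordinates already; in the real system they follow from the calibrated geometry.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ray_sum_drr(ct, source, pixel_centers, n_samples=256):
    """Ray-sum DRR: integrate CT values along source-to-pixel rays.

    ct            : 3D CT volume
    source        : (3,) X-ray source position, in voxel coordinates
    pixel_centers : (n_pix, 3) detector pixel centers, in voxel coordinates
    Returns one ray-sum value per detector pixel.
    """
    source = np.asarray(source, dtype=float)
    drr = np.empty(len(pixel_centers))
    ts = np.linspace(0.0, 1.0, n_samples)  # equally spaced samples on each ray
    for i, pix in enumerate(np.asarray(pixel_centers, dtype=float)):
        pts = source[None, :] + ts[:, None] * (pix - source)[None, :]
        # order=1 -> trilinear interpolation from the 8 surrounding voxels
        vals = map_coordinates(ct, pts.T, order=1, mode="constant", cval=0.0)
        # Scale by the sample spacing so the sum approximates a line integral.
        drr[i] = vals.sum() * np.linalg.norm(pix - source) / n_samples
    return drr
```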

Fig. 4 Schematic illustration of the interpolation in the projection. The white point denotes a sampling point, and the black points denote its eight neighboring voxels. The contribution at the sampling point is calculated from the CT values at these eight neighbors by trilinear interpolation

Correlation calculation

We used the gradient correlation (GC) to judge the match between the fluoroscopy images and the DRRs [21]. Given the fluoroscopy images IF_p and the DRRs ID_p (p = 1, 2), the GC is given by

$$ \mathrm{GC} = N\left( IF'_{1}, ID'_{1} \right) + N\left( IF'_{2}, ID'_{2} \right). $$
(2)
$$ N\left( A, B \right) = \left| \frac{\sum\nolimits_{i \in \mathrm{ROI}} \left( A_{i} - \overline{A} \right) \left( B_{i} - \overline{B} \right)}{\sqrt{\sum\nolimits_{i \in \mathrm{ROI}} \left( A_{i} - \overline{A} \right)^{2}} \sqrt{\sum\nolimits_{i \in \mathrm{ROI}} \left( B_{i} - \overline{B} \right)^{2}}} \right|. $$
(3)

Here, N(A, B) represents the normalized cross-correlation (NCC) between two images A and B, and \( \overline{A} \) and \( \overline{B} \) are their mean values over a region of interest (ROI). The ROI is a circle coinciding with the fluoroscopy image region; by setting the ROI, the surrounding black region is ignored in the NCC calculation. IF'_p and ID'_p are edge-enhanced versions of IF_p and ID_p, generated with a Gaussian filter for noise smoothing followed by a Sobel filter for edge enhancement. We empirically set the standard deviation and kernel size of the Gaussian function to 3 and 7 pixels, respectively.
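
One possible Python formulation of Eqs. (2) and (3) is sketched below. Combining the Sobel responses into a single gradient-magnitude image is an assumption on our part; the original implementation may treat the edge images differently, and the Gaussian kernel is specified here only by its standard deviation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def edge_enhance(img):
    """Smooth with a Gaussian (sigma = 3 pixels, as in the text),
    then take the Sobel gradient magnitude as the edge image."""
    s = gaussian_filter(img.astype(float), sigma=3)
    return np.hypot(sobel(s, axis=0), sobel(s, axis=1))

def ncc(a, b, roi):
    """Normalized cross-correlation N(A, B) of Eq. (3) over the ROI."""
    a = a[roi] - a[roi].mean()
    b = b[roi] - b[roi].mean()
    return abs((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

# Circular ROI coinciding with the 512 x 512 fluoroscopy image region.
yy, xx = np.mgrid[:512, :512]
roi = (yy - 255.5) ** 2 + (xx - 255.5) ** 2 <= 255.5 ** 2

def gradient_correlation(f1, d1, f2, d2):
    """GC of Eq. (2): sum of the NCCs of the edge-enhanced image pairs."""
    return (ncc(edge_enhance(f1), edge_enhance(d1), roi) +
            ncc(edge_enhance(f2), edge_enhance(d2), roi))
```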

Acceleration of 2D/3D registration

We implemented the DRR generation and the edge enhancement on a GPU with CUDA [22] for fast calculation; details are described elsewhere [23]. Because the GPU performs these calculations in parallel, the total calculation time with GPU–CUDA is about 1/50 to 1/100 of that with a CPU alone. CUDA is a C-based development environment for general-purpose GPU computing offered by NVIDIA [22]; with it, source code can be written without knowledge of graphics programming.

Experiments and results

We performed two kinds of experiments. One evaluated the accuracy of the method by using a pig knee joint with several fiducial markers. The other confirmed whether the method was applicable to human knee joints. In the experiment with the human knee joints, two CT scanners were used for the 3D CT data acquisition. Ideally, the same scanner should have been used throughout, but this was difficult because of the clinical scheduling of the equipment. One was a LightSpeed Ultra16 (GE YMS, Tokyo, Japan), with an in-plane resolution of 0.39 mm/pixel and a slice thickness of 0.63 mm. The other was an Aquilion ONE (Toshiba Medical Systems Corp., Tochigi, Japan), with an in-plane resolution of 0.35 mm/pixel and a slice thickness of 0.20 mm. In the accuracy evaluation experiment with the pig knee joint, the Aquilion ONE was used.

For the bi-plane fluoroscopy image acquisition, an Infinix Celeve CB system (Toshiba Medical Systems) was used for both the human knees and the pig knee. Because an image intensifier was used as the detector, the images were distorted, and a distortion correction technique was applied to them [12]. The fluoroscopic image size was 512 × 512 pixels with 8 bits per pixel. The source-image distance (SID) was 110 cm, and both the right anterior oblique (RAO) and left anterior oblique (LAO) angles were 55 deg. In the experiment with the human knee joints, the frame rate was 30 fps. The field of view (FOV) was a circle with a diameter of 304.8 mm (12 inches). The pixel size was 0.60 × 0.60 mm².
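
As a consistency check, the quoted pixel size follows directly from the FOV diameter and the image matrix size:

$$ \frac{304.8\ \text{mm}}{512\ \text{pixels}} \approx 0.595\ \text{mm/pixel} \approx 0.60\ \text{mm/pixel}. $$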

For the GPU, a GeForce GTX 280 (NVIDIA Corporation) was used, equipped with 240 processor cores, a 1296 MHz processor clock, an 1107 MHz memory clock, and 1.0 GB of memory. The host PC had a Core i7 965 (3.2 GHz) CPU and 6.0 GB of memory.

Accuracy evaluation experiment

An accuracy evaluation experiment was carried out on the knee joint of a pig carcass bought at a butcher shop, on the assumption that a pig knee is sufficiently similar to a human one. Metallic balls 1 mm in diameter were fixed at suitable points on the femur, tibia, and patella as fiducial markers for the evaluation. The pig knee joint was affixed to a jig, and fluoroscopic images were acquired at five different angles between the femur and tibia, ranging from 60 to 120 deg at intervals of 15 deg. As shown in Fig. 5, the jig was made of acrylic plastic so as not to appear in the fluoroscopy images, and it allowed adjustment of the angle between the femur and tibia. Figure 6 shows some sample images, in which edge-enhanced DRRs (red) are superimposed on the fluoroscopy images to show the quality of the registration. Visual inspection of these images confirmed that each bone was registered successfully.

Fig. 5 Photograph of the pig knee joint set in the jig. The jig was made of acrylic plastic so as not to appear in the fluoroscopy image, and it could be used for adjustment of the angle between the femur and tibia

Fig. 6 Example of registration results for the pig knee joint. Edge-enhanced DRRs (red) are superposed on the fluoroscopy images to show the quality of the registration that was achieved (color figure online)

The accuracy was also evaluated quantitatively. Figure 7 outlines the evaluation, which consisted of three steps. First, the optimal parameters for the 2D/3D registration were obtained from only the coordinates of the metallic markers in the bi-plane fluoroscopy and 3D CT images [24]. Second, the registration parameters were estimated by use of the bone regions in the fluoroscopic and CT images. Finally, the estimated parameters were compared with the optimal parameters. The accuracy was evaluated by the root mean square error (RMSE) and the maximum error.
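
The marker-based step follows [24]; for illustration, a generic version is sketched below, in which each marker is triangulated from the two views with the projection matrices of Eq. (1), and the optimal rigid pose is then obtained by a least-squares (Kabsch) fit between the marker positions in the CT volume and the triangulated world positions. This is an assumed formulation, not necessarily that of [24].

```python
import numpy as np

def triangulate(C1, C2, uv1, uv2):
    """World position of one marker from its image coordinates in both
    detectors, using the 3x4 projection matrices of Eq. (1)."""
    rows = []
    for C, (u, v) in ((np.asarray(C1), uv1), (np.asarray(C2), uv2)):
        # Eliminating s_p from Eq. (1) gives two linear constraints per view.
        rows.append(u * C[2] - C[0])
        rows.append(v * C[2] - C[1])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]  # back from homogeneous coordinates

def optimal_pose(ct_markers, world_markers):
    """Least-squares rigid fit (Kabsch algorithm): rotation R and
    translation t such that world_markers ~ R @ ct_markers + t."""
    src = np.asarray(ct_markers, dtype=float)
    dst = np.asarray(world_markers, dtype=float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs
```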

Fig. 7 Outline of the accuracy evaluation. (1) The optimal parameters were obtained by use of only the coordinates of the fiducial markers in the bi-plane fluoroscopic and CT images. (2) The registration parameters were estimated by use of the bone regions in the fluoroscopic and CT images. (3) The estimated parameters were compared with the optimal parameters

The RMSE and maximum error for each bone and for the femorotibial (relative poses of the femur with respect to the tibia), patellofemoral (relative poses of the femur with respect to the patella), and patellotibial (relative poses of the tibia with respect to the patella) relationships are summarized in Table 1. All RMSEs were within 1.0 mm and 1.0 deg. For the femur, tibia, and patella, the RMSEs were less than 0.4 mm and 0.7 deg, and the RMSEs of the relative poses between bones were within 0.6 mm and 0.7 deg. Because the resolution of the 3D CT data was 0.39 or 0.30 mm/pixel, these errors correspond to at most 1 to 3 voxels of the CT data.

Table 1 RMSE and maximum error between the optimal and the estimated parameters for each bone and for the relative poses: femorotibial (relative poses of the femur with respect to the tibia), patellofemoral (relative poses of the femur with respect to the patella), and patellotibial (relative poses of the tibia with respect to the patella)

Experiment with human knee joints

An image-acquisition experiment with X-ray CT and bi-plane fluoroscopy was conducted on three healthy male volunteers with the approval of the Ethical Review Board of Chiba University. For the 3D CT data acquisition, the LightSpeed Ultra16 was used for volunteer #1, and the Aquilion ONE for volunteers #2 and #3. Figure 8 shows the fluoroscopy acquisition experiment. During fluoroscopy imaging, each subject flexed his knee joint from full flexion through full extension back to full flexion while putting his weight mainly on the leg being tested. Three hundred to four hundred and fifty consecutive fluoroscopic images (10–15 s) were captured during one flexion cycle, namely, full flexion, full extension, and full flexion. These images were matched with the DRRs, and the motion of each bone was then estimated.

Fig. 8 Photograph of the image-acquisition experiment with the fluoroscopy system

For visualization, surface-rendered images of the bones were generated for each frame, and a movie composed of these images was produced. Figure 9 shows four frames from the movie for each volunteer; the motion of each bone was confirmed visually with these images. Figure 10 shows 15 close-up frames of volunteer #1, in which the translation of the patella relative to the femur can be observed. With bi-plane fluoroscopy and CT images, the 3D motion of each bone could be observed in detail. A detailed analysis of the 3D motions is currently being conducted.

Fig. 9 Surface-rendered images of the femur, tibia, and patella at four different flexion angles for the three volunteers

Fig. 10 Motion of the patella of volunteer #1

The calculation time for the 2D/3D registration was about 2.5 min/frame. For a subject with 300 fluoroscopic images, the total computation time was thus about 300 × 2.5 min = 750 min, or 12.5 h.

Discussion and conclusions

In orthopedics, accuracy within 1.0 mm and 1.0 deg is desired as a goal of 2D/3D registration. In terms of the maximum error, the parameters of the patella showed relatively large errors; for example, the X- and Y-rotation errors were 1.14 and 1.04 deg, respectively. In regions where the bones were close together, their edges were unclear, as seen in Fig. 11. The features of the patella were lost more easily than those of the other bones, so the error in its parameter estimation was larger. This could be one of the main reasons for the relatively poor registration of the patella.

Fig. 11 Original fluoroscopy images (top) and edge-enhanced fluoroscopy images (bottom). In regions where the bones are close together, their edges are unclear

Overall, the accuracy was sufficient in the experiment with the pig knee joint. In the experiment with human volunteers, however, no quantitative evaluation has been done, because fiducial markers can hardly be attached in a human experiment as was done with the pig knee joint. Instead, careful manual evaluation of the registration using feature points, such as the edges or tips of bones, is needed for each frame as an alternative evaluation method.

Application of the proposed method to aged patients would be a challenge, because their bone density is generally low, which makes registration difficult. We need to perform experiments with such subjects and improve the method as required.

We evaluated the accuracy of the 2D/3D registration method with a pig knee joint, including the patella. Although some of the maximum errors were larger than 1.0 mm and 1.0 deg, the accuracy of the method was sufficient for analysis of the knee joint. In addition, the natural motions of the femur, tibia, and patella were observed in movies of the surface-rendered images. Our future work includes study of the 3D motion of a patient's diseased knee joint and comparison with the motion of a normal knee joint.