PCC-GAN: High-quality PET Image Reconstruction Using 3D Point-based Context-cluster GAN

GAN (Generative Adversarial Network) 28/03/2024

3 main points
✔️ Reconstruction of standard dose PET (SPET) images from low dose PET (LPET) images
✔️ 3D point-based context cluster GAN, "PCC-GAN" proposed
✔️ Demonstrated that PCC-GAN outperforms state-of-the-art reconstruction methods in both qualitative and quantitative terms

Image2Points: A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction
written by Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen
(Submitted on 1 Feb 2024)
Comments: Accepted by ICASSP 2024
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

code:

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

In recent years, a number of methods have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images in order to obtain high-quality positron emission tomography (PET) images with minimal radiation exposure. However, these methods rely heavily on voxel-based representations, which struggle to fully account for precise structure (e.g., the size and boundaries of each organ/tissue) and fine-grained context (e.g., the relationships and interactions between different organs/tissues).

Therefore, this paper proposes a 3D point-based context cluster GAN, or PCC-GAN, to reconstruct high-quality SPET images from LPET images. PCC-GAN achieves sharper reconstruction by using a point-based representation that explicitly preserves the complex structure of 3D PET images. In addition, context clustering is applied to explore the contextual relationships between points, alleviating the ambiguity of small structures in the reconstructed image.

Finally, experiments on both clinical and phantom datasets demonstrate that PCC-GAN outperforms state-of-the-art reconstruction methods both qualitatively and quantitatively.

PCC-GAN

As shown in Fig. 1, PCC-GAN consists of a hierarchical generator (Generator) and a point-based discriminator (Discriminator).

The generator first converts the LPET image into points via Points Construction, and then generates the residual points between LPET and SPET using four context cluster (CoC) blocks followed by four transposed context cluster (TCoC) blocks. These residual points are added to the LPET points to obtain the predicted PET points, which are then converted back into an image via point regression, yielding the generator's output, the estimated PET image (denoted EPET). Finally, the point-based discriminator takes a real/fake PET image pair as input and judges its authenticity in terms of points.

Points Construction

To bridge the gap between the image and point representations, the input 3D LPET image is first transformed into a set of points by Points Construction. Let $x∈ℝ^{C×H×W×D}$ be the LPET image, where $C$ is the number of channels and $H, W, D$ are the height, width, and depth.

First, $x$ is transformed into a point set $e_p∈ℝ^{C×n}$ ($n=H×W×D$). Then, to incorporate explicit structural information, the 3D geometric coordinates of the points, $e_c∈ℝ^{3×n}$, are concatenated to $e_p$. This yields a point set $e_0=\{e_p,e_c\}∈ℝ^{d_0×n_0}$ ($d_0=C+3$, $n_0=n$) corresponding to the input LPET image.

Thus, each point contains not only the original features (texture, edges, etc.) but also explicit geometric structure information. The resulting $e_0$ is then fed into the first CoC block to uncover contextual relationships.
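As a rough illustration, Points Construction can be sketched in NumPy as below. This is a minimal sketch, not the authors' code: the function name `construct_points` is hypothetical, and scaling the voxel coordinates to [0, 1] is an assumption that the summary does not specify.

```python
import numpy as np

def construct_points(x: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W, D) volume into a (C + 3, n) point set, n = H * W * D."""
    C, H, W, D = x.shape
    # e_p: every voxel becomes one point carrying its C channel values
    e_p = x.reshape(C, -1).astype(np.float32)
    # e_c: (h, w, d) grid coordinates in the same row-major order as e_p
    hh, ww, dd = np.meshgrid(np.arange(H), np.arange(W), np.arange(D), indexing="ij")
    e_c = np.stack([hh, ww, dd]).reshape(3, -1).astype(np.float32)
    # Assumption: scale each coordinate axis to [0, 1]
    extent = np.maximum(np.array([H, W, D], dtype=np.float32) - 1, 1)
    e_c /= extent[:, None]
    # e_0 = {e_p, e_c}: concatenate along the channel axis, d_0 = C + 3
    return np.concatenate([e_p, e_c], axis=0)

x = np.random.rand(1, 4, 4, 4).astype(np.float32)  # toy single-channel LPET volume
e0 = construct_points(x)
print(e0.shape)  # (4, 64)
```

Because the coordinates are stored alongside the intensities, any similarity computed later between points sees both appearance and position.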

CoC Block

・Points Reducer

The number of points is reduced at the beginning of each CoC block to reduce computational overhead while facilitating the use of multiscale information.

For the $i$-th CoC block, the Points Reducer takes as input the output $e_{i-1}∈ℝ^{d_{i-1}×n_{i-1}}$ of the previous block and evenly selects $A$ anchors ($A=32, 16, 8, 4$ for the four blocks) in the point space.

Then, for each anchor, its $k$ neighboring points are selected, concatenated along the channel dimension, and fused by a linear projection.

Finally, a new point set $f_i∈ℝ^{d_i×n_i}$ is obtained, whose number of points $n_i$ equals the number of anchors.

Thus, in each block the number of points is reduced by a factor of 8 and the channel dimension is doubled.
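A minimal NumPy sketch of one Points Reducer step follows. It assumes, hypothetically, that the points still lie on a regular 3D grid, that each anchor's $k = 8$ neighbors are the 2 × 2 × 2 cell it starts, and that the projection maps $8d$ channels to $2d$; under these assumptions the point count drops by a factor of 8 and the channel dimension doubles. `points_reducer` and its arguments are illustrative names only.

```python
import numpy as np

def points_reducer(e: np.ndarray, side: int, weight: np.ndarray) -> np.ndarray:
    """e: (d, side**3) points in grid order; weight: (d_out, 8 * d).
    Returns (d_out, (side // 2) ** 3): one fused point per anchor."""
    d = e.shape[0]
    s2 = side // 2  # anchors per axis (every other grid position)
    # Split each spatial axis into (anchor index, offset inside the 2x2x2 cell)
    cells = e.reshape(d, s2, 2, s2, 2, s2, 2)
    # Reorder to (oh, ow, od, channel, ah, aw, ad)
    cells = cells.transpose(2, 4, 6, 0, 1, 3, 5)
    # Concatenate the k = 8 neighbours of each anchor along the channel axis
    stacked = cells.reshape(8 * d, s2 ** 3)
    # Fuse them with a linear projection
    return weight @ stacked

rng = np.random.default_rng(0)
d, side = 5, 4
e = rng.standard_normal((d, side ** 3))
weight = rng.standard_normal((2 * d, 8 * d))  # d_out = 2d: channels double
f = points_reducer(e, side, weight)
print(f.shape)  # (10, 8): 64 points -> 8 points
```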

・Context Clustering

-Clusters Generating:

Given a point set $f_i$, all points are grouped based on their contextual affinity.

First, following SLIC, a conventional superpixel method, $c$ cluster centers are proposed in the point space of $f_i$, and the pairwise cosine similarity between each point of $f_i$ and all proposed centers is computed. Then, each point of $f_i$ is assigned to its most contextually similar center, yielding $c$ clusters.

Since each point contains both original features and geometric structural knowledge, similarity calculations emphasize not only contextual affinity but also structural locality, thus facilitating the exploration of both contextual and structural relationships.
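The cluster-generation step can be sketched as follows. Seeding the centers at evenly spaced point indices is a hypothetical simplification of SLIC's regular-grid seeding; the cosine-similarity computation and the assignment of each point to its most similar center follow the description above.

```python
import numpy as np

def generate_clusters(f: np.ndarray, num_centers: int):
    """f: (d, n) point set. Returns centers (d, c), similarity (c, n), assignment (n,)."""
    n = f.shape[1]
    # Hypothetical seeding: evenly spaced points act as the proposed centers
    idx = np.linspace(0, n - 1, num_centers).round().astype(int)
    centers = f[:, idx]
    # Pairwise cosine similarity between every center and every point
    f_unit = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    c_unit = centers / (np.linalg.norm(centers, axis=0, keepdims=True) + 1e-8)
    sim = c_unit.T @ f_unit  # (c, n)
    # Each point joins the center it is most similar to
    assign = sim.argmax(axis=0)
    return centers, sim, assign

f = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])  # 4 two-dimensional points
centers, sim, assign = generate_clusters(f, num_centers=2)
print(assign)  # [0 0 1 1]
```

With real PCC-GAN points, the last three channels would be coordinates, so the same cosine similarity also rewards spatial proximity.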

-Points Aggregating:

To further emphasize contextual relationships, all points within each cluster are dynamically aggregated based on their contextual affinity to the cluster center.

Assuming that a cluster consists of $M$ points and its center in the point space is $v^c_i$, the points in the cluster can be expressed as $V_i=\{v_{i,m},s_{i,m}\}^M_{m=1}∈ℝ^{M×d_i}$, where $s_{i,m}$ is the similarity between point $v_{i,m}$ and the center.

The aggregated point $g_i∈ℝ^{d_i}$ is a sum of the points in the cluster, each weighted by its contextual similarity to the center $v^c_i$. The similarity is scaled and shifted by a learnable parameter $\alpha$ and passed through the sigmoid activation function $sig(\cdot)$, and the sum is divided by a normalization factor $C$.

Thus, by aggregating the points according to their contextual affinity, contextual relationships can be described accurately and a compact representation with fine-grained context is obtained.
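The exact aggregation equation is not given here, so the sketch below assumes PCC-GAN follows the rule of the original Context Clusters model (Ma et al., "Image as Set of Points"), in which the center is included with weight 1 and an extra learnable shift $\beta$ appears: $g_i = \frac{1}{C}\left(v^c_i + \sum_{m=1}^{M} sig(\alpha s_{i,m} + \beta)\, v_{i,m}\right)$ with $C = 1 + \sum_{m=1}^{M} sig(\alpha s_{i,m} + \beta)$. Treat this as an assumption rather than the paper's exact formula; `aggregate` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(v_center, V, s, alpha=1.0, beta=0.0):
    """v_center: (d,) cluster center; V: (M, d) points; s: (M,) similarity to the center.
    Returns the aggregated point g: (d,)."""
    w = sigmoid(alpha * s + beta)  # scaled, shifted, squashed similarity
    C = 1.0 + w.sum()  # normalization factor (the center has weight 1)
    return (v_center + (w[:, None] * V).sum(axis=0)) / C

g = aggregate(np.zeros(2), np.array([[2.0, 0.0], [0.0, 2.0]]), np.zeros(2), alpha=0.0)
print(g)  # [0.5 0.5]
```

The normalization makes $g_i$ a convex combination, so it stays on the same scale as the points it summarizes.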

・Points Dispatching

The aggregated point $g_i$ is then adaptively dispatched to each point in the cluster according to contextual similarity, enabling point-to-point communication and the collective sharing of structural and contextual information across the cluster. Specifically, each point $v_{i,m}$ is updated with $g_i$, weighted by its contextual similarity to the cluster center.

Together, these steps efficiently capture explicit structure and fine-grained context, finally yielding the output $e_i∈ℝ^{d_i×n_i}$ of the $i$-th block.
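Points Dispatching can be sketched under the same Context Clusters assumption: each point receives $g_i$ scaled by $sig(\alpha s_{i,m} + \beta)$ and passed through a linear layer, and the result is added to the point. The matrix `W` is a hypothetical stand-in for that layer, and `dispatch` is an illustrative name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dispatch(V, s, g, W, alpha=1.0, beta=0.0):
    """V: (M, d) cluster points; s: (M,) similarity to the center; g: (d,) aggregated point;
    W: (d, d) hypothetical linear layer. Returns the updated points (M, d)."""
    w = sigmoid(alpha * s + beta)  # per-point share of g
    return V + (w[:, None] * g[None, :]) @ W.T

out = dispatch(np.zeros((2, 3)), np.zeros(2), np.array([2.0, 4.0, 6.0]), np.eye(3), alpha=0.0)
print(out)  # each row becomes [1. 2. 3.]
```

Points that are more similar to the center receive a larger share of the shared context.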

TCoC Block

The structure of the TCoC block is very similar to that of the CoC block. The only difference is that the CoC block uses the Points Reducer to reduce the number of points, while the TCoC block uses the Points Expander to increase the number of points.

In contrast to the Points Reducer, the Points Expander treats every point in the point set as an anchor. For each anchor, a linear projection layer expands its channel dimension by a factor of $k$. The result is then split along the channel dimension into $k$ points, which are placed uniformly around the anchor to create an expanded point set that is further processed by context clustering.

In this way, the TCoC blocks expand and restore the points. In addition, a residual connection adds the output of each TCoC block to that of the corresponding CoC block, exploiting the complementary information extracted by the CoC blocks.
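A matching sketch of the Points Expander, mirroring the Points Reducer sketch: it assumes, hypothetically, $k = 8$ and that "placed uniformly around the anchor" means filling the 2 × 2 × 2 cell at the next finer grid resolution. `points_expander` is an illustrative name.

```python
import numpy as np

def points_expander(e: np.ndarray, side: int, weight: np.ndarray) -> np.ndarray:
    """e: (d, side**3) anchors in grid order; weight: (8 * d_out, d).
    Returns (d_out, (2 * side) ** 3): k = 8 new points around every anchor."""
    d_out = weight.shape[0] // 8
    # Linear projection expands each anchor's channels by k = 8
    expanded = weight @ e  # (8 * d_out, side**3)
    # Split channels into 8 points, one per offset (oh, ow, od) of a 2x2x2 cell
    cells = expanded.reshape(2, 2, 2, d_out, side, side, side)
    # Interleave offsets with anchor positions: (d_out, ah, oh, aw, ow, ad, od)
    cells = cells.transpose(3, 4, 0, 5, 1, 6, 2)
    return cells.reshape(d_out, (2 * side) ** 3)

rng = np.random.default_rng(0)
e = rng.standard_normal((4, 2 ** 3))        # 8 anchors with 4 channels
weight = rng.standard_normal((8 * 2, 4))    # d_out = 2: channels halve
print(points_expander(e, 2, weight).shape)  # (2, 64)
```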

Finally, the last TCoC block outputs the residual points between LPET and SPET, which are added to the LPET points $e_0$ and then converted back into an image via point regression to produce the final output of the generator, the EPET image.
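The final residual step can be sketched as follows. Point regression is not detailed here, so this sketch assumes, hypothetically, that it is a linear map from the $C+3$ point channels back to $C$ intensities and that the points are still in voxel-grid order; `points_to_image` and `w_reg` are illustrative names.

```python
import numpy as np

def points_to_image(e0, residual, w_reg, shape):
    """e0, residual: (C + 3, n) point sets; w_reg: (C, C + 3) hypothetical
    point-regression weights; shape: (C, H, W, D). Returns the EPET volume."""
    pred_points = e0 + residual   # predicted PET points
    values = w_reg @ pred_points  # regress C intensities per point
    return values.reshape(shape)  # points are in grid order -> volume

rng = np.random.default_rng(0)
e0 = rng.random((4, 8))                   # C = 1 channel + 3 coordinates, 2x2x2 voxels
residual = np.zeros_like(e0)
w_reg = np.array([[1.0, 0.0, 0.0, 0.0]])  # keep the intensity channel only
epet = points_to_image(e0, residual, w_reg, (1, 2, 2, 2))
```

Predicting only the residual lets the network focus on the difference between LPET and SPET rather than re-synthesizing the whole image.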

Point-based Discriminator

To improve image quality, a point-based discriminator is incorporated to determine the authenticity of input image pairs.

Unlike conventional patch-based discriminators, which judge 3D images as voxel patches, the PCC-GAN discriminator judges the authenticity of an image in terms of points.

First, a real/fake PET image pair (an LPET image and its corresponding real SPET or fake EPET image) is taken as input and transformed into points by Points Construction. Four CoC blocks then learn more discriminative structural knowledge, and finally a sigmoid function is applied to determine whether the input is real or fake.

By exploiting the strengths of the point representation, this point-based discriminator can better identify structural discrepancies between real and reconstructed images and provide useful feedback to the generator.

Objective function

The objective function of PCC-GAN consists of an estimation error loss and an adversarial loss.

The estimation error loss, an L1 loss, is applied to increase the similarity between the reconstructed EPET image $G(x)$ and the real SPET image $y$.

In addition, an adversarial loss is used to keep the data distribution of the EPET images consistent with that of the real SPET images.

Overall, the PCC-GAN objective function combines these two terms, with a hyperparameter $\lambda$ balancing them.
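The loss equations are not spelled out here. A common formulation consistent with the description, assumed in the style of pix2pix-type conditional GANs rather than taken from the paper, is $\mathcal{L}_{L1} = \mathbb{E}_{x,y}[\|y - G(x)\|_1]$, $\mathcal{L}_{adv} = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x}[\log(1 - D(x, G(x)))]$, and $\min_G \max_D \mathcal{L}_{adv} + \lambda \mathcal{L}_{L1}$. The NumPy sketch below evaluates these terms on discriminator probabilities; the generator term uses the widely used non-saturating variant $-\log D(x, G(x))$, which is a further assumption.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite

def l1_loss(epet, spet):
    """Estimation error: mean absolute difference between G(x) and y."""
    return np.mean(np.abs(spet - epet))

def discriminator_loss(d_real, d_fake):
    """d_real = D(x, y), d_fake = D(x, G(x)): probabilities after the sigmoid."""
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return -np.mean(np.log(d_real) + np.log(1 - d_fake))

def generator_loss(d_fake, epet, spet, lam):
    """Non-saturating adversarial term plus lambda-weighted L1 term."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return -np.mean(np.log(d_fake)) + lam * l1_loss(epet, spet)

print(generator_loss(np.array([0.5]), np.zeros(2), np.ones(2), lam=10.0))  # log(2) + 10 ≈ 10.693
```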

Experiment

Dataset

The clinical dataset includes PET images from eight normal control (NC) subjects and eight mild cognitive impairment (MCI) subjects. SPET images were acquired over 12 minutes, while LPET images were obtained from data acquired over 3 minutes to simulate a quarter of the standard dose.

The phantom dataset includes 20 simulated subjects from the BrainWeb database, whose LPET images were simulated at one quarter of the normal count level.

The PET images in both datasets are 128 × 128 × 128 voxels, and 729 overlapping patches of size 64 × 64 × 64 are extracted from each 3D image. In addition, leave-one-out cross-validation (LOOCV) is performed to obtain a less biased performance evaluation.
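The patch count can be checked with a line of arithmetic. The stride is not stated, so a stride of 8 voxels along each axis is an assumption here, chosen because it reproduces the reported 729 = 9³ patches: $(128 - 64)/8 + 1 = 9$ start positions per axis.

```python
def num_patches(volume: int = 128, patch: int = 64, stride: int = 8) -> int:
    """Number of overlapping cubic patches extracted with a fixed stride."""
    per_axis = (volume - patch) // stride + 1  # patch start positions per axis
    return per_axis ** 3                       # same count along H, W and D

print(num_patches())  # 729 = 9 ** 3
```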

Result

PCC-GAN was compared with six state-of-the-art PET reconstruction approaches: Auto-Context, Hi-Net, Sino-cGAN, M-UNet, LCPR-Net, and Trans-GAN. The quantitative comparison results on the clinical and phantom datasets are shown in Table 1 and Table 2, respectively.

As Table 1 and Table 2 show, PCC-GAN achieves the best results on all evaluation metrics in both datasets, with a relatively small number of parameters.

Furthermore, the visualization in Fig. 2 shows that the images produced by PCC-GAN have the best visual quality and the smallest errors. Together, these results indicate that PCC-GAN predicts SPET images the most accurately among the compared methods.

To demonstrate the clinical value of PCC-GAN, the paper further performs a downstream task: Alzheimer's disease diagnosis on the clinical dataset. As shown in Fig. 3, the classification accuracy obtained with PCC-GAN images (88.9%) is the closest to that obtained with real SPET images (90.0%), indicating that PCC-GAN has great clinical potential for disease diagnosis.

Conclusion

In this paper, a 3D point-based context cluster GAN, PCC-GAN, was proposed to reconstruct high-quality SPET images from LPET images. By utilizing the geometric representation of points, PCC-GAN explicitly preserves the complex structure of 3D PET images, resulting in sharp, clear reconstructions. In addition, by utilizing context clustering, PCC-GAN explores fine-grained contextual relationships and reduces ambiguities and omissions in small structures. Extensive experiments demonstrated the superiority of PCC-GAN.

What we found particularly impressive about PCC-GAN is that its use of the geometric representation of points enables clear reconstructions, which have been difficult to achieve with conventional methods. We look forward to seeing how this method progresses toward practical clinical application, and we believe these results could open the way for further development.