To generate training data for the machine learning framework, the simulations were run for all 300 tuples $(E, \sigma_E, s, \alpha)$ of the set $S$ defined as:
$$\begin{array}{c}S=\{\left(E,{\sigma }_{E},s,\alpha \right):E\in \{5.6,5.8,6.0,6.2,6.4\},{\sigma }_{E}\in \{0.0,0.5,1.0\},\\ s\in \{0.0,0.1,0.2,0.3,0.4\},\alpha \in \{0,1,2,3\}\}.\end{array}$$
(1)
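As a quick consistency check on Eq. (1), the training grid can be enumerated as a Cartesian product of the four parameter sets. The sketch below is illustrative only and is not part of the original simulation workflow; the constant and function names are hypothetical.

```python
from itertools import product

# Parameter values copied from Eq. (1).
E_VALUES = (5.6, 5.8, 6.0, 6.2, 6.4)
SIGMA_E_VALUES = (0.0, 0.5, 1.0)
S_VALUES = (0.0, 0.1, 0.2, 0.3, 0.4)
ALPHA_VALUES = (0, 1, 2, 3)


def training_tuples():
    """Return every (E, sigma_E, s, alpha) combination in the set S."""
    return list(product(E_VALUES, SIGMA_E_VALUES, S_VALUES, ALPHA_VALUES))


tuples = training_tuples()
print(len(tuples))  # 5 * 3 * 5 * 4 = 300
```

With three fields simulated per tuple, this grid accounts for the 900 dose files reported for the training set.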
At the first simulation stage, $10^8$ histories (a history corresponds to a single electron of the virtual primary beam) were simulated for each tuple $(E, \sigma_E, s, \alpha)$, and the phase-space file (PSF) above the secondary collimators was saved for subsequent use. At this first stage, the splitting roulette variance reduction technique [20] was used, with the size of the splitting region set to the largest field, i.e. 40 × 40 cm². The saved PSFs were then used to simulate radiation transport to a homogeneous cubic water phantom for three fields: 3 × 3 cm², 10 × 10 cm², and 30 × 30 cm². The size of the phantom was set to 50 × 50 × 50 cm³. The doses in the phantom were tallied within a regular grid of 0.5 × 0.5 × 0.5 cm³ voxels. The faces of the phantom were set parallel to the corresponding main axes of the accelerator coordinate frame. The main axis of the phantom coincided with the photon beam axis. The source-to-surface distance (SSD) was set at 100 cm, the isocentre being located at the front surface of the phantom. Splitting in the water phantom was selected as the variance reduction method [20] at this simulation stage, with a splitting factor of 300. The uncertainty of the dose values tallied in the water phantom always remained within 1.5% (which corresponds to two standard deviations of the MC-calculated dose). The calculated 3D spatial distribution of doses within the phantom was saved to a text file, separately for each tuple $(E, \sigma_E, s, \alpha)$ and for each field. A total of 900 3D dose files were collected. Each 3D dose file contained $10^6$ dose values calculated by PRIMO at $(x, y, z)$ coordinates given by the following coordinate ranges:
$$\begin{array}{c}x\in \{-25+0.25+0.5\cdot i,\; i=1\ldots 100\}\\ y\in \{-25+0.25+0.5\cdot j,\; j=1\ldots 100\}\\ z\in \{0.25+0.5\cdot k,\; k=1\ldots 100\},\end{array}$$
(2)
where the $z$ axis is parallel to the radiation field axis. To generate testing data for the machine learning framework, the simulations were run for a further 25 tuples $(E, \sigma_E, s, \alpha)$, with primary beam parameters sampled randomly from the following sets of values:
$$\begin{array}{c}E\in \{5.65+i\cdot 0.05,\; i=0\ldots 14\}\setminus \{5.8,6.0,6.2\}\\ {\sigma }_{E}\in \{0.1+i\cdot 0.1,\; i=0\ldots 8\}\setminus \{0.5\}\\ s\in \{0.05+i\cdot 0.1,\; i=0\ldots 3\}\\ \alpha \in \{0.5+i\cdot 0.25,\; i=0\ldots 9\}\setminus \{1,2\}\end{array}$$
(3)
This sampling scheme ensured that the primary electron beam parameters $(E, \sigma_E, s, \alpha)$ in the testing set never coincided with those used for generating the training set, and consequently that the electron beam parameters for the testing set were well separated from those selected for training.
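The separation between the testing and training parameters can be illustrated with a short sketch that builds the candidate sets of Eq. (3), draws 25 tuples, and checks that no component matches a training value from Eq. (1). This is not the authors' sampling code. The helper names, the fixed seed, and the choice to draw distinct tuples without replacement are assumptions made for illustration; the text states only that the parameters were sampled randomly.

```python
import random
from itertools import product


def grid(start, step, count, exclude=()):
    """Values (start + i*step)/100 for i = 0..count-1, minus `exclude`.

    start and step are integers in hundredths, so values such as 5.8 are
    reproduced exactly and can be compared with == against the exclusions.
    """
    values = [(start + i * step) / 100 for i in range(count)]
    return [v for v in values if v not in exclude]


# Candidate sets from Eq. (3).
E_TEST = grid(565, 5, 15, exclude=(5.8, 6.0, 6.2))
SIGMA_E_TEST = grid(10, 10, 9, exclude=(0.5,))
S_TEST = grid(5, 10, 4)
ALPHA_TEST = grid(50, 25, 10, exclude=(1.0, 2.0))

# Training values from Eq. (1).
E_TRAIN = (5.6, 5.8, 6.0, 6.2, 6.4)
SIGMA_E_TRAIN = (0.0, 0.5, 1.0)
S_TRAIN = (0.0, 0.1, 0.2, 0.3, 0.4)
ALPHA_TRAIN = (0, 1, 2, 3)


def sample_test_tuples(n=25, seed=0):
    """Draw n distinct tuples (assumption: sampling without replacement)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    candidates = list(product(E_TEST, SIGMA_E_TEST, S_TEST, ALPHA_TEST))
    return rng.sample(candidates, n)


def is_separated(t):
    """True if no component of t equals a training value of that parameter."""
    e, sigma_e, s, alpha = t
    return (e not in E_TRAIN and sigma_e not in SIGMA_E_TRAIN
            and s not in S_TRAIN and alpha not in ALPHA_TRAIN)


test_set = sample_test_tuples()
print(len(test_set), all(is_separated(t) for t in test_set))
```

Because every candidate set is disjoint from the corresponding training set, the check holds for any draw, not only for the seed used here.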
All simulations were run using the PlGrid infrastructure (Prometheus grid, https://kdm.cyfronet.pl/portal/Main_page) and required about 2.5 months of wall-clock time in total. During the simulation period, 12 Prometheus nodes ran the PRIMO software, each node being equipped with two Intel Xeon E5-2680v3 processors (24 cores in total) and 128 GB of RAM. The simulation of a single case, i.e. of three fields for a single tuple $(E, \sigma_E, s, \alpha)$, required about 40 CPU hours. As the operating system installed on the nodes is Linux CentOS 7, while PRIMO is a Windows application, the Wine software (https://www.winehq.org/) was installed and configured so that PRIMO could be used in graphical mode under Linux exactly as if Windows were the operating system.