Data source
The study was based on a previously published meta-analysis database of smoking and related diseases in the Chinese population [20, 21]. The PubMed, Embase, Cochrane, CNKI, WanFang and VIP databases were searched. All published cohort and case‒control studies of smoking and related diseases in the Chinese population, from database inception to June 30th, 2021, were collected, including studies in Chinese and English. Included studies (1) were original research with full text available; (2) were conducted in representative Chinese populations; (3) were case‒control or cohort studies (prospective or retrospective cohort studies and nested case‒control studies); and (4) reported odds ratios (OR), relative risks (RR) or hazard ratios (HR). Excluded studies were (1) duplicate articles or articles without available full text; (2) non-population studies, such as genetic or cellular studies and animal experiments; (3) studies of special populations, such as pregnant women, newborns, psychiatric patients and coal miners; and (4) studies lacking key variables or reporting abnormal values. A total of 12,998 papers were retrieved in the final stage, and their quality was evaluated using the Newcastle‒Ottawa Scale (NOS) [22]. We included literature with NOS scores of 5 or above.
Data extraction and processing
The following elements were extracted: (1) basic information: study date, location, sample size, sex, age, study type and journal of publication; (2) outcome: lung cancer; (3) smoking exposure: smoking status (smoking or not, current or past smoking), cigarettes smoked per day, pack-years and years of smoking or cessation; and (4) effect values and confidence intervals, models, and adjustment factors.
The data processing stage included (1) deleting earlier results when the same study was reported more than once and (2) removing abnormal data, such as point estimates below the lower limit of the confidence interval and RRs below 1 for lung cancer caused by smoking. For the pack-years of smokers, the quit-years of quitters and the RRs for lung cancer death or prevalence, the midpoint of the upper and lower limits was taken for closed intervals, and 1.2 times the lower limit was taken for open intervals [23‐25].
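The screening and interval rules described in this subsection can be illustrated with a minimal sketch. The function names `representative_value` and `is_plausible`, and the use of `None` to mark an open upper limit, are assumptions made for illustration and are not taken from the original analysis code.

```python
from typing import Optional


def representative_value(lower: float, upper: Optional[float]) -> float:
    """Collapse a reported exposure interval into a single value.

    Closed interval [lower, upper]: midpoint of the two limits.
    Open interval (upper is None, e.g. ">= 40 pack-years"):
    1.2 times the lower limit.
    """
    if upper is None:
        return 1.2 * lower
    if upper < lower:
        raise ValueError("upper limit must not be below lower limit")
    return (lower + upper) / 2.0


def is_plausible(rr: float, ci_lower: float) -> bool:
    """Keep a record only if the point estimate is not below the lower
    confidence limit and the RR for smoking-related lung cancer is >= 1."""
    return rr >= ci_lower and rr >= 1.0


# Hypothetical records: (RR, lower confidence limit)
records = [(2.5, 1.8), (0.9, 0.5), (1.5, 1.6)]
kept = [r for r in records if is_plausible(*r)]
```

With these illustrative inputs only the first record survives screening, since the second has an RR below 1 and the third has a point estimate below its lower confidence limit.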
Dose‒response RR models
Referring to previous studies [26], we assumed that the risk of lung cancer from smoking was similar in both sexes, and, as in GBD studies, we did not distinguish between the prevalence and mortality of lung cancer. This study focused on fitting the dose‒response function RR(x) relating pack-years of smoking to the RR of lung cancer. The following 10 linear and nonlinear candidate models were built to fit the dose‒response relationship.
The first and second alternative models assumed linear relationships in RR, and these two models were modified from Cohen et al. [27].
Model 1 is a piecewise linear function that assumes a linear relationship between pack-years and RR up to a cutoff of 30 pack-years, after which RR is held fixed. The cutoff was chosen because 30 pack-years is the maximum dose most often used in the current literature. The expressions are \(x<30,y=\alpha +\gamma \times x; x\ge 30,y=\alpha +\gamma \times 30\).
Model 2 is also a piecewise linear function. It differs from Model 1 only in that the cutoff is 45 pack-years, and the expressions are \(x<45,y=\alpha +\gamma \times x; x\ge 45,y=\alpha +\gamma \times 45\).
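As a minimal sketch, Models 1 and 2 can be written as one function with a configurable cutoff. The function name `piecewise_linear_rr` and the parameter values in the usage lines are hypothetical and serve only to show the shape of the curve.

```python
def piecewise_linear_rr(x: float, alpha: float, gamma: float,
                        cutoff: float = 30.0) -> float:
    """RR rises linearly with pack-years x up to the cutoff, then stays flat.

    cutoff=30 corresponds to Model 1 and cutoff=45 to Model 2.
    """
    # min(x, cutoff) reproduces both branches: alpha + gamma * x below the
    # cutoff and alpha + gamma * cutoff at or above it.
    return alpha + gamma * min(x, cutoff)


# Illustrative (not fitted) parameter values
rr_model1 = piecewise_linear_rr(60.0, alpha=1.0, gamma=0.1)               # capped at 30
rr_model2 = piecewise_linear_rr(60.0, alpha=1.0, gamma=0.1, cutoff=45.0)  # capped at 45
```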
The third and fourth alternative models assumed power relationships in RR, and were modified from Cohen et al. [27] and Ostro et al. [28].
Model 3 assumes that RR grows as a power function of pack-years, and the expression is \(y={\left(1+x\right)}^{\gamma }\).
Model 4 is a power function, and the expression is \(y={\left[\frac{\alpha +x}{\alpha }\right]}^{\gamma }\).
Model 5 is also a power function with the expression \(y=1+\alpha \times {x}^{\gamma }\).
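The three power-function candidates can be sketched as follows. The function names are hypothetical, and the sketch assumes \(\gamma > 0\) and, for Model 4, \(\alpha > 0\); under those assumptions each form returns an RR of 1 at zero pack-years.

```python
def model3_rr(x: float, gamma: float) -> float:
    """Model 3: y = (1 + x) ** gamma."""
    return (1.0 + x) ** gamma


def model4_rr(x: float, alpha: float, gamma: float) -> float:
    """Model 4: y = ((alpha + x) / alpha) ** gamma, with alpha > 0."""
    return ((alpha + x) / alpha) ** gamma


def model5_rr(x: float, alpha: float, gamma: float) -> float:
    """Model 5: y = 1 + alpha * x ** gamma."""
    return 1.0 + alpha * x ** gamma


# All three forms equal 1 at x = 0 (illustrative parameter values)
baseline = [model3_rr(0.0, 0.5), model4_rr(0.0, 10.0, 0.5), model5_rr(0.0, 0.3, 0.5)]
```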
The sixth, seventh, and eighth models were modified from Pope et al. (2009, 2011) [29, 30].
Model 6 assumes that RR grows exponentially with increasing pack-years, and the exponential function expression is \(y=\alpha -\beta \times {\gamma }^{x}\).
Model 7 is also an exponential function with the expression \(y=\alpha \times {\beta }^{x}\).
Model 8 is also an exponential function, and the expression is \(y=\alpha +\beta \,{\mathrm{e}}^{\frac{x+\gamma }{\theta }}\).
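A minimal sketch of the three exponential candidates is given below. The function names and the parameter values in the usage lines are hypothetical. One property worth noting is that, when \(0<\gamma<1\), Model 6 starts at \(\alpha-\beta\) and approaches \(\alpha\) as pack-years increase.

```python
import math


def model6_rr(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Model 6: y = alpha - beta * gamma ** x."""
    return alpha - beta * gamma ** x


def model7_rr(x: float, alpha: float, beta: float) -> float:
    """Model 7: y = alpha * beta ** x."""
    return alpha * beta ** x


def model8_rr(x: float, alpha: float, beta: float,
              gamma: float, theta: float) -> float:
    """Model 8: y = alpha + beta * exp((x + gamma) / theta)."""
    return alpha + beta * math.exp((x + gamma) / theta)


# Illustrative values: Model 6 rises from alpha - beta = 1 towards alpha = 5
start = model6_rr(0.0, alpha=5.0, beta=4.0, gamma=0.9)
plateau = model6_rr(1000.0, alpha=5.0, beta=4.0, gamma=0.9)
```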
The last two models were built based on the integrated exposure response (IER) function [31].
Model 9 is the IER function, which slows the overall growth rate by introducing the parameter -γ in the exponent, especially at higher values of x, with the expression \(y=1+\alpha (1-{e}^{-\gamma \times x})\).
Model 10 is also an IER function, which further uses the exponent β to restrict x. The expression of the function is \(y=1+\alpha (1-{e}^{-\gamma \times {x}^{\beta }})\).
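The two IER forms can be sketched as follows, with hypothetical function names and illustrative parameter values. For \(\gamma>0\), both forms give an RR of 1 at zero pack-years and level off at \(1+\alpha\); setting \(\beta=1\) in Model 10 recovers Model 9.

```python
import math


def model9_rr(x: float, alpha: float, gamma: float) -> float:
    """Model 9: y = 1 + alpha * (1 - exp(-gamma * x))."""
    return 1.0 + alpha * (1.0 - math.exp(-gamma * x))


def model10_rr(x: float, alpha: float, gamma: float, beta: float) -> float:
    """Model 10: y = 1 + alpha * (1 - exp(-gamma * x ** beta))."""
    return 1.0 + alpha * (1.0 - math.exp(-gamma * x ** beta))


# Illustrative values: the curve saturates near 1 + alpha for large x
rr_at_zero = model9_rr(0.0, alpha=3.0, gamma=0.05)
rr_plateau = model9_rr(1.0e6, alpha=3.0, gamma=0.05)
```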
In the above expressions, x is pack-years, y is the RR, and α, β, γ and θ are the parameters to be estimated in the models. During fitting, the parameter ranges were initially limited by referring to the model established by GBD in the air pollution study [31] and then restricted according to the actual range of smoking exposure; the final parameter values were obtained after several iterations. Model fit was judged by the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and the model with the smallest AIC and BIC values was selected as the best-fit model.
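The comparison by AIC and BIC can be illustrated with a short sketch. It assumes a least-squares fit with Gaussian errors, for which both criteria can be computed from the residual sum of squares up to a constant shared by all models fitted to the same data; the text does not state which likelihood was used, so this form is an assumption. The function names and the data are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def aic_bic(observed: Sequence[float], predicted: Sequence[float],
            n_params: int) -> Tuple[float, float]:
    """Least-squares AIC and BIC: n*ln(RSS/n) + 2k and n*ln(RSS/n) + k*ln(n)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if n == 0 or rss <= 0.0:
        raise ValueError("need observations and a non-zero residual sum of squares")
    base = n * math.log(rss / n)
    return base + 2 * n_params, base + n_params * math.log(n)


def best_model(scores: Dict[str, Tuple[float, float]]) -> str:
    """Name of the candidate with the smallest AIC, ties broken by BIC."""
    return min(scores, key=lambda name: scores[name])


# Hypothetical observed RRs and predictions from two candidate models
observed = [1.0, 2.0, 3.0, 4.0]
scores = {
    "candidate_a": aic_bic(observed, [1.1, 1.9, 3.1, 3.9], n_params=2),
    "candidate_b": aic_bic(observed, [1.5, 1.5, 3.5, 3.5], n_params=2),
}
selected = best_model(scores)
```

In practice the two criteria can disagree, because BIC penalises additional parameters more heavily once the sample exceeds about eight observations; the tie-breaking rule in `best_model` is one simple convention and not a rule taken from the text.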
For former smokers, we established the RR(y) function, borrowing ideas from GBD [32], to avoid overestimating the RR of quitters with a lighter smoking history. The pooled dichotomous RR from the meta-analysis was used as the RR at the starting point (0 years since quitting); that is, the RR at the time of quitting was taken to equal the average RR of current smokers in the same population. The final quit function was then obtained by correcting the RR [32].