Data sets
Empirical evaluation of our method requires two types of data. The first is a set of efficacies with their known corresponding ESHGs, which serves as the gold standard data. The second is a set of prescriptions accompanied by their efficacies, which is mined to detect ESHGs. Because no widely accepted gold standard data exist for our task, we intentionally collect efficacies with known, reliable ESHGs from the Internet and invite a TCM expert to evaluate the collected ESHGs. This finally yields a data set of 14 efficacies with their corresponding ESHGs for our evaluation. For instance, the efficacy “activating blood to resolve stasis” has a known ESHG consisting of Flos Carthami, Semen Persicae and Rhizoma Ligustici Chuanxiong. For the second type of data, we extract the prescriptions having the efficacies in the gold standard data set from the prescription database¹. These prescriptions form the data set that is mined to detect ESHGs for the efficacies in the gold standard data set. The numbers and examples of the ESHGs of the 14 efficacies are given in Table 1.
For a specific efficacy, we treat prescriptions with the efficacy as positive samples and prescriptions without it as negative samples. Because the positive and negative samples are highly imbalanced, we use an under-sampling method to obtain a balanced data set with a 1:1 ratio of positive to negative samples for each efficacy. The numbers of prescriptions extracted for each of the 14 efficacies are listed in Table 2.
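The text does not specify which under-sampling method is used. The sketch below assumes plain random under-sampling without replacement of the larger class. The function name `undersample_balanced`, the fixed seed and the toy data are all hypothetical.

```python
import random


def undersample_balanced(positives, negatives, seed=0):
    """Return (positives, negatives) resampled to a 1:1 ratio.

    Assumption: plain random under-sampling without replacement of the larger
    class; the text does not say which under-sampling method was used.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    # rng.sample draws n distinct elements; for the smaller class this is just a shuffle.
    return rng.sample(list(positives), n), rng.sample(list(negatives), n)


# Hypothetical toy data: 3 prescriptions with the efficacy, 10 without it.
pos, neg = undersample_balanced([f"p{i}" for i in range(3)], [f"n{i}" for i in range(10)])
print(len(pos), len(neg))  # 3 3
```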
Evaluation method
The purpose of our work is to capture the essential herbs in a prescription for a given efficacy through a classification process, and then to discover ESHGs from these essential herbs. Hence, we evaluate the accuracy of the discovered ESHGs rather than the classification accuracy. Because no widely accepted gold standard data exist for our task, we cannot enumerate all possible ESHGs, so the recall metric cannot be calculated.
For a given efficacy there are multiple gold standard ESHGs, as shown in Table 1. Our algorithm is also likely to generate multiple ESHGs for it, depending on the support threshold. To evaluate the generated ESHGs, we compare them against the gold standard ESHGs for the same efficacy. Since this evaluation is a comparison between two sets, we employ the Dice coefficient. Specifically, let \(e\) be an efficacy with gold standard ESHGs \(S^e\), and let \(A^e\) be the set of ESHGs returned by our algorithm for \(e\). The Dice coefficient on \(e\) is
$$\begin{aligned} Dice_e=\frac{2|A^e\cap S^e |}{|A^e |+|S^e |} \end{aligned}$$
(7)
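Because \(|S^e|\) is fixed during the evaluation, we drop it from the denominator together with the constant factor 2. Formula (7) then reduces to the fraction of returned ESHGs that appear in the gold standard:
$$\begin{aligned} Acc_e=\frac{|A^e\cap S^e |}{|A^e | } \end{aligned}$$
(8)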
The identification accuracy on the whole gold standard test set is
$$\begin{aligned} Acc=\frac{1}{14}\sum _{e=1}^{14}Acc_e \end{aligned}$$
(9)
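The exact-matching metrics in formulas (7) to (9) can be sketched as follows. The sketch assumes that each ESHG is represented as a `frozenset` of herb names, so two ESHGs match only when they contain exactly the same herbs. The function names and the placeholder herbs `a` to `e` are hypothetical.

```python
def dice(returned, gold):
    """Formula (7): Dice coefficient between the returned ESHGs A^e and the gold ESHGs S^e."""
    return 2 * len(returned & gold) / (len(returned) + len(gold))


def exact_acc(returned, gold):
    """Formula (8): fraction of returned ESHGs that exactly match a gold ESHG."""
    if not returned:
        return 0.0  # assumption: returning no ESHGs counts as zero accuracy
    return len(returned & gold) / len(returned)


def overall_acc(per_efficacy_acc):
    """Formula (9): average of Acc_e over the evaluated efficacies."""
    return sum(per_efficacy_acc) / len(per_efficacy_acc)


# Hypothetical ESHGs over placeholder herbs a..e (each ESHG is a frozenset of herbs).
A = {frozenset({"a", "b"}), frozenset({"c", "d"})}
S = {frozenset({"a", "b"}), frozenset({"c", "e"}), frozenset({"b", "d"})}
print(dice(A, S), exact_acc(A, S))  # 0.4 0.5
```

In this example only {a, b} matches exactly. The partially overlapping {c, d} contributes nothing, which illustrates the strictness discussed next.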
The above metric is in essence based on exact matching: an ESHG in one set must match an ESHG in the other set exactly in order to contribute to the accuracy. This requirement is too strict, because it treats partial matches the same as complete mismatches. Different degrees of matching should therefore be incorporated into a unified accuracy measure. Unfortunately, the solution is not obvious, because we have no knowledge of the alignment between the two sets of ESHGs being compared. In this paper we employ a greedy strategy in which each ESHG in \(A^e\) is aligned with the ESHG in \(S^e\) that overlaps most with it in terms of contained herbs. Formally, given an ESHG \(hg\) in \(A^e\), its correctness relative to \(S^e\) is defined as formula (10).
$$\begin{aligned} Correctness(hg)=\max _{hg'\in S^e } \frac{2|hg\bigcap hg'|}{|hg|+|hg'|} \end{aligned}$$
(10)
where \(|hg \cap hg'|\) is the number of herbs shared by \(hg\) and \(hg'\), and \(|hg|\) and \(|hg'|\) are the numbers of herbs in the two ESHGs, respectively.
Based on formula (10), we adapt formula (8) into formula (11). This gives the identification accuracy of our algorithm for a given efficacy \(e\), with identified ESHGs \(A^e\) and gold standard ESHGs \(S^e\):
$$\begin{aligned} Acc_e=\frac{\sum _{hg\in A^e}Correctness(hg)}{|A^e|} \end{aligned}$$
(11)
Finally, we use formula (9), with \(Acc_e\) calculated by formula (11), to evaluate the effectiveness of our algorithm.
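Formulas (10) and (11) can be sketched as follows, again assuming that ESHGs are `frozenset`s of herb names. The function names and placeholder data are hypothetical. For each returned ESHG, the `max` over the gold ESHGs implements the greedy alignment described above.

```python
def correctness(hg, gold):
    """Formula (10): Dice overlap between hg and its best-aligned gold ESHG (greedy alignment)."""
    return max(2 * len(hg & g) / (len(hg) + len(g)) for g in gold)


def soft_acc(returned, gold):
    """Formula (11): average correctness of the returned ESHGs A^e."""
    if not returned:
        return 0.0  # assumption: returning no ESHGs counts as zero accuracy
    return sum(correctness(hg, gold) for hg in returned) / len(returned)


# Hypothetical ESHGs over placeholder herbs a..e.
A = [frozenset({"a", "b"}), frozenset({"c", "d"})]
S = [frozenset({"a", "b", "c"}), frozenset({"d", "e"})]
print(correctness(A[0], S), correctness(A[1], S))  # 0.8 0.5
print(soft_acc(A, S))
```

Each returned ESHG is aligned independently, so two returned ESHGs may be aligned with the same gold ESHG. Formula (10) does not enforce a one-to-one alignment.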
The effect of ESHG detection
For a given efficacy, its positive and negative samples are first passed through the hierarchical attentive neural network, which assigns each herb in these samples an attentive weight relative to the prescription it belongs to. The herbs in each positive prescription are then ranked in descending order of their attentive weights, and the top N herbs form the corresponding distilled prescription. To improve the quality of the distilled prescriptions, the positive and negative samples are fed into the hierarchical attentive neural network 10 times, each time with a different random initialization of its parameters, which yields 10 groups of attentive weights. The 10 attentive weights of each herb in a positive prescription are then summed. The herbs are ranked in descending order of their summed weights, and the top N herbs are retained as the elements of the corresponding distilled prescription.
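The aggregation and top-N selection described above can be sketched as follows. The network itself is not reproduced here. The input format (one dictionary of attentive weights per run) and the name `distill` are hypothetical.

```python
def distill(prescription, weight_runs, n):
    """Return the top-n herbs of one positive prescription by summed attentive weight.

    prescription: list of herb names in the prescription.
    weight_runs:  hypothetical input format, one dict per run (10 runs in the text)
                  mapping each herb of this prescription to its attentive weight.
    """
    # Sum each herb's attentive weights over all runs.
    totals = {h: sum(run[h] for run in weight_runs) for h in prescription}
    # Rank by summed weight, highest first; sorted() is stable, so ties keep input order.
    ranked = sorted(prescription, key=lambda h: totals[h], reverse=True)
    return ranked[:n]


# Two hypothetical runs over a four-herb prescription with placeholder herbs.
runs = [{"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}, {"a": 0.2, "b": 0.1, "c": 0.5, "d": 0.2}]
print(distill(["a", "b", "c", "d"], runs, 2))  # ['c', 'b']
```

A prescription with fewer than N herbs is kept whole, since slicing past the end of the list returns all of its herbs.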
The resulting distilled prescriptions are then fed to the Apriori algorithm to mine the ESHGs of the efficacy. We investigate different settings of three parameters: N, K (the size of an itemset in the Apriori algorithm) and the support threshold min_sup of the Apriori algorithm. We also run the Apriori algorithm with the same settings of K and min_sup on all positive prescriptions without the distilling process, and compare these results with those obtained with the distilling process. Furthermore, to verify the stability of our two-stage approach, we perform the two-stage processing 10 times on the same positive and negative samples and average the 10 resulting \(Acc_e\) values for each efficacy. The average \(Acc_e\) and its 95% confidence interval are reported hereafter to represent the effectiveness of the approach for an efficacy.
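For reference, a compact level-wise Apriori search for frequent K-itemsets is sketched below. It is a textbook version, not necessarily the implementation used in the experiments. It assumes relative support and treats an itemset as frequent when it occurs at least once and its support is at least min_sup, so that \(min\_sup = 0\) keeps every occurring K-itemset. The function name and toy data are hypothetical.

```python
from itertools import combinations


def apriori_k(transactions, k, min_sup):
    """Frequent k-itemsets with their supports, mined level-wise in Apriori style.

    Assumptions: support is relative (fraction of prescriptions containing the
    itemset); an itemset is frequent if it occurs at least once and its support
    is >= min_sup, so min_sup = 0 keeps every occurring k-itemset.
    """
    txs = [frozenset(t) for t in transactions]
    n = len(txs)

    def frequent(candidates):
        # Count each candidate's occurrences and keep those passing the threshold.
        result = {}
        for c in candidates:
            count = sum(1 for t in txs if c <= t)
            if count > 0 and count / n >= min_sup:
                result[c] = count / n
        return result

    # Level 1: single herbs.
    level = frequent({frozenset([i]) for t in txs for i in t})
    for size in range(2, k + 1):
        prev = set(level)
        # Join step: merge pairs of frequent (size-1)-itemsets into size-itemsets.
        joined = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {c for c in joined
                      if all(frozenset(s) in prev for s in combinations(c, size - 1))}
        level = frequent(candidates)
    return level


# Hypothetical distilled prescriptions over placeholder herbs.
tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
for itemset, sup in sorted(apriori_k(tx, 2, 0.5).items(), key=lambda x: sorted(x[0])):
    print(sorted(itemset), sup)
```

The prune step is what distinguishes Apriori from brute-force counting. A K-itemset is counted only if all of its (K-1)-subsets were already frequent.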
Table 4 gives the average \(Acc_e\) of our two-stage approach for each of the 13 test efficacies and the overall average \(Acc_e\) across these efficacies. It also gives the corresponding performance of the simple Apriori algorithm without the distilling process. All results use \(N = 8\) and \(min\_sup = 0\), with \(K \in \{2, 3, 4\}\).
The experimental results in Table 4 show that our two-stage approach, with the distilling process based on the hierarchical attentive neural network, consistently outperforms the simple Apriori algorithm without the distilling process. Our hierarchical attentive neural network employs two attentive layers to capture both the correlations among the herbs in a prescription and the herbs essential to its efficacy. This makes each distilled prescription cleaner and more focused on its efficacy, which in turn improves Apriori-based ESHG detection. For instance, for the efficacy “activating blood to resolve stasis”, the number of frequent 2-itemsets (i.e. \(K = 2\)) is about 5230 without the distilling process. The distilling process reduces this number to about 2400 and at the same time increases the ESHG detection accuracy by 5.48%. A slight exception in Table 4 arises for the efficacy “relieving pain”, due to its small positive data set of just 71 samples. By contrast, the efficacies “dispersing phlegm” and “invigorating spleen” have many more positive samples, and the ESHGs detected for them are correspondingly much more accurate.
We also empirically investigate how different min_sup settings affect ESHG detection. Specifically, we set min_sup to the M-th largest support value among all K-itemsets occurring in the prescription set, and then evaluate the Apriori algorithm on that prescription set. Figures 4, 5, 6 and 7 compare our two-stage approach with the raw Apriori algorithm, averaged over the 13 efficacies, for various combinations of the parameters M, K and N.
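Setting min_sup to the M-th largest support can be sketched as below. The text does not state whether tied support values are merged, so the sketch assumes that itemsets are ranked individually with ties not merged. It also assumes relative support. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def mth_largest_support(transactions, k, m):
    """min_sup set to the M-th largest support among all k-itemsets that occur (m >= 1).

    Assumptions: support is relative, and itemsets are ranked individually (ties
    are not merged), so the threshold keeps at least the top-M k-itemsets.
    If fewer than M k-itemsets occur, the smallest occurring support is used.
    """
    counts = Counter()
    for t in transactions:
        # Enumerate every k-itemset contained in this prescription.
        for c in combinations(sorted(set(t)), k):
            counts[c] += 1
    if not counts:
        return 0.0  # no k-itemset occurs at all
    supports = sorted((cnt / len(transactions) for cnt in counts.values()), reverse=True)
    return supports[min(m, len(supports)) - 1]


# Hypothetical prescriptions over placeholder herbs; the 2-itemset supports are 0.5, 0.5, 0.25, 0.25.
tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
print(mth_largest_support(tx, 2, 1), mth_largest_support(tx, 2, 3))  # 0.5 0.25
```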
We can observe from Figs. 4, 5, 6 and 7 that, for \(K = 2\), our approach performs much better than the raw Apriori algorithm even with an aggressive N. A small N such as 5 filters the herbs aggressively, retaining at most 5 herbs per prescription. However, for \(K = 3\) and 4, aggressive distilling leads to an obvious drop in performance. Increasing N (for instance, to \(N = 8\)) improves the effectiveness even for \(K = 3\) and 4. Therefore, we report the results in Table 4 with \(N = 8\).
Effect of identifying essential herbs
In this subsection, we further verify how well the hierarchical attentive neural network captures the essential herbs in a prescription for its efficacy. For this purpose, we collect additional prescriptions for four of the 14 efficacies in Table 1: “activating blood to resolve stasis”, “invigorating spleen”, “arresting cough” and “tranquillization”. We obtain 8 prescriptions for each of these efficacies, 32 prescriptions in total. We invite a TCM professional to manually annotate the essential herbs of these prescriptions for their corresponding efficacies. We then feed each prescription into the trained hierarchical attentive neural network for the corresponding efficacy and fetch the attentive weights of its herbs (i.e. the attentive weights of the second attentive layer). Finally, we sort the herbs in each prescription in descending order of their attentive weights and compare the ranking against the annotated essential herbs to evaluate our approach for essential herb detection.
As the evaluation metric, we employ a standard one, MAP (Mean Average Precision), which is widely used for the quantitative analysis of ranking algorithms in information retrieval and search engines. Suppose a prescription \(p\) has \(n_p\) essential herbs annotated by the TCM professional. The average precision (AP) of the hierarchical attentive neural network on this prescription is defined as:
$$\begin{aligned} AP_p=\frac{1}{n_p}\times \sum _{i=1}^{n_p} \frac{i}{position(i)} \end{aligned}$$
(12)
where \(position(i)\) is the position of the \(i\)-th essential herb in the ranking list returned by our hierarchical attentive neural network for prescription \(p\), with the essential herbs indexed in ascending order of their positions. Furthermore, if an efficacy \(e\) has \(m_e\) prescriptions (here \(m_e = 8\) for all four efficacies), the MAP of the neural network for \(e\) is
$$\begin{aligned} MAP_e=\frac{1}{m_e}\sum _{p=1}^{m_e}AP_p \end{aligned}$$
(13)
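Formulas (12) and (13) can be sketched as follows. The sketch assumes that the ranking is a list of herbs in descending order of attentive weight and that every annotated essential herb appears in it. The function names are hypothetical.

```python
def average_precision(ranking, essential):
    """Formula (12): AP of one prescription's herb ranking against its annotated essential herbs."""
    # 1-based positions of the essential herbs, in ascending order, so the i-th
    # essential herb is the i-th one encountered while scanning the ranking.
    positions = sorted(ranking.index(h) + 1 for h in essential)
    return sum(i / pos for i, pos in enumerate(positions, start=1)) / len(positions)


def mean_average_precision(aps):
    """Formula (13): mean of AP_p over the m_e prescriptions of an efficacy."""
    return sum(aps) / len(aps)


# Hypothetical ranking of five placeholder herbs; a and c are the annotated essential herbs.
ranking = ["a", "b", "c", "d", "e"]
print(average_precision(ranking, {"a", "c"}))  # (1/1 + 2/3) / 2
```

An AP of 1.0 means all annotated essential herbs are ranked ahead of every non-essential herb.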
Table 5 shows the effect of the hierarchical attentive neural network on detecting essential herbs in prescriptions, in terms of \(AP_p\) and \(MAP_e\). In addition, Fig. 8 visualizes the attentive weights for a sample prescription, with each herb shaded according to its attentive weight. From Table 5 and Fig. 8, we conclude that our hierarchical attentive neural network is indeed able to capture the essential herbs in a prescription for its efficacy, achieving an \(AP_p\) above 60% for the majority of the prescriptions. The extreme exception is the sixth prescription of the efficacy “tranquillization”, whose \(AP_p\) is only 9.55%. To understand this exception, we manually analyze the prescription. It is composed of the herbs Rhizoma Ligustici Chuanxiong, Radix Paeoniae Alba, Radix Rehmanniae Recens, Caulis Spatholobi, Folium Mori, Flos Chrysanthemi, Tribulus terrestris, Herba Menthae, Rhizoma et Radix Notopterygii, Semen Ziziphi Spinosae and Radix Ginseng. It was wrongly labeled with the auxiliary efficacy “tranquillization”, whereas its primary efficacy should be “clearing liver heat and restraining liver yang”. This label noise causes the hierarchical attentive neural network to identify the wrong essential herbs.