Background
Introduction
Related work
Methods
Survival functions as a measure for similarity
- Scenario 1: Only marginal differences between two survival functions occur when two sub-cohorts are compared with respect to an unimportant attribute. An example is a cancer therapy group in which the survival probabilities are almost equal for the values “male” and “female” of the attribute “sex”. Here, the ABS between the two sub-cohorts is very small.
- Scenario 2: Large differences are expected when comparing two highly discriminating values of a single attribute with regard to survival. The attribute “metastasis formation” with the values “none” and “end stage”, for example, will probably have an extreme impact on the survival probability in cancer therapy: the sub-cohort without metastasis will have a better survival outcome than the “end stage” group. This leads to a relatively high ABS.
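The two scenarios can be made concrete with a small sketch. It assumes that ABS denotes the area between the two survival curves (the abbreviation is not expanded in this section), that each curve is a right-continuous step function given as (time, survival probability) pairs, and that the area is taken up to a fixed horizon. The function names and all numbers are hypothetical and are not taken from the study.

```python
from bisect import bisect_right

def step_value(times, values, t):
    """Value of a right-continuous step curve at t; S(t) = 1.0 before the first event."""
    i = bisect_right(times, t)
    return 1.0 if i == 0 else values[i - 1]

def area_between(curve_a, curve_b, horizon):
    """Integrate |S_a(t) - S_b(t)| over [0, horizon].

    Each curve is a non-empty list of (event_time, survival_probability)
    pairs sorted by time (assumed input format).
    """
    times_a, vals_a = zip(*curve_a)
    times_b, vals_b = zip(*curve_b)
    # Both curves are constant between consecutive breakpoints of either curve.
    grid = sorted({0.0, horizon, *(t for t in times_a + times_b if 0.0 < t < horizon)})
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        gap = abs(step_value(times_a, vals_a, left) - step_value(times_b, vals_b, left))
        area += gap * (right - left)
    return area

# Hypothetical curves: nearly identical (Scenario 1) vs. clearly separated (Scenario 2).
male = [(1, 0.90), (2, 0.80)]
female = [(1, 0.90), (2, 0.79)]
no_metastasis = [(1, 0.95), (2, 0.90)]
end_stage = [(1, 0.50), (2, 0.20)]

small_abs = area_between(male, female, horizon=3)
large_abs = area_between(no_metastasis, end_stage, horizon=3)
```

Under these assumptions, the "sex" comparison yields a much smaller area than the "metastasis formation" comparison, which is the ordering the two scenarios describe.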
Formal notations and definition
Similarity metrics
Local similarity
Attribute weights and feature selection
Handling of numeric attributes
Global similarity
Results
Implementation
Preprocessing
- Spelling correction
- Checking of values for completeness
- Filtering of attributes and values that are used only for comments
- Harmonization/aggregation of values with the same meaning
- Plausibility checks (e.g. numeric attributes may not contain characters, “null” or “unknown”)
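Three of the steps listed above (completeness check, harmonization, plausibility check for numeric attributes) could look roughly like the following sketch. It is not the preprocessing code used in the study: the helper names, the synonym table, and the decision to accept both decimal points and decimal commas are assumptions made for illustration.

```python
import re

# Tokens treated as missing values (assumption, based on the examples in the list above).
MISSING_TOKENS = {"", "null", "unknown"}
# Plain decimal number with an optional sign; accepts "." or "," as decimal separator.
NUMERIC_PATTERN = re.compile(r"[+-]?\d+(?:[.,]\d+)?")

def harmonize(value, synonyms):
    """Map spelling variants and synonyms to one canonical value."""
    key = value.strip().lower()
    return synonyms.get(key, key)

def parse_numeric(value):
    """Return a float, or None if the value is missing or fails the plausibility check."""
    token = value.strip().lower()
    if token in MISSING_TOKENS or not NUMERIC_PATTERN.fullmatch(token):
        return None
    return float(token.replace(",", "."))

def completeness(records, attribute):
    """Fraction of records that carry a non-missing value for the attribute."""
    if not records:
        return 0.0
    present = sum(
        1 for record in records
        if str(record.get(attribute, "")).strip().lower() not in MISSING_TOKENS
    )
    return present / len(records)

# Hypothetical usage with made-up records.
sex_synonyms = {"m": "male", "f": "female"}
records = [{"age": "61", "sex": "M"}, {"age": "null", "sex": " Female "}]
canonical_sex = [harmonize(r["sex"], sex_synonyms) for r in records]
ages = [parse_numeric(r["age"]) for r in records]
```

Returning `None` for implausible values keeps the decision about exclusion or imputation separate from the check itself.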
Workflow of the similarity measure
Evaluation
Material
Preparation
Biomarker detection
Method | Numeric biomarker, arm A: mean accuracy (SD) | Numeric biomarker, arm A: mean precision (SD) | Numeric biomarker, arm A: mean recall (SD) | Numeric biomarker, arm A: mean F1-score (SD) | Nominal biomarker, arm A: mean accuracy (SD) | Nominal biomarker, arm A: mean precision (SD) | Nominal biomarker, arm A: mean recall (SD) | Nominal biomarker, arm A: mean F1-score (SD) |
---|---|---|---|---|---|---|---|---|
STSM | 0.944 (0.043) | 0.946 (0.044) | 0.946 (0.044) | 0.946 (0.044) | 0.998 (0.002) | 0.999 (0.001) | 0.993 (0.006) | 0.996 (0.004) |
HEOM | 0.657 (0.013) | 0.678 (0.029) | 0.684 (0.032) | 0.681 (0.03) | 0.831 (0.004) | 0.759 (0.011) | 0.638 (0.013) | 0.694 (0.012) |
DVDM | 0.564 (0.064) | 0.595 (0.057) | 0.596 (0.058) | 0.596 (0.057) | 0.644 (0.046) | 0.401 (0.081) | 0.37 (0.06) | 0.384 (0.07) |
RANDOM | 0.502 (0.007) | 0.536 (0.034) | 0.535 (0.034) | 0.535 (0.034) | 0.582 (0.01) | 0.3 (0.01) | 0.298 (0.011) | 0.299 (0.01) |
Method | Numeric biomarker, arm B: mean accuracy (SD) | Numeric biomarker, arm B: mean precision (SD) | Numeric biomarker, arm B: mean recall (SD) | Numeric biomarker, arm B: mean F1-score (SD) | Nominal biomarker, arm B: mean accuracy (SD) | Nominal biomarker, arm B: mean precision (SD) | Nominal biomarker, arm B: mean recall (SD) | Nominal biomarker, arm B: mean F1-score (SD) |
---|---|---|---|---|---|---|---|---|
STSM | 0.909 (0.05) | 0.914 (0.048) | 0.915 (0.048) | 0.915 (0.048) | 0.997 (0.003) | 1 (0) | 0.99 (0.009) | 0.995 (0.005) |
HEOM | 0.661 (0.012) | 0.685 (0.025) | 0.7 (0.019) | 0.692 (0.022) | 0.83 (0.003) | 0.76 (0.009) | 0.648 (0.022) | 0.699 (0.016) |
DVDM | 0.535 (0.012) | 0.573 (0.022) | 0.577 (0.032) | 0.575 (0.025) | 0.671 (0.105) | 0.467 (0.188) | 0.424 (0.151) | 0.444 (0.168) |
RANDOM | 0.505 (0.009) | 0.546 (0.028) | 0.545 (0.03) | 0.546 (0.029) | 0.574 (0.013) | 0.303 (0.013) | 0.303 (0.014) | 0.303 (0.013) |
Determine the weights of attributes
Iteration | All attributes: avg. weight | Non-biomarkers: avg. weight | Non-biomarkers: rel. (%) | Num. biomarker arm A: weight | Num. biomarker arm A: rel. (%) | Nom. biomarker arm A: weight | Nom. biomarker arm A: rel. (%) | Num. biomarker arm B: weight | Num. biomarker arm B: rel. (%) | Nom. biomarker arm B: weight | Nom. biomarker arm B: rel. (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
IT#1 | 0.940 | 0.504 | −46 | 3.609 | +284 | 3.636 | +287 | 3.875 | +312 | 3.102 | +230 |
IT#2 | 0.791 | 0.418 | −47 | 2.950 | +273 | 2.689 | +240 | 3.018 | +281 | 3.469 | +338 |
IT#3 | 0.929 | 0.548 | −41 | 3.028 | +226 | 3.035 | +227 | 3.416 | +268 | 3.382 | +264 |
IT#4 | 0.819 | 0.435 | −47 | 3.287 | +301 | 3.219 | +293 | 2.962 | +262 | 3.028 | +270 |
IT#5 | 0.852 | 0.441 | −48 | 3.445 | +304 | 3.354 | +294 | 3.652 | +329 | 2.827 | +232 |
IT#6 | 0.903 | 0.459 | −49 | 3.432 | +280 | 4.109 | +355 | 3.368 | +273 | 3.354 | +271 |
IT#7 | 1.020 | 0.622 | −39 | 3.145 | +208 | 3.185 | +212 | 3.587 | +252 | 3.712 | +264 |
IT#8 | 0.871 | 0.481 | −45 | 3.145 | +261 | 3.238 | +272 | 3.500 | +302 | 2.972 | +241 |
IT#9 | 0.951 | 0.547 | −42 | 3.386 | +256 | 3.593 | +278 | 3.599 | +279 | 2.912 | +206 |
IT#10 | 0.898 | 0.466 | −48 | 3.753 | +318 | 3.315 | +269 | 3.233 | +260 | 3.658 | +307 |
Mean | 0.897 | 0.492 | −45 | 3.318 | +271 | 3.337 | +273 | 3.421 | +282 | 3.242 | +262 |
SD | 0.064 | 0.060 | – | 0.242 | – | 0.362 | – | 0.271 | – | 0.300 | – |
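The relative columns can be read as a percentage change against the average weight of all attributes in the same iteration. This reading is an assumption, since the table does not state the formula, but it is consistent with the IT#1 row. A minimal sketch with a hypothetical helper name:

```python
def relative_change_percent(weight, baseline):
    """Percentage change of a weight relative to a baseline weight, rounded to an integer."""
    return round((weight - baseline) / baseline * 100)

# Values copied from row IT#1 of the table above.
avg_all_attributes = 0.940
rel_non_biomarkers = relative_change_percent(0.504, avg_all_attributes)
rel_numeric_arm_a = relative_change_percent(3.609, avg_all_attributes)
rel_nominal_arm_b = relative_change_percent(3.102, avg_all_attributes)
```

The "Mean" row appears to average the per-iteration percentages rather than apply this formula to the mean weights, so it may differ by a point from a direct calculation.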