Background
Methods
Sequences collection and alignments
The logic mining software system
- The logic rules that identify each type of virus, expressed as combinations of nucleotide positions (e.g., the rule IF (pos437 = A) AND (pos486 = C) THEN BK is to be interpreted as "if the nucleotide in position 437 is A and the nucleotide in position 486 is C, then the sequence belongs to virus type BK").
- The classification statistics (confusion matrices, and averages and variances of the error rates obtained with different sampling strategies).
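To make the reading of such a rule concrete, the following is a minimal sketch of how a conjunctive rule could be checked against an aligned sequence. It is an illustration only, not the logic mining software itself: the names `Rule`, `BK_RULE`, `rule_matches`, and `make_toy_sequence` are hypothetical, positions are assumed to be 1-based indices into the alignment, and the toy sequences are synthetic placeholders rather than real viral data.

```python
from typing import Dict, List, Tuple

# A rule is a conjunction of (position, nucleotide) conditions plus a class label.
# Assumption: positions are 1-based indices into the aligned sequence.
Rule = Tuple[List[Tuple[int, str]], str]

# The example rule from the text: IF (pos437 = A) AND (pos486 = C) THEN BK.
BK_RULE: Rule = ([(437, "A"), (486, "C")], "BK")


def rule_matches(sequence: str, rule: Rule) -> bool:
    """Return True if every (position, nucleotide) condition of the rule holds."""
    conditions, _label = rule
    return all(
        pos <= len(sequence) and sequence[pos - 1].upper() == nucleotide
        for pos, nucleotide in conditions
    )


def make_toy_sequence(length: int, overrides: Dict[int, str]) -> str:
    """Build a synthetic sequence of gap characters with chosen positions set."""
    chars = ["-"] * length
    for pos, nucleotide in overrides.items():
        chars[pos - 1] = nucleotide
    return "".join(chars)


# Synthetic examples (not real data): one satisfies the rule, one does not.
toy_bk = make_toy_sequence(600, {437: "A", 486: "C"})
toy_other = make_toy_sequence(600, {437: "A", 486: "T"})

print(rule_matches(toy_bk, BK_RULE))     # True
print(rule_matches(toy_other, BK_RULE))  # False
```

A sequence is assigned the rule's label only when all conditions hold at once, which is what the conjunction in the rule expresses.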
The feature selection step
The formula extraction step
The application of the logic mining software
Results and discussion
Logic mining analysis on each virus class
Virus | LT | ST | VP1 | VP2 | VP3 | TOTAL |
---|---|---|---|---|---|---|
KIPyV | 8 | 27 | 10 | 8 | 8 | 61 |
MCPyV | 13 | 28 | 3 | 2 | 2 | 48 |
WUPyV | 14 | 16 | 14 | 23 | 14 | 81 |
BKPyV | 0 | 0 | 192 | 192 | 192 | 576 |
JCPyV | 0 | 0 | 406 | 405 | 405 | 1216 |
TOTAL | 35 | 71 | 625 | 630 | 621 | 1982 |
Species | Genes | Formulas | Coverage |
---|---|---|---|
BK | VP1, VP2, VP3 | (pos437 = A) AND (pos486 = C) | 1.00 |
JCV | VP1, VP2, VP3 | not(pos338 = C) AND (pos532 = C) | 1.00 |
KIV | ST, LT, VP1, VP2, VP3 | not(pos294 = T) AND not(pos358 = T) AND not(pos521 = T) AND not(pos532 = G) | 1.00 |
MCV | ST, LT | (pos199 = A) AND not(pos286 = T) | 1.00 |
WUV | ST, LT, VP1, VP2, | not(pos286 = T) AND (pos425 = A) AND not(pos474 = G) | 1.00 |
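
The formulas in the table combine plain and negated conditions. As a sketch of how this notation could be read programmatically, the snippet below parses a formula string into literals and evaluates it on a sequence. The helper names (`parse_formula`, `satisfies`, `toy`) are hypothetical, positions are assumed to be 1-based, a position beyond the end of the sequence is treated as a gap, and the sequences are synthetic placeholders; this is not the procedure used by the logic mining software.

```python
import re
from typing import Dict, List, Tuple

# One literal: optional "not", then "posN = X", with or without parentheses.
LITERAL = re.compile(
    r"^(not)?\s*\(?\s*pos(\d+)\s*=\s*([ACGT])\s*\)?$", re.IGNORECASE
)

# (1-based position, nucleotide, negated?)
Literal = Tuple[int, str, bool]


def parse_formula(formula: str) -> List[Literal]:
    """Split a conjunction on AND and parse each literal."""
    literals: List[Literal] = []
    for part in re.split(r"\s+AND\s+", formula.strip(), flags=re.IGNORECASE):
        match = LITERAL.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised literal: {part!r}")
        negated, pos, nucleotide = match.groups()
        literals.append((int(pos), nucleotide.upper(), negated is not None))
    return literals


def satisfies(sequence: str, literals: List[Literal]) -> bool:
    """Return True if the sequence satisfies every literal of the conjunction."""
    for pos, nucleotide, negated in literals:
        # Assumption: positions past the end of the sequence count as gaps.
        observed = sequence[pos - 1].upper() if pos <= len(sequence) else "-"
        if (observed == nucleotide) == negated:
            return False
    return True


def toy(length: int, overrides: Dict[int, str]) -> str:
    """Build a synthetic sequence of gap characters with chosen positions set."""
    chars = ["-"] * length
    for pos, nucleotide in overrides.items():
        chars[pos - 1] = nucleotide
    return "".join(chars)


# The JCV formula from the table, applied to synthetic sequences.
jcv = parse_formula("not(pos338 = C) AND (pos532 = C)")
print(jcv)                                             # [(338, 'C', True), (532, 'C', False)]
print(satisfies(toy(600, {338: "T", 532: "C"}), jcv))  # True
print(satisfies(toy(600, {338: "C", 532: "C"}), jcv))  # False
```

Representing each literal as a (position, nucleotide, negated) triple keeps the evaluation to a single comparison per condition and handles both the parenthesised and unparenthesised spellings that appear in the table.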
Logic mining analysis over 21 gene regions
Species | Formulas | Coverage |
---|---|---|
BKVP1 | (pos504 = T) AND (pos518 = C) | 1.00 |
BKVP2 | (pos410 = T) AND (pos554 = A) | 1.00 |
BKVP3 | (pos518 = A) AND (pos521 = G) | 1.00 |
JCVP1 | (pos410 = G) AND (pos466 = T) | 1.00 |
JCVP2 | (pos383 = A) AND (pos417 = G) | 1.00 |
JCVP3 | (pos161 = G) AND (pos406 = A) | 1.00 |
KIVLT | (pos417 = G) AND (pos472 = C) | 1.00 |
KIVST | (pos360 = T) AND (pos381 = A) | 1.00 |
KIVP1 | (pos239 = C) AND (pos457 = G) | 1.00 |
KIVP2 | (pos518 = C) AND (pos547 = A) | 1.00 |
KIVP3 | (pos406 = G) AND (pos472 = C) | 1.00 |
MCVLT | (pos457 = C) AND (pos547 = C) | 1.00 |
MCVST | (pos417 = A) AND (pos504 = T) | 1.00 |
MCVP1 | (pos521 = C) AND (pos547 = A) | 1.00 |
MCVP2 | (pos521 = C) AND (pos547 = A) | 0.00 |
MCVP3 | (pos521 = C) AND (pos547 = A) | 0.00 |
WUVLT | (pos547 = A) AND (pos554 = A) | 1.00 |
WUVST | (pos504 = G) AND (pos554 = A) | 1.00 |
WUVP1 | (pos122 = C) | 1.00 |
WUVP2 | (pos521 = C) AND (pos554 = A) | 1.00 |
WUVP3 | (pos518 = C) AND (pos547 = G) | 1.00 |