TY - GEN
T1 - Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns
AU - An, Yichen
AU - Haruta, Shuichiro
AU - Choi, Sanghun
AU - Sasase, Iwao
N1 - Funding Information:
This work is partly supported by the Grant-in-Aid for Scientific Research (No. 17K06440) from the Japan Society for the Promotion of Science (JSPS).
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - Botnet detection is imperative. Among existing detection schemes, a promising one uses communication sequences. Its main idea is that communication sequences exhibit distinctive features because they are controlled by programs. Each sequence is tokenized into truncated sequences by n-gram, and the number of occurrences of each pattern is used as a feature vector. However, the previous scheme normalizes every feature by the total number of occurrences of all patterns, regardless of the value of n, even though patterns with larger n occur less often than those with smaller n. As a result, the normalized features of long patterns become very small and are hidden by the others. To overcome this shortcoming, we propose a traffic feature-based botnet detection scheme that emphasizes the importance of long patterns. The emphasis is realized by two ideas. The first is to normalize occurrences by the total number of occurrences for each n instead of the total number of occurrences of all patterns. As a result, the smaller occurrence counts for larger n are divided by smaller totals, so the features of long patterns take larger, more balanced values. The second is to weight the normalized features according to their ranks, which makes the features of longer patterns stand out more. Computer simulation with a real dataset shows the effectiveness of our scheme.
KW - Botnet detection
KW - Detection algorithms
KW - Feature emphasizing
UR - http://www.scopus.com/inward/record.url?scp=85072851734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072851734&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31254-1_22
DO - 10.1007/978-3-030-31254-1_22
M3 - Conference contribution
AN - SCOPUS:85072851734
SN - 9783030312534
T3 - Advances in Intelligent Systems and Computing
SP - 181
EP - 188
BT - Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019
A2 - Choras, Michal
A2 - Choras, Ryszard S.
PB - Springer Verlag
T2 - International Conference on Image Processing and Communications, IP and C 2019
Y2 - 11 September 2019 through 13 September 2019
ER -