High speed error log control method in in-memory cluster computing platform

Ryuichi Saito, Shinichiro Haruyama

Research output: Contribution to journal › Article › peer-review


Since 2010, in-memory cluster computing platforms have been increasingly used in firms and research institutions to analyze large datasets in a short time. On these platforms, unexpected errors can push the load on supporting infrastructure, such as a monitoring system, beyond its design assumptions, because jobs run multithreaded, divided datasets are assigned to multiple nodes, and data are stored in memory. In this research, we propose a method that notifies administrators with only the information needed to understand the situation in a short period. The method eliminates duplicates among the numerous application error logs generated in that period and clusters the messages using the unsupervised k-means method on the in-memory cluster computing framework “Apache Spark.” By implementing this method, we demonstrate that duplicated error messages can be reduced by 93% on average compared with conventional methods. Further, significant messages can be extracted from the application error messages and sent to the administrators in an average of 4.2 min from the time the error occurs.
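
The general idea of deduplicating error messages and then clustering them can be illustrated with a minimal sketch. This is not the authors' Spark implementation: it is a single-machine, standard-library Python stand-in, and every name in it (`normalize`, `deduplicate`, `tfidf_vectors`, `kmeans`, `summarize`) is hypothetical. Masking digits before deduplication, the smoothed IDF formula, the farthest-point initialization, and the sample messages are all assumptions made for illustration; the abstract itself only names duplicate elimination, k-means, and (in the keywords) TF-IDF.

```python
import math
import re
from collections import Counter


def normalize(message):
    """Lower-case a message and mask digit runs so near-identical lines collapse."""
    return re.sub(r"\d+", "<num>", message.lower()).strip()


def deduplicate(messages):
    """Return unique normalized messages (first-seen order) and their counts."""
    counts = Counter(normalize(m) for m in messages)
    return list(counts), counts


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors using a smoothed IDF (an assumption)."""
    tokenized = [re.findall(r"[a-z]+", d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[t] * math.log((n + 1) / (doc_freq[t] + 1)) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard all-zero vectors
        vectors.append([x / norm for x in vec])
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centers):
    """Label each vector with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist2(v, centers[j]))
            for v in vectors]


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centers)))
    for _ in range(iterations):
        labels = assign(vectors, centers)
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(vectors, centers)


def summarize(messages, k):
    """Map each cluster to (most frequent message, total raw message count)."""
    unique, counts = deduplicate(messages)
    labels = kmeans(tfidf_vectors(unique), k)
    summary = {}
    for doc, label in zip(unique, labels):
        rep, total = summary.get(label, (doc, 0))
        if counts[doc] > counts[rep]:
            rep = doc
        summary[label] = (rep, total + counts[doc])
    return summary


# Made-up sample log lines, for illustration only.
messages = [
    "Connection timeout to node 12",
    "Connection timeout to node 7",
    "Connection refused by node 12",
    "OutOfMemoryError in executor 4",
    "Connection timeout to node 3",
    "OutOfMemoryError in executor 9",
    "OutOfMemoryError in driver process",
]
summary = summarize(messages, k=2)
for label, (representative, total) in sorted(summary.items()):
    print(label, total, representative)
```

In this sketch an administrator would receive one representative line per cluster along with the number of raw messages it stands for, rather than every log line. A distributed version would replace the list-based steps with the corresponding Spark operations.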

Original language: English
Pages (from-to): 310-319
Number of pages: 10
Journal: Journal of information processing
Publication status: Published - 2020 May


  • Distributed System
  • Error logs
  • K-means
  • Spark
  • TF-IDF

ASJC Scopus subject areas

  • General Computer Science

