TY - JOUR
T1 - Determining the possibility of deciphering an unintelligible text by text clustering
T2 - The case of the Voynich Manuscript
AU - Agata, Teru
AU - Agata, Mari
PY - 2009/10/14
Y1 - 2009/10/14
N2 - Purpose: One of the most common approaches to understanding an undeciphered text is to identify and then decipher the underlying code. If a document remains unintelligible or undeciphered for a long period of time even after many attempts at decoding it, the possibility of it being "gibberish" must be considered. This study proposes a method to detect the existence, or non-existence, of a coherent structure within a previously non-translated text in order to determine the possibility of deciphering it. Methods: The present method begins with the assumption that natural language processing methods that are commonly employed in analyzing known languages can be applied to an undeciphered text. To detect a coherent structure in a text, the similarity of every pair of partial documents is measured, and then the similarity matrix is analyzed by clustering methods. The next step is to compare the detected structure with the sections suggested by other clues such as illustrations and the page order. Thus, it is determined whether an undeciphered text contains an identifiable structure which corresponds to the latter, or whether it is "gibberish" containing no order or structure. Results: We applied the proposed method to the Voynich Manuscript, which is a renowned undeciphered text. The results clearly demonstrate that the text of the Voynich Manuscript possesses an identifiable structure, and that the structure corresponds to the existing sections of the manuscript suggested by the accompanying illustrations. Thus, the results strongly suggest that the Voynich Manuscript is not "gibberish"; additional attempts to decipher its contents would be justified. The present experiment proves the usefulness of applying this method to a previously non-deciphered text.
AB - Purpose: One of the most common approaches to understanding an undeciphered text is to identify and then decipher the underlying code. If a document remains unintelligible or undeciphered for a long period of time even after many attempts at decoding it, the possibility of it being "gibberish" must be considered. This study proposes a method to detect the existence, or non-existence, of a coherent structure within a previously non-translated text in order to determine the possibility of deciphering it. Methods: The present method begins with the assumption that natural language processing methods that are commonly employed in analyzing known languages can be applied to an undeciphered text. To detect a coherent structure in a text, the similarity of every pair of partial documents is measured, and then the similarity matrix is analyzed by clustering methods. The next step is to compare the detected structure with the sections suggested by other clues such as illustrations and the page order. Thus, it is determined whether an undeciphered text contains an identifiable structure which corresponds to the latter, or whether it is "gibberish" containing no order or structure. Results: We applied the proposed method to the Voynich Manuscript, which is a renowned undeciphered text. The results clearly demonstrate that the text of the Voynich Manuscript possesses an identifiable structure, and that the structure corresponds to the existing sections of the manuscript suggested by the accompanying illustrations. Thus, the results strongly suggest that the Voynich Manuscript is not "gibberish"; additional attempts to decipher its contents would be justified. The present experiment proves the usefulness of applying this method to a previously non-deciphered text.
UR - http://www.scopus.com/inward/record.url?scp=70349762704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349762704&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:70349762704
SN - 0373-4447
SP - 1
EP - 23
JO - Library and Information Science
JF - Library and Information Science
IS - 61
ER -