TY - GEN
T1 - Fast Accurate Discovery of Tuple Inclusion Dependencies
AU - Shen, Mengfei
AU - Kawashima, Hideyuki
AU - Saito, Kazuhiro
N1 - Funding Information:
This work is partially supported by JSPS KAKENHI Grant Number 22H03596 and a project, JPNP16007, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Discovering inclusion dependencies (INDs) is an important problem in relational databases, relevant to data integration, query optimization, and various other data management tasks. Many studies have addressed IND discovery with different strategies, but detection still needs improvement as the complexity and diversity of real-life data increase. Conventional INDs are defined only between columns, which makes them inapplicable to many data processing tasks, so the concept of dependency can be extended. Building on conventional IND discovery and the approximate approach FAIDA, we present an algorithm for detecting tuple INDs that shifts detection from the column-to-column dimension to the row-to-row dimension, which is more in line with real-world data retrieval tasks in distributed systems. By combining probabilistic and accurate detection with multi-threading, the algorithm achieves both accuracy and high performance in IND detection.
AB - Discovering inclusion dependencies (INDs) is an important problem in relational databases, relevant to data integration, query optimization, and various other data management tasks. Many studies have addressed IND discovery with different strategies, but detection still needs improvement as the complexity and diversity of real-life data increase. Conventional INDs are defined only between columns, which makes them inapplicable to many data processing tasks, so the concept of dependency can be extended. Building on conventional IND discovery and the approximate approach FAIDA, we present an algorithm for detecting tuple INDs that shifts detection from the column-to-column dimension to the row-to-row dimension, which is more in line with real-world data retrieval tasks in distributed systems. By combining probabilistic and accurate detection with multi-threading, the algorithm achieves both accuracy and high performance in IND detection.
KW - Data Integration
KW - Data Profiling
KW - Inclusion Dependencies
UR - http://www.scopus.com/inward/record.url?scp=85136139063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136139063&partnerID=8YFLogxK
U2 - 10.1109/SMARTCOMP55677.2022.00062
DO - 10.1109/SMARTCOMP55677.2022.00062
M3 - Conference contribution
AN - SCOPUS:85136139063
T3 - Proceedings - 2022 IEEE International Conference on Smart Computing, SMARTCOMP 2022
SP - 246
EP - 251
BT - Proceedings - 2022 IEEE International Conference on Smart Computing, SMARTCOMP 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Smart Computing, SMARTCOMP 2022
Y2 - 20 June 2022 through 24 June 2022
ER -