TY - GEN
T1 - ML-Based Compliance Verification of Data Processing Agreements against GDPR
AU - Amaral, Orlando
AU - Abualhaija, Sallam
AU - Briand, Lionel
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Most current software systems involve processing personal data, an activity that is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Developing compliant software requires adhering to DPA-related requirements in GDPR. Verifying the compliance of DPAs entirely manually is, however, time-consuming and error-prone. In this paper, we propose an automation strategy based on machine learning (ML) for checking GDPR compliance in DPAs. Specifically, we create, based on existing work, a comprehensive conceptual model that describes the information types pertinent to DPA compliance. We then develop an automated approach that detects breaches of compliance by predicting the presence of these information types in DPAs. On an evaluation set of 30 real DPAs, our approach detects 483 out of 582 genuine violations while introducing 93 false violations, thereby achieving a precision of 83.9% and a recall of 83.0%. We empirically compare our approach against an existing approach that does not employ ML but relies on manually defined rules. Our results indicate that the two approaches perform on par. Therefore, to select the right solution in a given context, we discuss differentiating factors such as the availability of annotated data and legal experts, and adaptation to regulation changes.
KW - Data Processing Agreement (DPA)
KW - Machine Learning (ML)
KW - Natural Language Processing (NLP)
KW - Regulatory Compliance
KW - Requirements Engineering (RE)
KW - The General Data Protection Regulation (GDPR)
UR - http://www.scopus.com/inward/record.url?scp=85174423372&partnerID=8YFLogxK
U2 - 10.1109/RE57278.2023.00015
DO - 10.1109/RE57278.2023.00015
M3 - Conference contribution
AN - SCOPUS:85174423372
T3 - Proceedings of the IEEE International Conference on Requirements Engineering
SP - 53
EP - 64
BT - Proceedings - 31st IEEE International Requirements Engineering Conference, RE 2023
A2 - Schneider, Kurt
A2 - Dalpiaz, Fabiano
A2 - Horkoff, Jennifer
PB - IEEE Computer Society
T2 - 31st IEEE International Requirements Engineering Conference, RE 2023
Y2 - 4 September 2023 through 8 September 2023
ER -