GRADUATE PROGRAM, DOCTORATE IN COMPUTER SCIENCE

THESIS/DISSERTATION PROPOSAL examination committee: OTÁVIO CURY DA COSTA CASTRO

2024-09-06 08:43:08.417

A doctoral THESIS/DISSERTATION PROPOSAL examination committee has been registered by the program.
STUDENT: OTÁVIO CURY DA COSTA CASTRO
DATE: 20/09/2024
TIME: 14:00
LOCATION: Virtual room (https://meet.google.com/ecd-ukit-pxq)
TITLE: Source Code Expertise: Improving Knowledge Models and Assessing Generative AI Impact
KEYWORDS: software repository mining, code expertise, knowledge concentration, generative artificial intelligence
PAGES: 87
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
ABSTRACT: Identifying developer source code expertise is valuable in various Software Engineering contexts. Knowledgeable developers are best placed to perform tasks such as code review and assisting newcomers. Numerous source code knowledge models for identifying experts have been proposed, making it a well-explored research topic. However, gaps in the literature must be addressed to make these models and their applications more accurate. Additionally, the increasing integration of Generative Artificial Intelligence (GenAI) tools will impact various software engineering domains, including these knowledge models. This study analyzes different development history variables to create more accurate knowledge models for identifying source code experts. Our goals include performing a comparative study of these models and applying them in software development contexts, focusing on a knowledge concentration metric. We also aim to gather data on the use of GenAI tools, such as ChatGPT, in open-source projects to understand how AI-generated code influences these models. We begin by investigating the correlation between development history variables and knowledge in source code files. We extract measurements of these variables from public and private repositories and survey developers to collect data on their knowledge of the files they contributed to. Using these measurements, we propose a linear model and machine learning classifiers and compare their performance with existing models in identifying experts in source code files. We then apply the proposed models to a Truck Factor algorithm, verifying its performance with data from public and private repositories. Additionally, we build a dataset relating code expertise information to ChatGPT use. We assess how much of the generated code matches what was added to the files, attributing these contributions to ChatGPT. This allows us to quantify the impact of GenAI on the knowledge models.
From the correlation study, we found that First Authorship and Recency of Modification have the highest correlations with source code knowledge. Regarding the proposed models, the machine learning classifiers outperformed linear techniques, with F-Scores of 71% to 73% in identifying experts. Additionally, using the proposed models, the Truck Factor algorithm identified developers missed by the previous expertise model, achieving the best average F-Score of 74%. Developers perceived this modified Truck Factor algorithm as more accurate. In ongoing research, we aim to obtain quantitative and qualitative findings on the impact of ChatGPT on knowledge models in source code. Our goal is to provide insights into the validity of these metrics in an increasingly automated software development environment and offer recommendations for future research in this domain.
COMMITTEE MEMBERS:
Chair - 998.971.133-04 - GUILHERME AMARAL AVELINO
Internal - 1930277 - DAVI VIANA DOS SANTOS
Internal - 751.764.243-04 - VINICIUS PONTE MACHADO
External to the Institution - ANDRÉ CAVALCANTE HORA - UFMG
External to the Institution - LINCOLN SOUZA ROCHA - UFC
Co-advisor external to the institution - PEDRO DE ALCANTARA DOS SANTOS NETO - UFPI

DCCMAPI/CCET
