Model Inversion Efficacy & Qualitative Vulnerability Examples from LLMs

29 Jul 2025

Abstract and I. Introduction

II. Related Work

III. Technical Background

IV. Systematic Security Vulnerability Discovery of Code Generation Models

V. Experiments

VI. Discussion

VII. Conclusion, Acknowledgments, and References

Appendix

A. Details of Code Language Models

B. Finding Security Vulnerabilities in GitHub Copilot

C. Other Baselines Using ChatGPT

D. Effect of Different Number of Few-shot Examples

E. Effectiveness in Generating Specific Vulnerabilities for C Codes

F. Security Vulnerability Results after Fuzzy Code Deduplication

G. Detailed Results of Transferability of the Generated Nonsecure Prompts

H. Details of Generating the Non-secure Prompts Dataset

I. Detailed Results of Evaluating CodeLMs using Non-secure Dataset

J. Effect of Sampling Temperature

K. Effectiveness of the Model Inversion Scheme in Reconstructing the Vulnerable Codes

L. Qualitative Examples Generated by CodeGen and ChatGPT

M. Qualitative Examples Generated by GitHub Copilot

K. Effectiveness of the Model Inversion Scheme in Reconstructing the Vulnerable Codes

In this work, the main goal of our inversion scheme is to generate the non-secure prompts that lead the model to generate the targeted vulnerable codes.

TABLE IX: The number of discovered vulnerable codes generated by the CodeGen and ChatGPT models using the promising non-secure prompts generated by ChatGPT. We employ our FS-Code method to generate the non-secure prompts and codes. Columns two to thirteen provide results for Python codes, and columns fourteen to nineteen for C codes. Columns fourteen and nineteen provide the number of vulnerable codes found with the other CWEs that CodeQL queries. For each programming language, the last column provides the total number of codes with at least one security vulnerability.

TABLE X: The number of vulnerable Python and C codes generated by various models using our non-secure prompt dataset. The results show the number of vulnerable codes among the five most probable model outputs. Columns two to thirteen provide results for Python codes, and columns fourteen to nineteen for C codes. Columns fourteen and nineteen provide the number of vulnerable codes found with the other CWEs that CodeQL queries. For each programming language, the last column provides the total number of codes with at least one security vulnerability.

Fig. 8: Number of discovered vulnerable Python codes at different sampling temperatures, used both in generating the non-secure prompts and in generating the codes. We employ our FS-Code method to sample vulnerable codes for three CWEs (CWE-020, CWE-022, and CWE-079).

Figure 9a and Figure 9b show the success rate of reconstructing Python and C codes, respectively. Figure 9a shows that ChatGPT achieves higher success rates than CodeGen in reconstructing target Python codes across the different thresholds. Furthermore, both models maintain a high reconstruction success rate even at strict similarity thresholds such as 80, 85, and 90; for example, ChatGPT reaches an almost 55% success rate at threshold 80. Listing 6 provides an example of a target Python code (Listing 6a) and the reconstructed code (Listing 6b) using our FS-Code approach. Listing 6b is generated by the ChatGPT model and is the closest code to the target among the 255 sampled codes (based on the fuzzy similarity score). The examples in Listing 6a and Listing 6b have a fuzzy similarity score of 85; they implement the same task with slight differences in variable definitions and API use.

Figure 9b shows that CodeGen and ChatGPT have similar success rates across the different thresholds, with CodeGen achieving higher success rates at the stricter thresholds, such as 80 and 85. In general, Figure 9b shows that the models have lower success rates for C codes than for Python codes (Figure 9a). This is expected, as C implementations tend to be more complex than their Python counterparts. Listing 7 provides an example of a target C code (Listing 7a) and the reconstructed code (Listing 7b) using our FS-Code approach. Listing 7b is generated by the CodeGen model and is the closest code to the target among the 255 sampled codes (based on the fuzzy similarity score). The examples in Listing 7a and Listing 7b have a fuzzy similarity score of 68; the target C code implements different functionality than the generated code, and the two codes only overlap in some library functions and operations.
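The threshold-based success rate described above can be sketched as follows. This is an illustrative reimplementation, not the paper's code: Python's standard `difflib.SequenceMatcher` stands in for the fuzzy matching library, and all function names and inputs are hypothetical.

```python
# Illustrative sketch of the threshold-based reconstruction success rate.
# difflib.SequenceMatcher is a stand-in for the fuzzy matching used in the
# paper; function names and inputs here are hypothetical.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 100], analogous to a fuzzy match score."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def success_rate(targets, candidates_per_target, threshold: float) -> float:
    """Fraction of target codes for which at least one sampled candidate
    reaches the given similarity threshold."""
    hits = 0
    for target, candidates in zip(targets, candidates_per_target):
        best = max(similarity(target, c) for c in candidates)
        if best >= threshold:
            hits += 1
    return hits / len(targets)
```

Sweeping `threshold` over values such as 70, 75, 80, 85, and 90 yields curves analogous to those in Figure 9.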

L. Qualitative Examples Generated by CodeGen and ChatGPT

Listing 8 and Listing 9 provide two examples of vulnerable Python codes generated by ChatGPT. Listing 8 shows a Python code example containing a security vulnerability of type CWE-022 (path traversal): the first eight lines are the non-secure prompt, the rest of the example is the completion for the given non-secure prompt, and the completion contains a path traversal vulnerability in line 23. Listing 9 provides a Python code example with a vulnerability of type CWE-089 (SQL injection): the first eight lines are the non-secure prompt, the rest is the completion, and the code contains an SQL injection vulnerability in line 22.
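To make the CWE-089 pattern concrete, the following is a minimal sketch, not the paper's listing and with all names hypothetical, contrasting a string-formatted query of the kind flagged in Listing 9 with the parameterized fix:

```python
# Hypothetical sketch of the CWE-089 (SQL injection) pattern; not the
# paper's listing. Uses an in-memory SQLite database for illustration.
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: the attacker-controlled username is spliced directly
    # into the SQL string, so inputs like "' OR '1'='1" alter the query.
    query = "SELECT id FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # Fix: let the driver bind the value as a query parameter.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchall()
```

With the injection payload `"' OR '1'='1"`, the unsafe variant returns every row in the table, while the parameterized variant treats the payload as an ordinary (non-matching) string.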

Listing 10 and Listing 11 provide two examples of vulnerable C codes generated by CodeGen; both contain vulnerabilities of type CWE-787 (out-of-bounds write). In Listing 10, lines 1 to 7 are the non-secure prompt, the rest of the example is the completion for the given non-secure prompt, and the code contains a CWE-787 vulnerability in line 25. In Listing 11, the first nine lines are the non-secure prompt, the rest is the completion, and the code contains several out-of-bounds write vulnerabilities in lines 10, 11, and 17.

M. Qualitative Examples Generated by GitHub Copilot

Listing 12 and Listing 13 show two examples of codes generated by GitHub Copilot that contain security vulnerabilities. Listing 12 depicts a generated code containing CWE-022, known as a path traversal vulnerability. In this example, lines 1 to 6 are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt. The code in Listing 12 contains a path traversal vulnerability at line 10, where it enables arbitrary file writes during tar file extraction. Listing 13 shows a generated code containing CWE-079, an issue related to cross-site scripting attacks. Lines 1 to 8 of Listing 13 are the input non-secure prompt, and the rest of the code is the completion. The code in Listing 13 contains a cross-site scripting vulnerability in line 12.
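The arbitrary-file-write pattern described for Listing 12 can be sketched as follows. This is a hypothetical minimal example, not Copilot's output, showing the vulnerable unchecked extraction alongside the path containment check a safe completion would need:

```python
# Hypothetical sketch of CWE-022 via tar extraction; not the paper's
# listing. Function names are illustrative.
import os
import tarfile


def extract_unsafe(archive_path: str, dest: str) -> None:
    # Vulnerable: member names such as "../../etc/passwd" escape dest,
    # enabling arbitrary file writes during extraction.
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest)


def is_within(dest: str, member_name: str) -> bool:
    # Mitigation: reject members whose normalized path leaves dest.
    target = os.path.realpath(os.path.join(dest, member_name))
    return target.startswith(os.path.realpath(dest) + os.sep)
```

Recent Python versions also provide extraction filters (e.g. `tar.extractall(dest, filter="data")`) that reject such escaping members at the library level.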

Listing 8: A vulnerable Python code example generated by ChatGPT. The code contains a CWE-022 vulnerability in line 23. In this example, the first eight lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Fig. 9: The success rate of generating target codes over different thresholds of code similarity. The codes are generated using our FS-Code approach. We use fuzzy matching as the code similarity threshold.

Listing 6: Python code reconstructed using our FS-Code approach. The vulnerable part of the target Python code was used as the last part of the FS-Code prompt. (a) represents the target code that contains a CWE-611 vulnerability. The first nine lines are the prompt, and lines 10 to 12 are the vulnerable part of the code. (b) shows the closest generated code to the target code generated by the ChatGPT model. In the generated code, lines 1 to 5 are the prompt. The fuzzy similarity score between (a) and (b) is 85.

Listing 9: A vulnerable Python code example generated by ChatGPT. The code contains a CWE-089 vulnerability in line 22. In this example, the first ten lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Listing 7: C code reconstructed using our FS-Code approach. The vulnerable part of the target C code was used as the last part of the FS-Code prompt. (a) represents the target code that contains a CWE-476 vulnerability. The first six lines are the prompt, and lines 7 to 24 are the vulnerable part of the code. (b) shows the closest generated code to the target code generated by the CodeGen model. Here, lines 1 to 4 are the prompt. The fuzzy similarity score between (a) and (b) is 68.

Listing 10: A vulnerable C code example generated by CodeGen. The code contains a severe CWE-787 vulnerability in line 25. In this example, the first seven lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Listing 11: A vulnerable C code example generated by CodeGen. The code contains multiple vulnerabilities of type CWE-787 (lines 10, 11, and 17). In this example, the first nine lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Listing 12: A vulnerable code example generated by GitHub Copilot. The code contains a CWE-022 vulnerability in line 10. In this example, the first six lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Listing 13: A vulnerable code example generated by GitHub Copilot. The code contains a CWE-079 vulnerability in line 12. In this example, the first eight lines are the non-secure prompt, and the rest of the code is the completion of the given non-secure prompt.

Authors:

(1) Hossein Hajipour, CISPA Helmholtz Center for Information Security ([email protected]);

(2) Keno Hassler, CISPA Helmholtz Center for Information Security ([email protected]);

(3) Thorsten Holz, CISPA Helmholtz Center for Information Security ([email protected]);

(4) Lea Schönherr, CISPA Helmholtz Center for Information Security ([email protected]);

(5) Mario Fritz, CISPA Helmholtz Center for Information Security ([email protected]).


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.