Context Manager - Taint Propagation Issue #797
Comments
Hi @giusepperaffa, is this still an issue? Sorry for the (very) late reply.
Hi @arthaud Yes, this is still an issue. Your help would be greatly appreciated. Thank you very much.
Hi @giusepperaffa, I need more information to be able to help you here. I'm not quite sure which flow we are supposed to find. It is from [...] Could you try to make the function with that code as small as possible, then add a [...] Also, if possible, use the latest version of pyre. For that, you can install [...]
Hi @arthaud, the diagram below provides a simplified call graph of my application:
graph TD;
    Function_With_Source-->Function_With_Context_Manager
    Function_With_Context_Manager-->updateDynamoDBTable
The following observations should further clarify my issue:
As suggested, I have re-run the analysis with [...] Please let me know whether this allows you to understand the issue. Otherwise, I will share the code in a zip archive. I would leave the option of upgrading to the latest version of [...]
Hi @giusepperaffa, From what I understand, this is the flow that you are expecting:
downloadPath = source()
# ^^^^^ tainted
outputFileA = os.path.join('/tmp', 'outputFileA.txt')
with open(downloadPath, 'r') as inputFileObj, open(outputFileA, 'w') as outputFileObjA:
# ^^^^^^^^^^^^ tainted
    inputFileContentsList = inputFileObj.readlines()
    # ^^^^^^^^^^^^^^^^^^ tainted
    outputFileObjA.write(inputFileContentsList[0])
    # ^^^^^^^^^^^ tainted
# outputFileA is tainted
updateDynamoDBTable(outputFileA, outputFileB)
# ^^^^^^^^^^^^ sink
Unfortunately, Pysa won't be able to catch this flow. This would be similar to:
x = MyClass()
y = f(x)
y.append(source())
sink(x)
After [...]
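The analogy in this reply can be made concrete. The sketch below fills in hypothetical definitions for MyClass, f, source and sink (the thread does not give any), assuming that f returns an object sharing state with x, much as open(outputFileA, 'w') returns a handle onto the file named by the path. At runtime the value does reach the sink, but only through that shared state: nothing is ever assigned to x after it is created, so following the flow requires knowing that mutating y also changes what x exposes.

```python
class MyClass:
    """Hypothetical container; stands in for the path / file relationship."""

    def __init__(self):
        self.items = []


def f(x):
    # Assumption: returns an object that aliases part of x,
    # similar to a file handle that refers to the file behind a path.
    return x.items


def source():
    # Hypothetical taint source.
    return "tainted"


def sink(x):
    # Hypothetical sink: reads whatever x currently holds.
    return list(x.items)


x = MyClass()
y = f(x)
y.append(source())  # mutates y only; x is never reassigned
reached = sink(x)   # the value still arrives here through the alias
print(reached)
```

In the original code the path string outputFileA plays the role of x and the file object outputFileObjA plays the role of y.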
Hi @arthaud, Thank you very much for your reply and for investigating this issue. I have now understood the root cause of this issue, which can now be closed. Thank you very much again.
Description
I have been trying to use Pysa (Ubuntu 20.04 + virtual environment + Python 3.8) to perform a data flow analysis that includes the following code, which is part of a function. The source is in a different function, whereas the sink is in the function updateDynamoDBTable. The inter-procedural nature of the analysis is not causing issues though, as the portion of the code where the taint propagation breaks is the one reported below:
Everything seems to point to how Pysa deals with taint propagation within a context manager. I am struggling to understand how this can be solved, if at all. Any support would be greatly appreciated.
Pysa models
To support my analysis, I added the following Pysa models. Model 1 allows propagating a taint through the write method for a file (as far as I know, this is not the default Pysa behaviour), whereas models 2 and 3 ensure that the taint is propagated when the context manager performs the initialization steps. Note: models 2 and 3 were written after analysing the call graph with pyre_dump_call_graph().
def io.TextIOBase.write(self, __s: TaintInTaintOut[Updates[self]]): ...
def open(file: TaintInTaintOut[LocalReturn]): ...
def io.IOBase.__enter__(self: TaintInTaintOut[LocalReturn]): ...
Taint analysis
Following a similar approach to #795, I have instrumented the code with the function reveal_taint. These are my findings:
reveal_taint called immediately after the initialization outputFileA = os.path.join('/tmp', 'outputFileA.txt') shows that outputFileA has the expected backward taint and no forward taint. To my mind, this is correct, as it confirms that Pysa is rightly identifying the final sink in the updateDynamoDBTable function. The variable outputFileA is initialized with untainted literals, so it cannot have a forward taint.
reveal_taint called immediately before the line outputFileObjA.write(inputFileContentsList[0]) shows that outputFileObjA has no taint.
reveal_taint called immediately after the line outputFileObjA.write(inputFileContentsList[0]) shows that outputFileObjA has the taint propagated via inputFileContentsList. This is the result that I expected.
reveal_taint called immediately after the context manager execution confirms that outputFileObjA still has the taint acquired within the context manager body.
Note: The original source (not visible in the code provided) has a taint that is propagated via downloadPath, then inputFileObj and finally inputFileContentsList.
Conclusion
The variable outputFileObjA acquires the taint from the intended source within the context manager body, and Pysa detects in outputFileA the back-propagated taint from the final sink. Therefore, after adding context manager-specific models (see above), I expected Pysa to be able to identify the entire data flow, but I have a false negative instead. Thank you very much.
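The instrumentation described under "Taint analysis" could look roughly like the sketch below. It is a reconstruction, not the code from the application: source() is a hypothetical stand-in for the real source, tempfile.gettempdir() replaces the hard-coded '/tmp' so the snippet runs on any platform, the call to updateDynamoDBTable is omitted, and reveal_taint is defined as a no-op stub only so that the script executes outside Pysa. The comments restate the findings reported above; they are not output produced by this script.

```python
import os
import tempfile


def reveal_taint(value):
    """No-op stub so the script runs outside Pysa (assumption: Pysa
    recognises calls with this name during analysis)."""
    return None


def source():
    # Hypothetical source: creates a small input file and returns its path.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("first line\nsecond line\n")
    return path


def process(downloadPath):
    outputFileA = os.path.join(tempfile.gettempdir(), "outputFileA.txt")
    reveal_taint(outputFileA)  # reported: backward taint, no forward taint
    with open(downloadPath, "r") as inputFileObj, open(outputFileA, "w") as outputFileObjA:
        inputFileContentsList = inputFileObj.readlines()
        reveal_taint(outputFileObjA)  # reported: no taint
        outputFileObjA.write(inputFileContentsList[0])
        reveal_taint(outputFileObjA)  # reported: taint via inputFileContentsList
    reveal_taint(outputFileObjA)  # reported: taint still present
    return outputFileA


result_path = process(source())
```

Placing the calls at these four points separates what the analysis knows about the path string outputFileA from what it knows about the file object outputFileObjA, which is where the reported flow stops.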