Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misbehaving URIs #195

Open
mttcpr opened this issue Dec 7, 2020 · 13 comments
Open

Misbehaving URIs #195

mttcpr opened this issue Dec 7, 2020 · 13 comments
Labels

Comments

@mttcpr
Copy link
Contributor

mttcpr commented Dec 7, 2020

It seems that when a certificate contains a URI that lacks a hostname (e.g. ldap:///cn=xyz) there is some misbehavior decoding it. Specifically, the .native ends up with "ldap:/cn=xyz" - two of the three slashes were removed. calling tbs_cert.dump(force=True) also results in the URI being encoded with only one slash. I spent a little time stepping into the code, but quickly became frustrated trying to figure out where it was coming from. I'm hoping you'll read this and say 'Aha I know exactly why that is happening' and be able to easily patch it.

@joernheissler
Copy link
Collaborator

joernheissler commented Dec 8, 2020

Sounds related to #33. https://github.com/wbond/asn1crypto/blob/master/asn1crypto/_iri.py should contain the code which modifies your URIs.

@mttcpr
Copy link
Contributor Author

mttcpr commented Dec 8, 2020

I tracked it down to use of urlunsplit from urllib/parse.py. If I add 'ldap' to its uses_netloc list, then it produces the correct output. urllib has no specific support for an ldap uri. As much as I would love to kick ldap to the curb myself, this seems somewhat problematic given ldap is definitely still in use for PKI, particularly PKI from Redmond.

I haven't followed these changes, but do we really need this level of data manipulation (i.e. breaking every URI into all its components and reassembling them) to populate the .native field? Perhaps it's not unreasonable to assume URIs should be processed using whatever library is responsible for handling the scheme outside of the certificate implementation.

At this point the only resolution I can see is getting support for ldap in urllib, or, not using this functionality from urllib, at least not in the way it's being used today.

@wbond
Copy link
Owner

wbond commented Dec 8, 2020

I haven't followed these changes, but do we really need this level of data manipulation (i.e. breaking every URI into all its components and reassembling them) to populate the .native field?

The .native property is designed explicitly to take ASN.1 binary data and produce native Python data types. In the case of a URI, the most native Python representation is a unicode string, in normalized form. There is data in the standard about how URIs are compared for equality (linked in the code for the URI class): https://tools.ietf.org/html/rfc5280#section-7.4.

I would imagine we should be able to fix the issue with the ldap URIs without too much work.

Perhaps it's not unreasonable to assume URIs should be processed using whatever library is responsible for handling the scheme outside of the certificate implementation.

In this case, I would say the .native field isn't what you want, and you should probably ask for the raw binary data (.contents) and use it how you see fit. However, as I mentioned above, it isn't super trivial to make sure you are actually comparing URIs properly if you treat them as binary data.

@mttcpr
Copy link
Contributor Author

mttcpr commented Dec 8, 2020

Thanks Will, using .contents is in fact exactly what I did on the read side. The reason this behavior became more problematic for me was the tbs.dump(force=True) resulted in an invalid URL in the output. For now I have sidestepped the observed problem by adding urllib.parse.uses_netloc.append('ldap') to my own __init__, but I am a little worried there may be other cases related to ldap and urlunsplit that haven't presented themselves.

@wbond
Copy link
Owner

wbond commented Dec 8, 2020

If you've got a fix and the time to submit it, I'm sure @joernheissler or myself can review it.

More robust tests for ldap is certainly welcome if you've got some fixtures you can provide.

@joernheissler
Copy link
Collaborator

A full test case, or even just a certificate, would help too.

@mttcpr
Copy link
Contributor Author

mttcpr commented Dec 8, 2020

I do have that band-aide.. not sure runtime modification of urllib's behavior rises to the level of fix? All I did was add this:

import urllib.parse

if 'ldap' not in urllib.parse.uses_netloc:
    urllib.parse.uses_netloc.append('ldap')

Attached is a sample domain controller certificate with affected URIs in both CRLDP and AIA.
dc-cert.zip

I'm afraid I don't have time to dedicate to adding tests right now, but I may be able to carve out some time over the holidays.

@wbond wbond added the bug label Aug 7, 2021
@zito
Copy link

zito commented Dec 21, 2022

Hi,
I have a simple script to dump all user certificates from Active Directory and create their sha1 fingerprint list for usage in Postfix (check_ccert_access) using asn1crypto. Today I have found a bad fingerprint of an user causing mail access failed for him. An openssl fingerprint and a fingerprint obtained from certificates sha1_fingerprint atribute differs. I tried to dump a parsed certificate back to file and found out the problem this bugreport is about. For this time I have to switch the script to execute openssl -fingerprint for every certificate in the list. It is time and resource expensive. I will be happy to return back to asn1parse.
Original DER encoded certificate is jsm.cer, parsed/dumped by asn1crypto is jsm.out.cer:

zito@bobekpc:~/tmp/asn1crypt-bug$ diff -u <(openssl x509 -text -noout -inform der -in jsm.cer) <(openssl x509 -text -noout -inform der -in jsm.out.cer)

--- /dev/fd/63	2022-12-21 15:54:43.323228438 +0100
+++ /dev/fd/62	2022-12-21 15:54:43.323228438 +0100
@@ -68,10 +68,10 @@
                 C0:F4:23:F4:3F:D2:46:D8:FD:4A:54:F7:56:41:AB:8A:E1:96:00:A7
             X509v3 CRL Distribution Points: 
                 Full Name:
-                  URI:ldap:///CN=ICZ%20a.s.%20Issuing%20CA%202,CN=SPRG018,CN=CDP,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=ad,DC=i,DC=cz?certificateRevocationList?base?objectClass=cRLDistributionPoint
+                  URI:ldap:/CN=ICZ%20a.s.%20Issuing%20CA%202,CN=SPRG018,CN=CDP,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=ad,DC=i,DC=cz?certificateRevocationList?base?objectClass=cRLDistributionPoint
                   URI:http://www.i.cz/ca/ICZ%20a.s.%20Issuing%20CA%202.crl
             Authority Information Access: 
-                CA Issuers - URI:ldap:///CN=ICZ%20a.s.%20Issuing%20CA%202,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=ad,DC=i,DC=cz?cACertificate?base?objectClass=certificationAuthority
+                CA Issuers - URI:ldap:/CN=ICZ%20a.s.%20Issuing%20CA%202,CN=AIA,CN=Public%20Key%20Services,CN=Services,CN=Configuration,DC=ad,DC=i,DC=cz?cACertificate?base?objectClass=certificationAuthority
                 CA Issuers - URI:http://www.i.cz/ca/ICZ%20a.s.%20Issuing%20CA%202.crt
             1.3.6.1.4.1.311.25.2: 
                 0@.>.

jsm.zip

@wbond
Copy link
Owner

wbond commented Dec 21, 2022

I’m happy to review a PR for this if it includes tests.

@MatthiasValvekens
Copy link
Contributor

Incidentally, there's a separate but related issue that will cause capitalisation in certain parts of an URI to be normalised when .dump(force=True) is called---I'd have to dig around to find a good example, but bear with me. Since this type of reencoding arguably goes beyond the requirements of DER, I think it's fair to call that a bug too.

So, while I agree that the parsing issue needs to be fixed, I'd potentially approach the reencoding part of the story a little more generally: IMO the behaviour of .dump(force=True) should be to encode the original string as DER, but it shouldn't apply any further normalisation other than what is required by the letter of the spec.

What do you think?

@zito
Copy link

zito commented Dec 22, 2022

After some investigation today I found a bug, that causes exhibition of this URI LDAP bug. So maybe another issue should be created for it. This problem with ldap slashes should not occur in normal situation after loading of a X509 certificate, because it is DER encoded and there is no reason for its reencoding. I have 851 certificates in AD and failure occurs for 6 of them. By tracing of parsing X509 certificates affected and not affected by the bug a difference is on the lines

        # If the length is indefinite, force the re-encoding
        if self._header is not None and self._header[-1:] == b'\x80':
            force = True

The above code is in core.py on more places.
A condition self._header[-1:] == b'\x80' is not correct. Indefinite encoding can't be determined such easily. All my problematic X509 certificates starts with header

zito@bobekpc:~/tmp/asn1crypt-bug$ head --bytes 4 jsm.cer | xxd
00000000: 3082 0880                                0...

The contents length is 2176 (0x880) octets. The condition above tries probably detect indefinite encoding, but indefinite encoding has value 0x80 at the first and only length octet, not last octet if I comprehend correctly BER.

@wbond
Copy link
Owner

wbond commented Dec 22, 2022

That sounds like a possible bug, but the core bug here is that the DER re-encoding is changing the URL due to a bug in the Python URL functionality in the stdlib.

@zito
Copy link

zito commented Dec 22, 2022

Yes the issue is already created #249 ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants