
Pulling Non Subscribed Pulses #84

Open
GTownson opened this issue Jan 17, 2018 · 13 comments
@GTownson

I have only subscribed to AlienVault, but the TI plugin seems to be pulling pulses that I haven't subscribed to. This is causing issues: I have about 50k threats detected because the plugin has pulled the pulse "dont subscribe". This pulse contains only one indicator, 8.8.8.8, so whenever anyone in our organisation queries Google's DNS we get a threat indicated.

I am not fully sure whether this is an issue in the plugin or some kind of configuration issue on our end, but some help would be greatly appreciated. Other than this issue, what a great plugin; keep up the good work!

@jasonkeller

jasonkeller commented Feb 1, 2018

I can verify this, @jalogisch, and better yet, I can point you and @dennisoelkers at exactly what's wrong. Not to demean anyone's efforts (they are appreciated), but until the issues below are fixed the plugin will be absolutely useless to anyone who attempts to use it, due to the significant amount of noise generated.

From https://github.com/Graylog2/graylog-plugin-threatintel/blob/master/src/main/java/org/graylog/plugins/threatintel/functions/otx/OTXLookupResult.java ...

public static OTXLookupResult buildFromIntel(OTXIntel intel) {
        if(intel.getPulseCount() > 0) {

The API doesn't prune pulse responses by API key (a lookup returns all pulses that match an indicator regardless of subscription, which is a serious oversight IMHO, but I digress). This means you have zero pulse filtering, since you're just counting them.

What you should be doing is looping through the JSON returned by your API call, finding the key "is_subscribing" on each pulse, and only matching pulses where that key has a value of 1; this is how the API 'filters' pulse results for your API key. Notice in my example below that the only pulse present has is_subscribing set to 0, yet I still end up with a count of 1 for the single result.

Also, if a result has {"source" : "whitelist"} in its validation section, it's never to be flagged. That entry exists to prevent boneheads from flagging things like Google DNS and OpenDNS from automated sensors and making a huge mess, hence what @GTownson is running into with Google DNS 8.8.8.8 and what I'm running into with OpenDNS 208.67.220.220.

{
  "single_value": 1,
  "multi_value": {
    "sections": [
      ......
    ],
    ......
    "pulse_info": {
      "count": 1,
      "references": [],
      "pulses": [
        {
          ......
          },
          "pulse_source": "api",
           ......
          "is_subscribing": 0,
    ..................................................
    "validation": [
      {
        "source": "whitelist",
        "message": "contained in 208.67.220.220",
        "name": "Whitelisted IP"
      }
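The filtering described above can be sketched in Python. The field names (`pulse_info`, `is_subscribing`, `validation`) come from the response shown; the function itself is my own illustration, not anything in the plugin:

```python
def count_subscribed_pulses(api_response):
    """Count only the pulses the API key is actually subscribed to,
    and never flag an indicator that carries a whitelist validation entry."""
    # A whitelisted indicator (e.g. 8.8.8.8 or 208.67.220.220) must never match.
    for entry in api_response.get('validation', []):
        if entry.get('source') == 'whitelist':
            return 0
    pulses = api_response.get('pulse_info', {}).get('pulses', [])
    # Only pulses with is_subscribing == 1 count toward a threat indication.
    return sum(1 for pulse in pulses if pulse.get('is_subscribing') == 1)
```

With the example response above (one pulse, is_subscribing 0), this would return 0 instead of the 1 the plugin currently reports.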

Unfortunately, none of this can reliably be processed or compensated for in pipeline constructs, due to the lack of loops and control structures; it has to be fixed in the plugin to operate properly. I know hardly anything about Java, else I'd submit a pull request (did I mention I'm not a programmer?); hopefully the pointers laid out above will suffice to enable a quick fix.

@dennisoelkers
Member

Thanks for the valuable input, @jasonkeller. It helps a lot in fixing this issue.

@dennisoelkers dennisoelkers self-assigned this Feb 1, 2018
@dennisoelkers
Member

@GTownson, @jasonkeller: I was fiddling with the OTX API all day, and to me it seems like the subscription information is not returned properly. At least, I was not able to produce an API response where is_subscribing was set to 1 for any pulse that I was actually subscribed to when querying an indicator. Can you verify/disprove this?

@jasonkeller

@dennisoelkers you aren't going crazy; I'm perfectly able to replicate your results.

Running down the reason why, however, seems partly to be the difference between following an author and subscribing to the pulse itself. Most people will subscribe to authors (like AlienVault themselves), not to individual pulses. The portion we should be looking at in that circumstance appears to be here...

          ............
          "author": {
            "username": "AlienVault",
            "is_subscribed": 0,
            "avatar_url": "https://otx20-web-media.s3.amazonaws.com/media/avatars/AlienVault/resized/80/unnamed (1).jpg",
            "is_following": 0,
            "id": "2"
          },
          ............

So unfortunately we need to look in both places (following isn't supposed to include the pulses, but subscribing is supposed to pull them all in, according to their documentation: https://www.alienvault.com/documentation/otx/subscribing-following-otx-contributors.htm).

But even then, at the moment I'm still seeing a zero in this field even though I know I'm subscribed to their feed. This looks like a bug that needs to be filed with the AlienVault OTX team.

On the other hand, we could make this more efficient by querying differently as well...

We could do this first (to get all the IDs of the subscribed pulses for that user, which is a pruned list, along with the pulse names and such that we may want to enrich the message with later):
/api/v1/pulses/subscribed

Then recurse into the following, using the pulse IDs retrieved above:
/api/v1/pulses/{pulse_id}/indicators

And pull out and pre-cache all the indicators we want from the subscribed pulses (you could even add file hashes and the like). Then all the indicators will be pre-cached for fast lookup, and we significantly reduce the amount of API calls that we make to OTX at the same time.
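The two-call flow I'm describing could look roughly like this in Python (stdlib only). The endpoint paths are the ones named above and the X-OTX-API-KEY header is what OTX expects, but the `results` response key, the helper names, and the omission of pagination are assumptions on my part; treat it as a sketch:

```python
import json
import urllib.request

OTX_BASE = 'https://otx.alienvault.com'

def _default_get(path, api_key):
    """GET an OTX API path with the API-key header and decode the JSON body."""
    req = urllib.request.Request(OTX_BASE + path,
                                 headers={'X-OTX-API-KEY': api_key})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def fetch_subscribed_indicators(api_key, get=None):
    """Pull the subscribed pulses once, then each pulse's indicators,
    and return a flat {indicator_value: pulse_name} lookup dict.
    (Pagination of the subscribed list is omitted for brevity.)"""
    if get is None:
        get = lambda path: _default_get(path, api_key)
    cache = {}
    for pulse in get('/api/v1/pulses/subscribed').get('results', []):
        detail = get('/api/v1/pulses/%s/indicators' % pulse['id'])
        for indicator in detail.get('results', []):
            cache[indicator['indicator']] = pulse['name']
    return cache
```

The resulting dict is exactly the kind of pre-built table a lookup could hit without any per-indicator API call.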

@dennisoelkers
Member

Thanks for the input and verification, @jasonkeller.

We have a separate caching layer in the lookup table facility, which allows us to cache results quite flexibly, so introducing any caching in the OTX data adapter would be misplaced. Tracking the subscribed pulses of the current user to imitate the subscription feature might be an option, though, at least as a temporary workaround until the bug is fixed on the OTX side.

I will try to contact someone on their end and see what they say.

@jasonkeller

@dennisoelkers I wasn't clear: I wasn't suggesting caching in the data adapter, merely having the data adapter mass-ingest indicator data from the pulses and preload the cache already present in Graylog's lookup table facility.

@dennisoelkers
Member

I have reached out to AlienVault regarding the API responses and they are looking into it.

@jasonkeller

@dennisoelkers

I'm not sure if you've gotten a response from AlienVault thus far; however, I figured I'd chime in with some additional findings. I'm currently using their Python SDK, and after messing around with it for a while, it seems they are pushing people to simply pull in all the pulses they are subscribed to and parse from there. Obviously that wouldn't work too well without an API key, but when you do have one it looks like the preferred method. I can show you what I'm doing in Python if you like (hopefully it maps closely to their Java SDK).

They also seem to have some modification-date parameters, so you can selectively pluck new changes to your pulses after an initial full sync. Given what you're trying to do with Graylog, I think this is a far better approach than an individual API lookup per indicator hit; it's certainly far more performant.
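To illustrate the differential sync I mean: after a full pull, subsequent requests would only ask for pulses modified since the last run. The exact query-parameter name (`modified_since`) is an assumption based on poking at the API, so treat this as a sketch:

```python
import urllib.parse

OTX_BASE = 'https://otx.alienvault.com'

def subscribed_url(since=None):
    """Build the subscribed-pulses URL, optionally restricted to pulses
    modified after the given ISO-8601 timestamp.
    The 'modified_since' parameter name is an assumption, not confirmed."""
    path = '/api/v1/pulses/subscribed'
    if since is not None:
        path += '?' + urllib.parse.urlencode({'modified_since': since})
    return OTX_BASE + path
```

The first sync would call `subscribed_url()`, store the timestamp, and every refresh after that would call `subscribed_url(last_sync_timestamp)` and merge only the changed pulses into the cache.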

@dennisoelkers
Member

@jasonkeller, I am still waiting for their response. What you are describing would work as a workaround, but being able to let the API prefilter the pulses for a potential IoC would reduce both code complexity and the chance of race conditions a lot, so I hope I get a positive response from AV on that. If I get a negative answer, or none, implementing the pulse sync is an option, although not the preferred one.

@jasonkeller

@dennisoelkers

I figured I'd put that out there since that's how OSSIM and USM (their own products) leverage OTX, along with others. And when using the getall() function in the Python SDK, the API returns only the list of pulses and IOCs that I'm subscribed to, as it's supposed to. Here's the Python code I'm using with their SDK, and it works perfectly.

# Pull every pulse the API key is subscribed to, then flatten the
# IPv4 indicators into a list of dicts for later lookup/enrichment.
_result = otx.getall()
_indicators = []
for _pulse in _result:
    for _indicator in _pulse['indicators']:
        if _indicator['type'] == 'IPv4':
            _indicators.append({'pulse_name': _pulse['name'], 'tags': _pulse['tags'],
                                'type': _indicator['type'],
                                'indicator': _indicator['indicator'],
                                'modified': _pulse['modified']})

It's really the only way to guarantee high performance without saturation or slowdown from repetitive API calls. It's like Infoblox: if I needle it with single requests, it takes ages to get anything out of it (and in the case of OTX, you'll hit throttle limits rather quickly). Hit it with a batch call, and I can pre-cache all the values in a tiny fraction of the time and not disturb the service with further requests until a refresh interval, and even then I only have to ask for a differential.

With the way this OTX plugin is currently architected, we'll never be able to use it in production against the flows we want to inspect (10k-15k flows per second), even if they manage to fix the subscription tags on the indicator itself for your queries.

@dennisoelkers
Member

@jasonkeller You are right, I think. The worries I had about this approach were the unpredictable memory footprint of retaining all pulses in memory, and having to jump through hoops to find the sweet spot between refreshing too often and working with stale pulses. But even those issues are probably less painful than hammering their API, which seems "okay" in a scenario where only a number of known IPs are queried (mitigated by our caching layer), but quickly turns into a nightmare when you think of, e.g., every file hash of every program execution being checked.

Thanks for the input. I will implement a first version pretty soon.

@GTownson
Author

GTownson commented Mar 1, 2018

@dennisoelkers Hi Dennis, for now, could I just subscribe to individual pulses, and would the TI plugin then pull only those in? I need to find a workaround in the meantime, as one of our clients requires threat intel.

@GTownson
Author

@dennisoelkers @jasonkeller Hi guys, are there any updates on this issue? I have had to create my own workaround: I use a script to pull from AlienVault's API, parse the JSON output into separate CSV files based on indicator type, and then use lookup tables to perform the lookup.
