This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Source metadata (e.g. end date and num_stories_X) incorrect #726

Open
dsjen opened this issue Sep 23, 2020 · 3 comments
@dsjen
Contributor

dsjen commented Sep 23, 2020

We've found a number of inconsistencies relating to end dates in source manager:

I think I've zeroed in on where the problem may lie. The mediaHealth endpoint returns a different end date from storyCount. For example:

mc.mediaHealth(1363086)
{'coverage_gaps': 1,
 'coverage_gaps_list': [{'expected_sentences': 569.03,
   'expected_stories': 2.69,
   'media_id': 1363086,
   'num_sentences': 102.71,
   'num_stories': 3.0,
   'stat_week': '2020-02-10 00:00:00-05:00'}],
 'end_date': '2020-02-17 00:00:00-05:00',
 'expected_sentences': 569.03,
 'expected_stories': 2.69,
 'has_active_feed': False,
 'is_healthy': False,
 'media_health_id': 1115839766,
 'media_id': 1363086,
 'num_sentences': 0,
 'num_sentences_90': 0,
 'num_sentences_w': 0,
 'num_sentences_y': 224.09,
 'num_stories': 0,
 'num_stories_90': 0,
 'num_stories_w': 0,
 'num_stories_y': 1.44,
 'start_date': '2019-02-11 00:00:00-05:00'}
fq='publish_day:[2010-01-01T00:00:00Z TO 2020-09-23T00:00:00Z]'
q='media_id:1363086 AND NOT tags_id_stories:8875452'
mc.storyCount(solr_query=q, solr_filter=fq, split=True)

{'counts': [{'count': 1, 'date': '2019-02-13 00:00:00'},
  {'count': 1, 'date': '2019-02-15 00:00:00'},
  {'count': 1, 'date': '2019-02-20 00:00:00'},
  ...
  {'count': 9, 'date': '2020-09-18 00:00:00'},
  {'count': 1, 'date': '2020-09-20 00:00:00'},
  {'count': 3, 'date': '2020-09-21 00:00:00'},
  {'count': 7, 'date': '2020-09-22 00:00:00'}]}

Note that the end date in media health is 2020-02-17 00:00:00-05:00 and num_stories_90 is 0, yet the final date in the split story count is 2020-09-22 00:00:00, so storyCount shows stories well after the point where media health stops.
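For anyone who wants to check other sources the same way, here is a minimal sketch of the comparison. It assumes the two responses have the shapes shown above, and `end_date_gap_days` is a hypothetical helper written for this issue, not part of the API client. The sample values are copied from the output above, trimmed to the relevant fields.

```python
from datetime import datetime


def end_date_gap_days(health, story_counts):
    """Days between mediaHealth's end_date and the latest storyCount date.

    Hypothetical helper: `health` is a mediaHealth-style dict and
    `story_counts` is a split storyCount-style dict, as shown above.
    Only calendar dates are compared, so the UTC offset is ignored.
    """
    health_end = datetime.fromisoformat(health["end_date"]).date()
    last_story = max(
        datetime.fromisoformat(c["date"]).date() for c in story_counts["counts"]
    )
    return (last_story - health_end).days


# Values copied from the responses above (trimmed).
health = {"end_date": "2020-02-17 00:00:00-05:00", "num_stories_90": 0}
story_counts = {"counts": [
    {"count": 3, "date": "2020-09-21 00:00:00"},
    {"count": 7, "date": "2020-09-22 00:00:00"},
]}

gap = end_date_gap_days(health, story_counts)
print(gap)  # 218
```

A large positive gap together with num_stories_90 = 0 is the mismatch described here: stories exist after the date at which media health stops counting.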

@dsjen dsjen added the bug label Sep 23, 2020
@hroberts
Contributor

hroberts commented Sep 23, 2020 via email

@hroberts
Contributor

The media health job has finished running and seems to have caught the data up. Can you please take a look and tell me if it looks better now?

@dsjen
Contributor Author

dsjen commented Sep 25, 2020

Sorry, I hit the endpoint again and the dates still don't match up. 😢
