V3 api, more flexible formulas, persist storage #14

Open
wants to merge 22 commits into
base: main

tsbernar

Add support for custom formulas and multiple factors.
Make multiple sensors instead of extra state attributes.
Make sensors SensorStateClass.MEASUREMENT for graphing and statistics support.
Persist the rolling window of data to save API calls after restarts and allow for a larger lookback window.

@tsbernar tsbernar mentioned this pull request Apr 20, 2023
Add support for custom formulas
Make multiple sensors instead of extra state attributes.
Persist rolling window of data to save API calls on restart.
Takes a start and end offset and returns total rainfall between them.
 - For example, start_hour = -72, end_hour = 0 will show the total rainfall in the last 72 hours.
* configurable lookback days, default to 30
* better API rate limit handling with configurable settings
  - backfills will now happen more slowly as permitted by limits
  - limit tracking data is persisted
* add some tests
@tsbernar
Author

Rebased with main repo and added a few more features:

  • Implement new sensor type, total_rain.
    - Takes a start and end offset and returns total rainfall between them.
    - For example, start_hour = -72, end_hour = 0 will show the total rainfall in the last 72 hours.
  • Configurable lookback days, default to 30
  • Better API rate limit handling with configurable settings
    - backfills will now happen more slowly as permitted by limits
    - limit tracking data is persisted
  • add some tests
  • fix merge issues
  • run backfill tasks in the background, default to 10 requests on every 30s cycle, bound by per hour and per day API rate limits. Reserve enough requests to always be able to request next 24hrs of data.
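
A minimal sketch of how a total_rain-style sensor might sum rainfall between two hour offsets, as described in the list above. The data layout (a dict mapping datetimes to rainfall amounts) and the function name `total_rain` are assumptions made for illustration, not the code in this PR.

```python
from datetime import datetime, timedelta, timezone


def total_rain(hourly_rain, now, start_hour, end_hour):
    """Sum rainfall for samples with start_hour <= offset < end_hour.

    hourly_rain maps sample datetimes to rainfall amounts (hypothetical
    layout). Offsets are hours relative to `now`, so start_hour=-72 and
    end_hour=0 cover the last 72 hours.
    """
    start = now + timedelta(hours=start_hour)
    end = now + timedelta(hours=end_hour)
    return sum(mm for ts, mm in hourly_rain.items() if start <= ts < end)


now = datetime(2023, 4, 23, 12, tzinfo=timezone.utc)
# Made-up sample data: 1 mm one hour ago, 2 mm eighty hours ago.
sample = {
    now - timedelta(hours=1): 1.0,
    now - timedelta(hours=80): 2.0,
}
print(total_rain(sample, now, -72, 0))  # only the 1 mm sample is in range
```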

@petergridge
Owner

With the new repository I get

2023-04-23 02:53:41.114 ERROR (MainThread) [homeassistant.components.sensor] Error while setting up openweathermaphistory platform for sensor
Traceback (most recent call last):
  File "/workspaces/core/homeassistant/helpers/entity_platform.py", line 304, in _async_setup_platform
    await asyncio.shield(task)
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 172, in async_setup_platform
    await _async_setup_v3_entities(add_entities, hass, config, units)
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 233, in _async_setup_v3_entities
    await sensor_registry.async_load()
  File "/workspaces/core/config/custom_components/openweathermaphistory/sensor.py", line 403, in async_load
    await self._weather_history.async_load()
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 103, in async_load
    if data["hour_rolling_window"]:
KeyError: 'hour_rolling_window'

I guess your JSON structure has changed. Any hints on how to clear the persisted data?

@tsbernar
Author

tsbernar commented Apr 23, 2023

Dang. I’ll fix that, but I’m away from my computer right now.

For now, you should be able to delete the file under .storage/openweathermaphistory.history (STORAGE_KEY in the const file)
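
For reference, a loader that tolerates an older or unexpected schema could look roughly like the sketch below. The key name comes from the traceback above; the function name `load_window` and the assumption that the window is stored as a JSON list are hypothetical.

```python
import json

HOUR_ROLLING_WINDOW = "hour_rolling_window"  # key name from the traceback


def load_window(raw_text):
    """Return the persisted hourly window, or an empty list when the stored
    JSON is unreadable or lacks the expected key (e.g. an older schema)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    window = data.get(HOUR_ROLLING_WINDOW)
    # Assumption: the window is a list of samples; anything else is ignored.
    return window if isinstance(window, list) else []


print(load_window('{"some_older_key": [1, 2, 3]}'))  # falls back to []
```

Falling back to an empty window costs a fresh backfill, but it avoids the KeyError shown in the traceback.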

@petergridge
Owner

That helped, moving on to the next issue :) I love testing other people's code; it sure beats people finding bugs in mine.

2023-04-23 03:16:30.565 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 156, in backfill_chunk
    await self._async_update_for_datetime(end_dt)
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 231, in _async_update_for_datetime
    return self.add_observation(data)
  File "/workspaces/core/config/custom_components/openweathermaphistory/weatherhistory.py", line 239, in add_observation
    rain = json_data["rain"]["1h"] if "rain" in json_data else 0
KeyError: '1h'

Interesting that the data returned from the API has 'rain': {'3h': 1}, not 'rain': {'1h': 1}:

{'dt': 1681714800, 'sunrise': 1681720747, 'sunset': 1681761041, 'temp': 18.48, 'feels_like': 18.08, 'pressure': 1013, 'humidity': 65, 'dew_point': 11.79, 'clouds': 34, 'wind_speed': 5.33, 'wind_deg': 338, 'wind_gust': 5.73, 'weather': [{'id': 500, 'main': 'Rain', 'description': 'light rain', 'icon': '10n'}], 'rain': {'3h': 1}}

Another question: what is the behaviour if I set up two sensors with different locations? How will the persistent storage and API counts work?

@petergridge
Owner

If it helps, this is the URL/location I am running for:

url: https://api.openweathermap.org/data/3.0/onecall/timemachine?lat=-33.8715&lon=-33.8715&dt=1681714800&appid={API_KEY}&units=metric

In weatherhistory.py, line 68, you have both lat and lon using latitude.
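
A small sketch of building the request URL with separate latitude and longitude arguments, so the two cannot be mixed up silently. The endpoint and query parameter names are taken from the URL above; the function name `build_timemachine_url` and the example coordinates are illustrative assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/3.0/onecall/timemachine"


def build_timemachine_url(latitude, longitude, dt, api_key, units="metric"):
    """Build the request URL, taking lat and lon from separate arguments."""
    query = urlencode(
        {
            "lat": latitude,
            "lon": longitude,
            "dt": dt,
            "appid": api_key,
            "units": units,
        }
    )
    return f"{BASE_URL}?{query}"


# "API_KEY" is a placeholder, not a real key.
url = build_timemachine_url(-33.8302547, 151.1516128, 1681714800, "API_KEY")
print(url)
```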

@petergridge
Owner

https://openweathermap.org/history tells me that 3h is the rainfall for the last 3 hrs, so we need to subtract the previous 2 hours' rainfall to get this hour's rainfall. Why would they do this to us?!

@tsbernar
Author

Thanks! This is all really helpful debugging info.

Very annoying that they have the '3h' rain samples; I hadn't come across that yet in my location and didn't see it in the v3 docs. It looks like they mix '3h' and '1h', and then also report the same number 3 times in a row for the '3h'.

image

I'm not quite sure what to make of this; what do you think the "correct" total rain is in this period?
image

Maybe 1.0 + 1.13 + 1.0 + 1.31 ?

- use const variables instead of literals
- store data unique to a lat/lon location
- handle reading data from older schema
- clean up some noisy logs around storing and loading data
- log path when storing and loading data
@tsbernar
Author

tsbernar commented Apr 23, 2023

For your other comments:

  • Fixed the json loading so it won't break when the schema changes

  • Fixed the default longitude loading

  • I like the idea of including the forecast! I think it gets trickier with integrating forecast data into your irrigation logic instead of just historical data though. For example, if the forecast shows it is likely to rain later today, maybe you skip this morning's irrigation, but then tomorrow, if it turns out it did not actually rain, you'd want to tweak your irrigation to make up for the one you skipped in anticipation of rain. My initial thought is to expose a sensor for a historical data factor and a sensor for a forecast data factor, and then in the irrigation logic, you would use forecast rain + past rain + past irrigation to make your decision. I saw you had some other irrigation projects that I haven't yet looked at, so maybe you're already thinking about this?

  • For multiple locations, you can set up a config with multiple entries like this and split your API limits across them:

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 6
    latitude:  -33.8302547
    longitude: 151.1516128
    resources:
      - name: rainfactor_aus_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        type: custom
        data:
          formula: day0rain
 

The persistence will now store in a file with the location included in the name.
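
As an illustration only, a per-location storage key could be derived along these lines. The base key matches the file name mentioned earlier in this thread; the suffix format and the function name `storage_key_for` are hypothetical and may differ from what the PR actually does.

```python
STORAGE_KEY = "openweathermaphistory.history"  # file name mentioned earlier


def storage_key_for(latitude, longitude):
    """Return a storage key that is unique per location (hypothetical scheme)."""
    return f"{STORAGE_KEY}.{latitude:.4f}_{longitude:.4f}"


print(storage_key_for(-33.8302547, 151.1516128))
```

Rounding to four decimal places keeps the file name short while still separating locations that are more than roughly ten metres apart.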

I'm open to suggestions on how to handle multiple locations better, I only have 1 location for mine.
Maybe you could override the location at each sensor instead of setting up a new platform like this?

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    v3_api: True
    max_api_calls_per_hour: 60
    max_api_calls_per_day: 400
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
      - name: rainfactor_aus_location
        type: default_factor
        latitude:  -33.8302547
        longitude: 151.1516128
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        latitude:  -33.8302547
        longitude: 151.1516128
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        latitude:  -33.8302547
        longitude: 151.1516128
        type: custom
        data:
          formula: day0rain

Maybe we should also lower the default API rate limit settings so that 2-3 locations can be supported without having to mess with including rate limits in the config?

@petergridge
Owner

For the '3h' issue, I would simply divide the value by 3. That should be accurate enough, and given they provide 3 periods with the same data, it logically makes sense.

Adding the forecast as a new sensor makes sense. I was not planning to use it in my factor calculation, but it opens up a lot of opportunities for the future.

I think the first option for multiple sensors is best as it matches the way HA supports sensors.

  • My preference is defaulting to 5 days of data to support the UI and calculation model. That limits the start-up load to only 120 calls for each sensor, and the history can then build up naturally to a longer 30-day limit.
  • Provide a service to download additional days of history so advanced users can get data faster if required.
  • The end user can then be responsible for not overdoing the calls.

@petergridge
Owner

I can see that there is a lot happening: the calls are made regularly, but no sensor is created in HA. Are you seeing the same at your end?

@tsbernar
Author

The sensors are working on my end. Could you share the config you’re using?

@petergridge
Owner

I copied your Git repository; I can try downloading again.

I'm using the docker dev container and Visual Studio Code as my environment.

@tsbernar
Author

Oh I was talking about the config for your sensor so I can try to replicate on my end

@petergridge
Owner

Ah, sorry, here is the yaml

sensor:
  - platform: openweathermaphistory
    name: 'rainfactor new'
    api_key: 6e5dd5b87a55018adee10ab2c7ed6f96
    v3_api: True
    lookback_days: 5

@tsbernar
Author

tsbernar commented Apr 24, 2023

Got it, so you’ll need to add individual sensors under the resources list. (Borrowed the config naming from https://www.home-assistant.io/integrations/systemmonitor/)

The way it works now is you have one “platform” per lat/lon location, and then each “platform” can have multiple sensors under its “resources” list. Maybe we should just add the default sensor if none are specified to shrink down the minimal config?

Something like this should work to just give you the default sensor on your default location:

sensor:
  - platform: openweathermaphistory
    api_key: 6e5dd5b87a55018adee10ab2c7ed6f96
    lookback_days: 5
    resources:
      - name: new_rainfactor_sensor
        type: default_factor

Here’s a full example with 2 locations and multiple sensors at each:

sensor: 
  - platform: openweathermaphistory
    api_key: 'key'
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 30
    resources:
      - name: rainfactor_default_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 48hr_rain
        type: custom
        data:
          formula: day0rain + day1rain
  - platform: openweathermaphistory
    api_key: 'key'
    max_api_calls_per_hour: 30
    max_api_calls_per_day: 200
    lookback_days: 6
    latitude:  -33.8302547
    longitude: 151.1516128
    resources:
      - name: rainfactor_aus_location
        type: default_factor
        data:
          watertarget: 0.5
      - name: rainfactor_with_custom_aus
        type: custom
        data:
          formula: 'max( (0.5 - day0rain - day1rain/2 - day2rain/4 - day3rain/8 - day4rain/16) / 0.5, 0)'
      - name: 24hr_rain_aus
        type: custom
        data:
          formula: day0rain
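
To make the custom formula in these configs easier to read, here is the same arithmetic written as a plain Python function. This is only a sketch of the math; it says nothing about how the integration parses or evaluates formula strings, and the name `rain_factor` is made up for illustration.

```python
def rain_factor(day_rain, water_target=0.5):
    """Evaluate max((target - sum(day_i / 2**i)) / target, 0) for i = 0..4.

    day_rain[0] is the most recent day's rainfall; each older day counts
    for half as much as the day after it.
    """
    weighted = sum(rain / 2**age for age, rain in enumerate(day_rain[:5]))
    return max((water_target - weighted) / water_target, 0)


print(rain_factor([0.0, 0.0, 0.0, 0.0, 0.0]))  # no rain: factor stays at 1.0
print(rain_factor([1.0, 0.0, 0.0, 0.0, 0.0]))  # rain above target: clamped at 0
```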

@tsbernar
Author

The reason for splitting it this way is to allow all the sensors at the same location to share the same set of data / API calls. Though we could also just achieve that on the backend if you think it's more desirable to just have one “platform” configured and specify different locations on the sensor level in the resources list.

- up to 3 locations with default settings
- A backfill up to 5 days in first few mins after startup per location
- Support "3h" in rain and snow data and add test
- Skip observation if unexpected dt
- Remove old version and config option, only use v3 now
@tsbernar
Author

Responding to a few other comments:

> For the '3h' issue I would simply divide the value by 3 that should be accurate enough and given they provide 3 periods with the same data it logically makes sense.

Makes sense to me; I've added this as well as a warning log message if we see anything else unexpected in there. Hopefully, "1h" and "3h" is all we'll see.
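
A rough sketch of that handling, assuming the sample is the decoded JSON dict shown earlier in this thread. The function name `hourly_rain` is hypothetical, and the actual code in the PR may be structured differently.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def hourly_rain(json_data):
    """Return an estimate of the rainfall for one hourly sample."""
    rain = json_data.get("rain", {})
    if "1h" in rain:
        return rain["1h"]
    if "3h" in rain:
        # The same 3-hour total shows up for three consecutive hours,
        # so spread it evenly across them.
        return rain["3h"] / 3
    if rain:
        # Anything other than "1h" or "3h" is unexpected: warn and use 0.
        _LOGGER.warning("Unexpected rain keys: %s", list(rain))
    return 0


print(hourly_rain({"rain": {"3h": 1}}))
```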

> My preference is defaulting to 5 days of data to support the UI and calculation model limiting the start up load to only 120 calls for each sensor and then letting it build up naturally to a longer 30 day limit.
> Provide service to download additional days of history so advanced users can get data faster if required
> The end user can then be responsible for not overdoing the calls

I've made a change to the default API rate limits that should roughly accomplish this, though without a separate service. The default lookback is still 30 days, which is the maximum amount of data that we will keep in the rolling window and persistent store, but we will only backfill the first 5 shortly after startup. The way the backfilling works now is:

Every 30s SCAN_INTERVAL:

  1. We check if our (30-day default) lookback window is full. If it's not full, we check if we have available API limits for the current hour and the current day; if we do, we will send off a background task to backfill up to 10 hours (or less if constrained by the API limits).
  2. We check if we need to do a live update for the current hour. Step 1 always reserves enough limits so that we will be able to do the live updates once per hour.

The current limits are set to allow a backfill of 5 days in the first hour after a restart. In practice, this happens in the first 6 mins of the hour at a rate of 10 hours backfilled every 30s interval, then no backfilling for the rest of the hour until our initial requests roll off. The remaining 25 days of the full lookback window will then be slowly filled in over the next couple of days as daily and hourly limits permit. The default limits allow for up to 3 locations at a time without getting into paid API requests, assuming 0 persisted data at the start and all need a full backfill. If you already have a location configured, adding more should be okay, as the existing locations will only be using 24 requests per day once the backfills are complete.
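
The per-cycle decision in step 1 can be sketched as a small budget calculation. The reservation scheme below (one request held back for the current hour, 24 for the day) and all parameter names are assumptions for illustration; the limits used in the example are made up and are not the integration's defaults.

```python
def backfill_budget(used_this_hour, used_today, max_per_hour, max_per_day,
                    chunk_size=10, reserve_hours=24):
    """Return how many backfill requests may be sent in this 30s cycle.

    One request is held back for the current hour's live update and
    `reserve_hours` requests for the rest of the day, so live updates
    can always run even while a backfill is in progress.
    """
    hour_room = max_per_hour - used_this_hour - 1
    day_room = max_per_day - used_today - reserve_hours
    return max(0, min(chunk_size, hour_room, day_room))


# Hypothetical limits, chosen only to make the arithmetic visible.
print(backfill_budget(0, 0, max_per_hour=121, max_per_day=500))      # full chunk
print(backfill_budget(115, 115, max_per_hour=121, max_per_day=500))  # partial chunk
print(backfill_budget(120, 120, max_per_hour=121, max_per_day=500))  # nothing left this hour
```

The six-minute figure above follows from the same numbers: 5 days is 120 hourly requests, and at 10 requests per 30s cycle that takes 12 cycles.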

Another option could be to have 2 lookback windows configured, a backfill window and a lookback window. The backfill window could be set to 5 days in your example, and the lookback set to 30. In this case we would only backfill the 5 days on startup (as permitted by the limits), but we will keep up to 30 days of history as time passes and we naturally add more samples from live requests.
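
A sketch of how the two windows might interact, assuming samples are tracked as a set of hourly timestamps. The function names and the set-based layout are hypothetical; they only illustrate the idea of backfilling a short window while retaining a longer one.

```python
from datetime import datetime, timedelta, timezone


def hours_to_backfill(have, now, backfill_days=5):
    """Return the hourly timestamps missing from the backfill window, newest first."""
    wanted = (now - timedelta(hours=h) for h in range(1, backfill_days * 24 + 1))
    return [ts for ts in wanted if ts not in have]


def prune(have, now, lookback_days=30):
    """Drop samples that have aged out of the longer lookback window."""
    cutoff = now - timedelta(days=lookback_days)
    return {ts for ts in have if ts >= cutoff}


now = datetime(2023, 4, 24, tzinfo=timezone.utc)
have = {now - timedelta(hours=1), now - timedelta(days=31)}
print(len(hours_to_backfill(have, now)))  # 120 hours wanted, 1 already present
print(len(prune(have, now)))              # the 31-day-old sample is dropped
```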

@petergridge
Owner

You have been busy. I like what you have done, and I am learning something new from your coding; I still think in COBOL :)

> Maybe we should just add the default sensor if none are specified to shrink down the minimal config?

That makes sense. I believe that having a default resource will make it more user-friendly: less YAML = fewer mistakes, and most users just run with default settings. We also need to consider the complexity that needs to be built into the config flow. If you are looking for an example config flow (albeit an overly complex one), the irrigation custom component in my repository has one.

I also like this option; if a user requests more than 5 days, your existing rules will kick in.

> Another option could be to have 2 lookback windows configured, a backfill window and a lookback window. The backfill window could be set to 5 days in your example, and the lookback set to 30. In this case we would only backfill the 5 days on startup (as permitted by the limits), but we will keep up to 30 days of history as time passes and we naturally add more samples from live requests.

I would consider getting the numeric value from the key and using it as the denominator, just to future proof it.

> I've added this as well as a warning log message if we see anything else unexpected in there. Hopefully, "1h" and "3h" is all we'll see.
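
One way to read the suggestion about using the number in the key as the denominator is sketched below. The regular expression and the function name `per_hour_amount` are assumptions; returning None for unrecognised keys is one possible choice that avoids raising on unexpected data.

```python
import re

_PERIOD_KEY = re.compile(r"^(\d+)h$")


def per_hour_amount(amounts):
    """Convert a dict such as {"3h": 1.5} to an hourly amount.

    Returns None when no key looks like "<hours>h" with a positive number
    of hours, so the caller can skip the sample instead of crashing.
    """
    for key, value in amounts.items():
        match = _PERIOD_KEY.match(key)
        if match and int(match.group(1)) > 0:
            return value / int(match.group(1))
    return None


print(per_hour_amount({"3h": 1.5}))
print(per_hour_amount({"unexpected": 1.5}))
```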

What are your plans to use the 30 days of data?

  • I have a template sensor that updates every 24 hours at midnight to capture that day's details, so the information is captured in HA History and I can present a graph. I can see this as one of the sensor types: the max temp, min temp, total rain and snow, and average humidity for a calendar day. This is a feature a lot of users have been looking for.
  • For more granularity, the rainfall for an hour is now possible given the hourly nature of V3. These types of sensors will let us use HA's history capability to capture the long-term stats.
  • We should be able to use SQL sensors to get and manipulate the long-term data if required.
  • We may even be able to get away from providing all the attributes and present a card from history data instead. Not that I have seen anyone do that, but I haven't looked very hard.
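
A sketch of the calendar-day summary described in the first bullet above. The field names `dt`, `temp` and `humidity` follow the API sample shown earlier in this thread; treating `rain` as an already flattened hourly amount, grouping by UTC day rather than local day, and the function name `daily_summary` are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_summary(observations):
    """Group hourly observations by UTC calendar day and summarise each day."""
    by_day = defaultdict(list)
    for obs in observations:
        day = datetime.fromtimestamp(obs["dt"], tz=timezone.utc).date()
        by_day[day].append(obs)

    summary = {}
    for day, items in by_day.items():
        temps = [o["temp"] for o in items]
        summary[day] = {
            "max_temp": max(temps),
            "min_temp": min(temps),
            "total_rain": sum(o.get("rain", 0) for o in items),
            "avg_humidity": sum(o["humidity"] for o in items) / len(items),
        }
    return summary


# Made-up observations one hour apart on the same day.
observations = [
    {"dt": 1681714800, "temp": 18.48, "humidity": 65, "rain": 1.0},
    {"dt": 1681718400, "temp": 16.0, "humidity": 75},
]
summary = daily_summary(observations)
print(summary)
```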

@tsbernar
Author

Nice, we're both learning here! This is the first HA integration I've worked on, and it's been much easier to see how it all works starting with an integration that already works than starting from scratch. (Just bought my first house, and have been a bit too excited about all the home automation things)

> That makes sense, I believe that having a default resource will make it more user friendly, less yaml = less mistakes and most users just run with default settings. We also need to consider the complexity that is needed to build into the config flow. If you are looking for an example config flow (albeit overly complex) the irrigation custom component in my repository has config flow.

Agreed on the default. I was just starting to struggle with the config flow today, so will take a look at the irrigation component. I've been meaning to take a look at that anyway as irrigation automation is next up for me after getting this rain data. Do you have any other tips for irrigation generally? Using moisture sensors or anything else like that?

> I would consider getting the numeric value from the key and using it as the denominator, just to future-proof it.

Makes sense to me. I was just worried that the division by the numeric value might not always work. I'm used to dealing with software where if something unexpected happens, you probably want to know about it right away and stop. That's probably not our ideal behavior in this case, and there are other users to think about.

> What are your plans to use the 30 days of data?

Mainly for UI, I have a vague idea of what I want a custom card to look like for displaying both irrigation time and rainfall over time, but I have yet to dig into the weeds of how hard that will be to make. I was thinking of using hourly data for recent days and a monthly view.

> I have a template sensor that updates every 24 hours at midnight to capture that day's details so the information is captured in HA History so I can present a graph. I can see this as one of the sensor types, the max temp, min temp, total rain and snow, average humidity value for a calendar day. This is a feature a lot of users have been looking for.

I think this should be straightforward with a custom type sensor after we expose humidity and temp as inputs to the formula, but also a good idea for a new sensor type with easier config.

Agreed on the stats; the HA history is great. I just don't have enough history yet:
image

I think a custom card that uses the internal state rather than HA history would give a lot of flexibility and allow us to display backdated data.

@petergridge
Owner

> I was just worried that the division by the numeric value might not always work.

As long as an error is handled and the control does not crash, it should be fine; if it is not a valid value, ignore it. On that note, I purposely exceeded the call limit to see how my version handles it, and I was thinking an error sensor type that provides the error details would be great. I can put it on the dashboard with a condition to show it only when it is active. The other sensors would also benefit from a default value when they are in error, so the irrigation system still gets a value; I could/should handle it at that end as well.

> Do you have any other tips for irrigation generally? Using moisture sensors or anything else like that?

The irrigation control has had pretty good take-up since I put it on HACS; it only took me 5 years before I got around to publishing it. I always get a rush of requests as the northern hemisphere watering season kicks in, and every time I think I have all the bases covered, someone has a good idea. I built it to be simple to configure and to provide a functional, non-technical UI; since then I have built a card as well, again functional rather than fancy. But this weather map history control has been downloaded over 600 times in the last couple of weeks since it was published.

I built 'rainfactor' because I got sick of fiddling with rain and moisture sensors. I built my own ESP-based irrigation controller (the box it is in was more expensive than the components) with inputs to support sensors. I had issues with the sensor being in a rain shadow when the wind was blowing, and it did not really help to determine how much rain there was. It could rain in the morning and my program runs in the afternoon... my list of grievances is endless :). Even with moisture sensors it depends on where you place them: one in the lawn, one in pot plants, the list goes on. It was just fiddly, so I went with a more global method that does not need hardware. That is what the internet is for after all, and it has worked well for me.

I found it more reliable to use weather data to reduce watering based on rainfall. If a zone does not water, it will check the next day and run if the conditions are met. If you have configured it to run every 3 days and it does not water because of the 'rainfactor', it will still check every day until it does need to water; it does not wait another 3 days.

With the additional information and your formulas I can also increase the watering if the temperature is high or stop/reduce it if the temperature is low.

The other usage for your model is to build a template that I can use to alter the frequency of watering or even enable a second program to run if there is an extended or forecast period of hot weather, this will only need a small tweak to the irrigation control.

The work you have done will make this a much better partner for the irrigation control.

> I think a custom card that uses the internal state rather than HA history would give a lot of flexibility and allow us to display backdated data.

From what I have seen (not that it is definitive), you can't access the backend data directly from the card; you need to access it through the sensor and its attributes. To go back 30 days you would need to expose a lot of attributes, and this is not the way HA is heading. I exposed them this way as I was too lazy to create many sensors for people to get the information for their own calculations, but you have exposed the formula capability and multiple sensors, so I think my attributes are no longer required.

Here is the graph I have now and the config I use; the mean option smooths out the graph to look more appealing.
image
I only keep 30 days of history to keep the database snappy; I don't want to stress out the Pi. Having said that, I have a small SSD attached via USB3 and it is very good. I have an automation that runs weekly to clean up and compress the history.

Waiting 30 days was a bit of a pain when I started tracking, but it was kinda nice to see the graph fill out over time. For me it was aesthetic rather than functional anyway; 5 days was plenty for my purpose.

@tsbernar
Author

tsbernar commented May 6, 2023

I still need to clean up my config flow code some more before I publish here, but it took me a while to get going so I wanted to give an update on how I think the config flow could work. Here's a screen record demo:

Screen.Recording.2023-05-06-.mov

@petergridge
Owner

The config flow is looking good. A couple of things to consider:

  • The lat/lon data is not editable, which is good, but the API key and other attributes should be.
  • The sensor names should not be editable. The sensor name can be changed in HA, assuming a unique id is allocated.
  • Remove the data file when the integration is removed.

I would add validation:

  • that the lat/lon is only used once across all instances of the integration
  • that the sensor name is unique within the instance of the integration

Consider automatically allocating the SensorDeviceClass if the formula is simple, i.e. only one known element, so the unit of measure is set. See below...

I have taken your code (mostly) and put it into my latest checked in source. I have reworked:

  • The way the entities are created and the API is called.
  • Allowed the allocation of sensor class information, so it allocates the unit of measure and handles the conversion from metric to imperial automatically.
  • The allocation of a unique id is best, as this allows you to modify the name, numeric precision, etc. from the UI.
  • Modified how the formulas are handled to allow for more complex templates; I wanted to use the templates to alter watering days.
  • Removed any reference to API v2.5 to simplify the code significantly.
  • Added current observations and forecast data for 8 days.
  • Allowed the allocation of attributes to support the current custom card.
  • I am still using pickle to save data but want to use your method; I just haven't tried it yet.

@tsbernar
Author

tsbernar commented May 7, 2023

Thanks. I've just pushed the code with the config flow.

- The same lat/lon can only be used once across all instances of the integration.
- Sensor names are unique across an instance.
- Validation added for each step. We validate that the API key is valid and that we can call the API, and that the inputs are valid for each sensor type.
- I did not add the removal of the data file yet. It's quite small even with a longer lookback window, and keeping the data persisted seemed more valuable for saving API calls if you remove and re-add the integration; at least for testing this has been useful.
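
The uniqueness checks can be illustrated with plain Python, independent of Home Assistant's config flow API. Everything in this sketch is hypothetical, including the function name `validate_new_entry`, the error strings, and the choice to compare coordinates rounded to four decimal places.

```python
def validate_new_entry(existing_locations, latitude, longitude, sensor_names):
    """Return a list of error strings; an empty list means the input is valid.

    existing_locations is a set of (lat, lon) tuples, rounded to 4 decimal
    places, for the locations that are already configured.
    """
    errors = []
    if (round(latitude, 4), round(longitude, 4)) in existing_locations:
        errors.append("location_already_configured")
    duplicates = {n for n in sensor_names if sensor_names.count(n) > 1}
    if duplicates:
        errors.append("duplicate_sensor_names: " + ", ".join(sorted(duplicates)))
    return errors


existing = {(-33.8303, 151.1516)}
print(validate_new_entry(existing, -33.8302547, 151.1516128, ["rain_24h"]))
print(validate_new_entry(existing, 40.0, -74.0, ["rain_24h", "rain_24h"]))
```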

@petergridge
Owner

Hi Trevor,

I just pushed out a version 2 of the component. I think it covers most of your requirements; after all, I stole a whole lot of your work. Thanks.

If you have time let me know what you think and we can add improvements from there.

Cheers
Pete
