Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conseil API: sudden "request timeout" #941

Open
ghost opened this issue Oct 23, 2020 · 25 comments
Open

Conseil API: sudden "request timeout" #941

ghost opened this issue Oct 23, 2020 · 25 comments
Assignees
Labels

Comments

@ghost
Copy link

ghost commented Oct 23, 2020

As advised in #932 , I built Conseil from master ([commit-hash: 9f4eb4c]).
Since this change I see configuration-related warnings like:

06:58:42.422 [akka.actor.default-dispatcher-48]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / blocks / fork_id"
06:58:42.425 [akka.actor.default-dispatcher-48]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / blocks / invalidated_asof"

And also sudden "blackouts" where the API stops answering, i.e. no response, no 503, nothing to a GET /v2/data/tezos/carthagenet/blocks/head call.

Logs around the blackout:

...
13:09:54.004 [akka.actor.default-dispatcher-44]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.178 [akka.actor.default-dispatcher-44]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.179 [akka.actor.default-dispatcher-23]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:09:54.185 [akka.actor.default-dispatcher-34]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
13:10:00.802 [akka.actor.default-dispatcher-25]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / operations / invalidated_asof"
13:10:00.802 [akka.actor.default-dispatcher-25]  WARN  t.c.c.a.metadata.UnitTransformation  - There're missing metadata overrides for "tezos / carthagenet / operations / fork_id"
13:10:01.125 [akka.actor.default-dispatcher-43]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[INFO] [10/23/2020 13:15:03.812] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [POST /v2/data/tezos/carthagenet/operations Strict(370 bytes)]
[INFO] [10/23/2020 13:15:04.063] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [10/23/2020 13:15:04.243] [conseil-system-akka.actor.default-dispatcher-7418] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]

...

It has happened several times since the update and the pattern is always the same:

HTTP request                            # Healthy as far as I understand
Request timeout to .../operations       # Broken
Request timeout to .../blocks/head      # Broken

(I have a process that checks the blocks/head periodically.)
I get the metadata config warnings when I restart Conseil and sometimes before the blackout, too - but not always. I am not sure if the two problems are related.

If I restart the API it works again for some time.

@vishakh vishakh self-assigned this Oct 23, 2020
@vishakh vishakh added the bug label Oct 23, 2020
@vishakh
Copy link
Contributor

vishakh commented Oct 23, 2020

What is the CPU utilization of the Tezos node when you experience these Conseil issues? What about CPU utilization overall?

@ghost
Copy link
Author

ghost commented Oct 26, 2020

Hello @vishakh ,
When it is in the "zombie" state it looks like this:

mpstat

12:35:17 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:35:17 PM  all    1.38    0.00    0.68    0.79    0.00    0.03    0.09    0.00    0.00   97.03

ps -p 2014,7823 -o %cpu,%mem,cmd
%CPU %MEM CMD
19.5  2.5 tezos-node run -v --history-mode=archive --data-dir=tezos --network=carthagenet --rpc-addr=0.0.0.0:8732 --config-file=carthagenet.json --connections=5
 3.9 15.4 java -Dconfig.file=/app/conseil/conf/api-carthagenet.conf -jar conseil-api.jar

When I restart conseil-api it gets busier - but only for a minute or two.

@ghost
Copy link
Author

ghost commented Oct 28, 2020

OK, I did some more testing. The missing metadata overrides seems to be unrelated.
I found that this problem occurs if I make very busy calls to /operations and /balance_updates.

What I find really interesting is that when Conseil API freezes then tezos-node (running on the same machine) freezes, too. Do you know why it could happen?

@vishakh
Copy link
Contributor

vishakh commented Oct 28, 2020

Maybe the Postgres container is being hit hard by the Conseil queries. This reduces the available IO for the Tezos node.

Can you provide us with the queries you are running so we can profile them?

@ghost
Copy link
Author

ghost commented Nov 4, 2020

Hi, here are the calls:

{"fields":[],"predicates":[{"field":"operation_group_hash","operation":"in","set":["********************"],"inverse":false},{"field":"kind","operation":"in","set":["reveal","transaction","origination","delegation"],"inverse":false}],"orderBy":[],"aggregation":[]}
2020-11-04 16:25:10,694 TRACE [              Test worker] [client.conseil.TezosConseilAPI] --> END POST (294-byte body)
2020-11-04 16:25:10,741 TRACE [              Test worker] [client.conseil.TezosConseilAPI] <-- 200 OK http://***********************/v2/data/tezos/carthagenet/operations (46ms)
{"fields":[],"predicates":[{"field":"operation_group_hash","operation":"in","set":["**********************"],"inverse":false}],"orderBy":[],"aggregation":[]}
2020-11-04 16:25:10,782 TRACE [              Test worker] [client.conseil.TezosConseilAPI] --> END POST (186-byte body)
2020-11-04 16:25:10,808 TRACE [              Test worker] [client.conseil.TezosConseilAPI] <-- 200 OK http://***********************/v2/data/tezos/carthagenet/balance_updates (25ms)

@vishakh
Copy link
Contributor

vishakh commented Nov 6, 2020

@g574 In both cases the query is optimized since it hits a database index. Now I wonder whether your Postgres container or server is overwhelmed. Could you show us the CPU and memory utilization of the Postgres process when the problem queries are running?

@ghost
Copy link
Author

ghost commented Nov 6, 2020

@vishakh sorry, I found a bug in my systems, the tezos-node is not affected, only Conseil API. Is the Postgres CPU/memory utilization still relevant?

@vishakh
Copy link
Contributor

vishakh commented Nov 6, 2020 via email

@ghost
Copy link
Author

ghost commented Nov 6, 2020

The way I ran queries against tezos-node was wrong so I believed it stopped responding at the same time as Conseil.

@vishakh
Copy link
Contributor

vishakh commented Nov 6, 2020 via email

@ghost
Copy link
Author

ghost commented Nov 6, 2020

mpstat:

03:57:54 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:57:54 PM  all    1.44    0.00    0.69    0.51    0.00    0.03    0.10    0.00    0.00   97.24

ps -eo %cpu,%mem,cmd | grep postgres:

 0.0  0.6 /usr/bin/postgres -D ...
 0.0  0.0 postgres: logger process
 0.0 22.8 postgres: checkpointer process
 0.0  0.5 postgres: writer process
 0.0  0.2 postgres: wal writer process
 0.0  0.0 postgres: autovacuum launcher process
 0.0  0.0 postgres: stats collector process
 0.0  0.0 postgres: bgworker: logical replication launcher
 2.4  3.8 postgres: ... ... ...(42560) idle
 1.4  3.7 postgres: ... ... ...(42566) idle
 0.0  0.6 postgres: ... ... ...(42568) idle
 0.0  0.0 postgres: ... ... ...(42570) idle
 0.0  0.0 postgres: ... ... ...(42588) idle
 0.0  0.0 postgres: ... ... ...(42592) idle
 0.0  0.0 postgres: ... ... ...(42596) idle

I don't find it too heavy. Can it be related to the high number of "idle" processes?

@vishakh
Copy link
Contributor

vishakh commented Nov 6, 2020 via email

@ghost
Copy link
Author

ghost commented Nov 12, 2020

Hi @vishakh ,
sorry for the delay. I found that the tezos-node makes some extreme disk utilization and I/O. Among other problems it may also be the cause of these blackouts.
Is it possible to do anything about it on the Conseil end? Does lowering the number of PostgreSQL threads make any sense?

Thanks for the question but unfortunately nautilus.cloud is not an option for me right now.

@vishakh
Copy link
Contributor

vishakh commented Nov 12, 2020 via email

@ghost
Copy link
Author

ghost commented Nov 23, 2020

Hi @vishakh ,
I managed to solve the I/O problem but I still experience these blackouts.
I started Conseil API with the -verbose switch and caught this log:

08:07:46.829 [akka.actor.default-dispatcher-30]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:47.825 [-akka.actor.default-dispatcher-3]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:56.888 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
08:07:57.886 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[1894.088s][info][class,load  ] akka.http.impl.engine.parsing.HttpHeaderParser$RawHeaderValueParser source: file:/app/conseil/bin/conseil-api.jar
08:07:59.275 [akka.actor.default-dispatcher-27]  INFO  t.c.c.a.d.RecordingDirectives        - HTTP request
[1894.184s][info][class,load  ] akka.http.javadsl.model.HttpEntity$Default source: file:/app/conseil/bin/conseil-api.jar
[1894.185s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Default source: file:/app/conseil/bin/conseil-api.jar
[1894.185s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x000000084058d840 source: java.lang.invoke.LambdaForm
[1894.186s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2609/0x00000008406f0040 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.186s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x000000084058a840 source: java.lang.invoke.LambdaForm
[1894.187s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840589c40 source: java.lang.invoke.LambdaForm
[1894.187s][info][class,load  ] java.lang.invoke.LambdaForm$MH/0x0000000840588040 source: java.lang.invoke.LambdaForm
[1894.188s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840579c40 source: java.lang.invoke.LambdaForm
[1894.188s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2610/0x00000008406cb840 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x0000000840560840 source: java.lang.invoke.LambdaForm
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$DMH/0x00000008404f2c40 source: java.lang.invoke.LambdaForm
[1894.189s][info][class,load  ] java.lang.invoke.LambdaForm$MH/0x00000008404f3040 source: java.lang.invoke.LambdaForm
[1894.190s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$Lambda$2611/0x0000000840589040 source: akka.http.impl.engine.parsing.HttpMessageParser
[1894.193s][info][class,load  ] akka.stream.stage.GraphStageLogic$SubSourceOutlet source: file:/app/conseil/bin/conseil-api.jar
[1894.194s][info][class,load  ] akka.stream.impl.fusing.SubSink$Command source: file:/app/conseil/bin/conseil-api.jar
[1894.195s][info][class,load  ] akka.stream.stage.GraphStageLogic$SubSourceOutlet$$Lambda$2612/0x0000000840564840 source: akka.stream.stage.GraphStageLogic$SubSourceOutlet
[1894.196s][info][class,load  ] akka.stream.impl.fusing.SubSource source: file:/app/conseil/bin/conseil-api.jar
[1894.198s][info][class,load  ] akka.stream.impl.SubscriptionTimeoutException source: file:/app/conseil/bin/conseil-api.jar
[1894.198s][info][class,load  ] akka.stream.impl.fusing.SubSource$$anon$13 source: file:/app/conseil/bin/conseil-api.jar
[1894.200s][info][class,load  ] akka.http.impl.engine.server.HttpServerBluePrint$PrepareRequests$$anon$1$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.202s][info][class,load  ] akka.http.impl.engine.parsing.HttpMessageParser$$anonfun$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.204s][info][class,load  ] akka.http.scaladsl.model.EntityStreamException source: file:/app/conseil/bin/conseil-api.jar
[1894.206s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Default$$Lambda$2613/0x0000000840374040 source: akka.http.scaladsl.model.HttpEntity$Default
[1894.207s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$ source: file:/app/conseil/bin/conseil-api.jar
[1894.208s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$SizeLimit source: file:/app/conseil/bin/conseil-api.jar
[1894.208s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$$$Lambda$2614/0x0000000840564040 source: akka.http.scaladsl.model.HttpEntity$Limitable$
[1894.209s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable source: file:/app/conseil/bin/conseil-api.jar
[1894.209s][info][class,load  ] akka.http.scaladsl.model.HttpEntity$Limitable$$anon$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.212s][info][class,load  ] akka.http.impl.util.StreamUtils$$anon$3 source: file:/app/conseil/bin/conseil-api.jar
[1894.212s][info][class,load  ] akka.http.impl.util.StreamUtils$$anon$3$$anon$4 source: file:/app/conseil/bin/conseil-api.jar
[1894.215s][info][class,load  ] akka.http.impl.util.ToStrict source: file:/app/conseil/bin/conseil-api.jar
[1894.216s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.217s][info][class,load  ] akka.stream.impl.HeadOptionStage source: file:/app/conseil/bin/conseil-api.jar
[1894.218s][info][class,load  ] akka.stream.SinkShape$ source: file:/app/conseil/bin/conseil-api.jar
[1894.219s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2615/0x0000000840563840 source: akka.stream.scaladsl.Sink$
[1894.220s][info][class,load  ] akka.stream.impl.HeadOptionStage$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.221s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2616/0x0000000840563040 source: akka.stream.scaladsl.Sink$
[1894.222s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.223s][info][class,load  ] akka.http.impl.util.ToStrict$$anon$1$$anon$3 source: file:/app/conseil/bin/conseil-api.jar
[1894.225s][info][class,load  ] akka.stream.stage.TimerGraphStageLogic$$anon$2 source: file:/app/conseil/bin/conseil-api.jar
[1894.225s][info][class,load  ] akka.http.scaladsl.server.directives.ExecutionDirectives$$Lambda$2617/0x0000000840560040 source: akka.http.scaladsl.server.directives.ExecutionDirectives
[1894.226s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2618/0x0000000840556040 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.226s][info][class,load  ] akka.stream.Attributes$$anonfun$getFirst$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.227s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2619/0x0000000840556840 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.227s][info][class,load  ] tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1$$Lambda$2620/0x0000000840bf1840 source: tech.cryptonomic.conseil.api.directives.RecordingDirectives$$anonfun$loggingExceptionHandler$1
[1894.228s][info][class,load  ] akka.stream.actor.ActorSubscriberMessage source: file:/app/conseil/bin/conseil-api.jar
[1894.228s][info][class,load  ] akka.stream.impl.fusing.SubSource$$anon$13$$Lambda$2621/0x0000000840557040 source: akka.stream.impl.fusing.SubSource$$anon$13
[1894.228s][info][class,load  ] akka.http.scaladsl.marshalling.Marshaller$$$Lambda$2622/0x0000000840557840 source: akka.http.scaladsl.marshalling.Marshaller$
[1894.229s][info][class,load  ] akka.stream.impl.fusing.SubSink$RequestOne$ source: file:/app/conseil/bin/conseil-api.jar
[1894.229s][info][class,load  ] akka.http.scaladsl.marshalling.Marshaller$$$Lambda$2623/0x0000000840beb440 source: akka.http.scaladsl.marshalling.Marshaller$
[1894.230s][info][class,load  ] akka.http.scaladsl.marshalling.Marshalling$WithOpenCharset source: file:/app/conseil/bin/conseil-api.jar
[1894.231s][info][class,load  ] akka.http.scaladsl.marshalling.Marshal$$anonfun$$nestedInanonfun$toResponseFor$3$1 source: file:/app/conseil/bin/conseil-api.jar
[1894.233s][info][class,load  ] akka.http.impl.engine.rendering.ResponseRenderingContext$CloseRequested$ForceClose$ source: file:/app/conseil/bin/conseil-api.jar
[1894.235s][info][class,load  ] akka.stream.actor.ActorSubscriberMessage$OnComplete$ source: file:/app/conseil/bin/conseil-api.jar
[1894.235s][info][class,load  ] akka.stream.scaladsl.Sink$$$Lambda$2624/0x0000000840be9040 source: akka.stream.scaladsl.Sink$
[2194.264s][info][class,load  ] akka.http.impl.util.package$RichHttpRequest$ source: file:/app/conseil/bin/conseil-api.jar
[2194.267s][info][class,load  ] akka.event.Logging$Info$ source: file:/app/conseil/bin/conseil-api.jar
[2194.268s][info][class,load  ] akka.event.Logging$Info2 source: file:/app/conseil/bin/conseil-api.jar
[2194.271s][info][class,load  ] akka.event.Logging$LogEventWithMarker source: file:/app/conseil/bin/conseil-api.jar
[2194.271s][info][class,load  ] akka.event.Logging$Info3 source: file:/app/conseil/bin/conseil-api.jar
[2194.272s][info][class,load  ] java.time.format.DateTimePrintContext source: shared objects file
[2194.272s][info][class,load  ] java.time.LocalDate$1 source: shared objects file
[2194.272s][info][class,load  ] java.time.format.DateTimeFormatterBuilder$3 source: shared objects file
[2194.279s][info][class,load  ] scala.io.AnsiColor source: file:/app/conseil/bin/conseil-api.jar
[2194.279s][info][class,load  ] scala.DeprecatedConsole source: file:/app/conseil/bin/conseil-api.jar
[2194.279s][info][class,load  ] scala.Console$ source: file:/app/conseil/bin/conseil-api.jar
[2194.317s][info][class,load  ] sun.nio.cs.US_ASCII$Decoder source: jrt:/java.base
[INFO] [11/23/2020 08:12:59.446] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [POST /v2/data/tezos/carthagenet/operations Strict(434 bytes)]
[INFO] [11/23/2020 08:13:06.962] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [11/23/2020 08:13:07.951] [conseil-system-akka.actor.default-dispatcher-927] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]
[INFO] [11/23/2020 08:13:19.012] [conseil-system-akka.actor.default-dispatcher-929] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/carthagenet/blocks/head Strict(0 bytes)]

My current config looks like this:

include "metadata.conf"

platforms: [ {
  name: tezos
  network: carthagenet
  enabled: true
  node: {
    protocol: "http"
    hostname: "***********"
    port: 8732
    pathPrefix: ""
    }
  }
]

conseil {
  hostname: "0.0.0.0"
  port: *******

  cache-ttl: 15 minutes
  max-query-result-size: 100000
  high-cardinality-limit: 100
  startup-deadline: 5 minutes

  security.api-keys {
    keys: []
    allow-blank: true
  }

  db {
    dataSourceClass: "org.postgresql.ds.PGSimpleDataSource"
    properties {
      user: "************"
      password: "**************"
      url: "jdbc:postgresql://***************/****"
      reWriteBatchedInserts: true
    }
  }

}

akka {
  http {
    dispatcher {
      type: "Dispatcher"
      executor: "thread-pool-executor"
      throughput: 1
      thread-pool-executor {
        fixed-pool-size: 16
      }
    }

    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

Does it look correct?

@ghost
Copy link
Author

ghost commented Dec 8, 2020

hi @vishakh , I suppose there is some misconfiguration. Do you see any red flag in my config?

@vishakh
Copy link
Contributor

vishakh commented Dec 8, 2020

@ivanopagano @piotrkosecki Could you please review @g574's configuration above?

@piotrkosecki
Copy link
Contributor

piotrkosecki commented Dec 9, 2020

I think you could try changing this cache-ttl: 15 minutes to cache-ttl: 24 hours. Cache is not needed to be refreshed so frequently - the current value was left there for development purposes.

Other lines look fine for me.

@vishakh
Copy link
Contributor

vishakh commented Dec 9, 2020

@g574 Please try what @piotrkosecki mentioned and let us know if you still have issues.

@ghost
Copy link
Author

ghost commented Dec 15, 2020

Thank you. I have updated the configuration. I will observe my Conseil and let you know the result.

@ghost
Copy link
Author

ghost commented Dec 16, 2020

Hello @piotrkosecki , @vishakh
Unfortunately I still get those blackouts. My current config is as follows:

include "metadata.conf"

platforms {
  tezos {
    carthagenet {
      node {
        protocol: "http",
        hostname: "***********"
        port: 8732
        pathPrefix: ""
      }
    }
  }
}

conseil {
  hostname: "0.0.0.0"
  port: *****

  cache-ttl: 24 hours
  max-query-result-size: 100000
  high-cardinality-limit: 100
  startup-deadline: 5 minutes

  security.apiKeys {
    keys: []
    allow-blank: true
  }

  db {
    dataSourceClass: "org.postgresql.ds.PGSimpleDataSource"
    properties {
      user: "**********"
      password: "**************"
      url: "jdbc:postgresql://************"
      reWriteBatchedInserts: true
    }
  }

}

akka {
  http {
    dispatcher {
      type: "Dispatcher"
      executor: "thread-pool-executor"
      throughput: 1
      thread-pool-executor {
        fixed-pool-size: 16
      }
    }

    server {
      request-timeout: 5 minutes
      idle-timeout: 5 minutes
    }
  }
}

It would be OK to have a timeout now and then but I wonder why it does not recover without a restart.

@ghost
Copy link
Author

ghost commented Jan 13, 2021

I have switched to delphinet and the problem persists. I have not experienced it on mainnet yet.

@vishakh
Copy link
Contributor

vishakh commented Jan 13, 2021 via email

@ghost
Copy link
Author

ghost commented Feb 24, 2021

Hi, the problem persists on delphinet - currently on 2021-january-release-35:

[INFO] [02/24/2021 10:59:19.907] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.288] [conseil-system-akka.actor.default-dispatcher-572] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.327] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]
[INFO] [02/24/2021 10:59:19.907] [conseil-system-akka.actor.default-dispatcher-573] [akka.actor.ActorSystemImpl(conseil-system)] Request timeout encountered for request [GET /v2/data/tezos/delphinet/blocks/head Strict(0 bytes)]

If I try time curl --noproxy '*' http://<address>/v2/data/tezos/delphinet/blocks/head in hangs forever. Once I restart the api I get a response again after 0.7 seconds.
Iotat, free, top don't show any high load. The network traffic is less than 100 kB per sec.

@vishakh
Copy link
Contributor

vishakh commented Feb 24, 2021

Can you describe the hardware you are running on? What are the load averages when this issue happens?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants