Posts by garlicknots

    We have been holding steady for 4 full days since adding a vCPU to each of the 4 nodes in the cluster (from 2 to 3 vCPUs per node). We want to avoid further changes so we can be more confident about the root cause.


    As of a few weeks ago, our Import and Export converters call a script that passes data to InfluxDB using curl so we can visualize stats in Grafana. It looks like the additional overhead from curl/HTTP was somehow causing this behavior.
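
    For anyone pushing converter stats the same way, here is a minimal Python sketch of how the HTTP write could be bounded so a slow or unreachable InfluxDB cannot hold up the converter for long. It is not the script we actually run. The URL, database name, measurement, tag and field names are all hypothetical, and it assumes an InfluxDB 1.x style /write endpoint that accepts line protocol. Whether a bound like this would have avoided the behavior described above is not known.

```python
import urllib.request

# Hypothetical endpoint: assumes an InfluxDB 1.x style /write API and a database named "conquest".
WRITE_URL = "http://localhost:8086/write?db=conquest"


def _escape_key(value: str) -> str:
    # Line protocol: commas, equals signs and spaces must be escaped in tag/field keys and tag values.
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value) -> str:
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'  # string fields are double-quoted


def build_line(measurement: str, tags: dict, fields: dict) -> str:
    """Build one line-protocol point without a timestamp (the server assigns one)."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in sorted(fields.items())
    )
    return f"{name}{tag_part} {field_part}"


def send_stat(line: str, url: str = WRITE_URL, timeout: float = 0.5) -> bool:
    """Best-effort POST with a short timeout; returns False instead of raising."""
    request = urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except Exception:
        # Stats are optional: a metrics failure should never fail the converter.
        return False


if __name__ == "__main__":
    point = build_line("export", {"converter": "EC 1"}, {"images": 12, "ok": True})
    print(point)
    print("sent" if send_stat(point) else "not sent")
```

    The timeout is the important part: it caps how long each stat write can take, and a failed write is dropped rather than retried.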

    We upgraded yesterday actually. Unfortunately that did not resolve the issue.


    /edit: we have also now added an additional vCPU to the guest in case this was somehow related to processing power.


    /edit2: Marcel - I have been reading in other threads about how you recommend setting up a forwarder, and I am unclear on the best way to do so. We have settings similar to what you outlined in "Exporter Failures" but sometimes see images missing from series, which then need to be redelivered. Speed is a big factor in what we do, so adding a delay sounds a little scary, but if we can't rely on the ECs to send everything without one, that would be good to know. For a group using ConQuest as a router, how would you recommend we develop our import/export converters to ensure 100% store accuracy?


    /edit3: Just as an FYI, we are running on Linux.

    Axel,


    We've observed behavior like this with certain abstract syntax selections and have disabled them. You should look at the syntax being negotiated in 1.4.17 and modify your 1.4.19 dgatesop.lst to favor that syntax.
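
    To see what differs between the two installs before editing anything, a small comparison can help. The sketch below is only an illustration and rests on an assumption about the file layout: that each active line in dgatesop.lst reads "name UID type" (with type being something like sop or transfer) and that a leading # disables a line. The function names are made up for this example. It does not show what was actually negotiated on the wire, only what each file allows.

```python
def enabled_entries(text: str, kind: str) -> dict:
    """Return {name: uid} for active (uncommented) lines of the given type.

    Assumes whitespace-separated "name UID type" lines; extra columns are ignored.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or disabled line
        parts = line.split()
        if len(parts) >= 3 and parts[2] == kind:
            entries[parts[0]] = parts[1]
    return entries


def diff_enabled(old_text: str, new_text: str, kind: str):
    """Return (names enabled only in old, names enabled only in new)."""
    old = enabled_entries(old_text, kind)
    new = enabled_entries(new_text, kind)
    return sorted(set(old) - set(new)), sorted(set(new) - set(old))


if __name__ == "__main__":
    # Inline samples stand in for the 1.4.17 and 1.4.19 files.
    old_lst = """
LittleEndianImplicit   1.2.840.10008.1.2     transfer
#LittleEndianExplicit  1.2.840.10008.1.2.1   transfer
"""
    new_lst = """
LittleEndianImplicit   1.2.840.10008.1.2     transfer
LittleEndianExplicit   1.2.840.10008.1.2.1   transfer
"""
    only_old, only_new = diff_enabled(old_lst, new_lst, "transfer")
    print("enabled only in old:", only_old)
    print("enabled only in new:", only_new)
```

    Anything listed as enabled only in the newer file is a candidate to comment out while testing.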

    I'm working with kakarr0t on this issue. Our biggest question right now is why stopping and starting the process results in successful deliveries. It's as if the ExportConverter thread itself becomes unstable. We have observed this issue on several export converters delivering to several different destinations. I captured some packets but did not find any answers in the TCP packets themselves.


    When this issue begins, the EC starts processing data, then instantly fails delivery. It remains in that state until the dgate process is restarted. If we start deleting data from the EC queue, the next items in the queue also fail. If we stop and start the process, everything is retried and stored very rapidly.