• Hello Marcel,


    I have exactly the same setup in 5 installations and all work fine, except one, where Postgres often crashes with the log message "FATAL: out of memory". This is caused by a Windows Server (NT-level) exception 0xC000012D "Out of Virtual Memory".


    The Conquest version is 1.5.0. It runs on a Windows Server 2019 virtual machine (4 vCPUs, 12 GB RAM, 100 GB OS drive) with Postgres 12, on a Dell / VMware ESXi 6.7 host.


    I have observed that quite often (almost daily) we get Warning-level events in the Windows Event Viewer reporting that dgate64 consumes a significant amount of virtual memory, such as the one copied below.

    -------------------------------------------------------------------------------------------------

    Log Name: System

    Source: Microsoft-Windows-Resource-Exhaustion-Detector

    Date: 17/03/2021 20:50:06

    Event ID: 2004

    Task Category: Resource Exhaustion Diagnosis Events

    Level: Warning

    Description: Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: dgate64.exe (6808) consumed 20763033600 bytes, sqlservr.exe (4580) consumed 2050162688 bytes, and MsMpEng.exe (3028) consumed 272416768 bytes.

    ---------------------------------------------------------------------------------------------
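    To put the byte counts from this event into perspective, a small script can convert them to GiB. This is only a sketch: the regular expression assumes the exact description wording quoted above ("<image> (<pid>) consumed <n> bytes").

```python
import re

# Description text copied from the Event ID 2004 entry above.
DESCRIPTION = (
    "Windows successfully diagnosed a low virtual memory condition. "
    "The following programs consumed the most virtual memory: "
    "dgate64.exe (6808) consumed 20763033600 bytes, "
    "sqlservr.exe (4580) consumed 2050162688 bytes, and "
    "MsMpEng.exe (3028) consumed 272416768 bytes."
)

# Assumption: every entry follows "<image> (<pid>) consumed <n> bytes".
ENTRY = re.compile(r"(\S+\.exe) \((\d+)\) consumed (\d+) bytes")


def consumers_gib(description: str) -> dict[str, float]:
    """Map each process image name to its reported virtual memory in GiB."""
    return {
        image: int(nbytes) / 2**30
        for image, _pid, nbytes in ENTRY.findall(description)
    }


if __name__ == "__main__":
    for image, gib in consumers_gib(DESCRIPTION).items():
        print(f"{image:14s} {gib:6.2f} GiB")
```

    By this arithmetic dgate64.exe alone accounts for roughly 19.3 GiB, well above the VM's 12 GB of RAM, while sqlservr.exe and MsMpEng.exe together stay around 2.2 GiB.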


    I have tried increasing the virtual memory (page file) size manually in the Windows advanced system settings, but in vain: it eats all you give it.


    Any ideas what is going wrong? Is it a hardware problem, RAM maybe, or the disks?


    I repeat that this happens in only 1 of the 5 installations.


    PS. The other two memory-hungry processes in the event above are Windows Defender (MsMpEng.exe) and MS SQL Server (sqlservr.exe).


    Best regards.

  • Hi Marcel,


    I checked the logs and did not notice any unusual C-Store or C-Move activity during the jump in memory use.

    I also checked the hardware via Dell's iDRAC dashboard and found the host system healthy.


    The next step is to install the pending Windows Server 2019 updates (which update Defender as well) and see whether anything improves.


    The final step would be to switch off Defender and replace it with another antivirus.


    What do you think ?


    Best.

  • Hi,


    You can add:


    [lua]
    endassociation = print(heapinfo())


    to dicom.ini


    This writes a one-line summary of what is allocated to the log at the end of each association, and may help track down what is going on.


    Antivirus software would not affect memory allocation in dgate.


    Marcel

  • Hi Marcel


    I added print(heapinfo()) as you suggested, and while I was doing some C-MOVEs the following lines were written to pacstrouble.log:

    ----------------------------------------

    20210320 11:45:44 7844 small (466311); 95 medium (61521) 9 large (539998) *** ERROR - bad heap node

    20210320 12:00:56 34972 small (1516944); 210 medium (134631) 73 large (8084432) *** ERROR - bad heap node

    -----------------------
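    The heapinfo() summary lines above can be split into their fields with a short parser, which makes it easier to follow the counts over a whole day. This is a sketch under assumptions: the layout is inferred only from the two lines shown, and while the count in front of "large" is read as the number of large objects (as in Marcel's later reply), the unit of the parenthesised figures is not confirmed in this thread.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Layout inferred from the pacstrouble.log lines above:
# "<date> <time> <n> small (<x>); <n> medium (<x>) <n> large (<x>) [error text]"
HEAPINFO = re.compile(
    r"^(?P<ts>\d{8} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<small>\d+) small \((?P<small_x>\d+)\);?\s+"
    r"(?P<medium>\d+) medium \((?P<medium_x>\d+)\);?\s+"
    r"(?P<large>\d+) large \((?P<large_x>\d+)\)"
    r"(?P<rest>.*)$"
)


@dataclass
class HeapSample:
    time: datetime
    small: int
    medium: int
    large: int
    large_total: int  # assumption: total size of large blocks, unit unconfirmed
    bad_heap_node: bool


def parse_heapinfo(line: str) -> Optional[HeapSample]:
    """Parse one heapinfo() summary line; return None if it does not match."""
    m = HEAPINFO.match(line.strip())
    if m is None:
        return None
    return HeapSample(
        time=datetime.strptime(m["ts"], "%Y%m%d %H:%M:%S"),
        small=int(m["small"]),
        medium=int(m["medium"]),
        large=int(m["large"]),
        large_total=int(m["large_x"]),
        bad_heap_node="bad heap node" in m["rest"],
    )


if __name__ == "__main__":
    lines = [
        "20210320 11:45:44 7844 small (466311); 95 medium (61521) 9 large (539998) *** ERROR - bad heap node",
        "20210320 12:00:56 34972 small (1516944); 210 medium (134631) 73 large (8084432) *** ERROR - bad heap node",
    ]
    samples = [s for s in map(parse_heapinfo, lines) if s is not None]
    # Print how the large-object count moved between consecutive summaries.
    for prev, cur in zip(samples, samples[1:]):
        print(cur.time, "large objects:", prev.large, "->", cur.large)
```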


    Can you please explain what these errors mean and suggest how to handle them?


    Best.

  • Hi,


    This must clearly be a bug. Did the "*** ERROR - bad heap node" message start appearing after a particular operation?


    If everything is working correctly, you would expect little change in the reported numbers from one message to the next.


    Marcel

  • Hello Marcel,


    Yesterday afternoon, at 18:01, we had the latest crash.


    Specifically, at 17:00 the Windows Event Viewer started logging virtual memory shortage events. At 18:01 Postgres crashed with the error "Out of memory" and restarted automatically with automatic recovery. Finally, the system became unusable this morning at 8:00 and we had to restart the whole server (VM).


    Throughout the incident, user operations (C-Moves, C-Stores, C-Finds) were at their usual levels, nothing extraordinary.


    I have attached the Conquest log for that period as well as the Postgres log, in the hope that they help locate the problem.


    FYI, in our setup Conquest connects to Postgres via ODBC, not natively. Could this be the cause of the problem?

    logs_20210322.zip
    postgresql-2021-03-22_000000.zip

    Best regards

  • Hi,


    Conquest seems to be leaking large objects when decompressing particular images. For example, around 8:45 you can see the large-object count go up from 999 to 1034, and later down to 1020. Other jumps are at 9:25, 9:46, 9:50, 11:27, 12:32, 13:26 and 19:39, ending at over 1300. Can you see what Conquest is doing at the times of the bigger jumps?


    Leaks in Conquest should not affect Postgres at all, but I have no experience with the ODBC driver. Is that setup different from the one on the other servers?


    Marcel

  • Maybe Conquest leaks when EFILM does not accept the image?


    This happens 267 times in 60 moves, while the large-object count increases by about 330.


    Can you check the logs for me to see whether there is a consistent relationship between these messages and the jumps in large objects?
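    A check like this can be partly automated. The sketch below assumes that both the heapinfo() lines and the rejection messages start with the `YYYYMMDD HH:MM:SS` timestamp seen in pacstrouble.log, and `REJECT_MARKER` is a hypothetical placeholder for whatever text Conquest actually writes when EFILM refuses an image. Since heapinfo() prints at the end of each association, the script counts the rejections between consecutive heapinfo() lines and sets them against the change in the large-object count over the same interval.

```python
import re
import sys
from bisect import bisect_right
from datetime import datetime

# Timestamp prefix as seen in the heapinfo() lines of pacstrouble.log.
TS = r"(\d{8} \d{2}:\d{2}:\d{2})"
# Captures the timestamp and the count in front of "large".
LARGE = re.compile(TS + r".*?(\d+) large \(")
# Hypothetical placeholder: replace with the exact text Conquest logs
# when EFILM rejects an image (the real wording is not given here).
REJECT_MARKER = "REPLACE-WITH-REJECTION-TEXT"


def _parse_ts(text):
    return datetime.strptime(text, "%Y%m%d %H:%M:%S")


def read_events(lines, reject_marker=REJECT_MARKER):
    """Split log lines into (time, large_count) samples and rejection times."""
    samples, rejections = [], []
    for line in lines:
        heap = LARGE.match(line)
        if heap:
            samples.append((_parse_ts(heap.group(1)), int(heap.group(2))))
        elif reject_marker in line:
            stamp = re.match(TS, line)
            if stamp:
                rejections.append(_parse_ts(stamp.group(1)))
    samples.sort()
    rejections.sort()
    return samples, rejections


def per_interval(samples, rejections):
    """For each interval (t0, t1] between consecutive samples, return
    (t1, rejections in the interval, change in the large-object count)."""
    rows = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        hits = bisect_right(rejections, t1) - bisect_right(rejections, t0)
        rows.append((t1, hits, n1 - n0))
    return rows


def summarize(rows):
    """Total rejections, net change in large objects, and how many
    intervals agree on 'a rejection happened <=> the count grew'."""
    total_rejections = sum(hits for _, hits, _ in rows)
    net_change = sum(delta for _, _, delta in rows)
    agreeing = sum(1 for _, hits, delta in rows if (hits > 0) == (delta > 0))
    return total_rejections, net_change, agreeing, len(rows)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        samples, rejections = read_events(log)
    rows = per_interval(samples, rejections)
    for when, hits, delta in rows:
        if hits or delta:
            print(f"{when}  rejections={hits:3d}  large objects {delta:+d}")
    print("rejections, net change, agreeing intervals, intervals:", summarize(rows))
```

    Pass the log file as the only argument. If each rejected image leaks roughly one large object, the per-interval rejection counts and increases should track each other closely, as the 267 rejections versus about 330 extra large objects already suggest.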


    Marcel

  • Hi Marcel,


    The setup is the same on all servers, since they were all imported from the same VM (OVF files).


    I used ODBC because I was familiar with the MSSQL setup, which also uses ODBC.


    I will try to check the activity at the peaks and let you know.


    It is worth mentioning that one user fetches 10-20 studies (MRI, CT) into EFILM, and these are decompressed first. Could this cause the leak?


    By the way, could you please elaborate on the meaning of the heapinfo() message? What do small, medium and large mean, and what are the respective numbers?


    Best.

  • Hi Marcel,


    You are absolutely right !


    I checked today's log, and the leak occurs each time EFILM rejects an image, so this is most likely indeed a consistent relationship.


    Could you please advise how to handle this? I am afraid that stopping the use of EFILM, even temporarily, would not be an acceptable option.


    Best regards.
