Troubleshooting -FAILED: Error on Load

  • Sorry, I'm stuck again.


    I am having trouble with importing mammo images through the 'incoming' folder. I think I am right in saying that if I DICOM Store the images to conquest (from KPACS or dcmtk storescu), everything works fine every time.


    However, if I drop them into 'incoming', sometimes they work just fine, and other times they don't. This happens with the same images, with a delete/regeneration in between.


    The most common errors are a simple:

    Code
    ***[AddImageFile] /mnt/archive/data/incoming/1.2.840.113619.2.66.2218493241.1747160229131157.339.dcm -FAILED: Error on Load


    or

    Code
    ***(Dyn) Encountered an invalid group order during load of DCM file (after 4d42305e)


    I can't see that there is anything unusual about these images.


    I haven't found the same issue with any other modalities yet, but I haven't been through methodically.


    I have tried doing two things: one was to add 'NoDICOMCheck = 1' to dicom.ini - this appeared to increase the number of errors printed and wrote corrupted DICOM files to disk. The other was to run './dgate --debuglevel:3' - this didn't seem to make any difference to the file imports.


    What can I do to establish what is happening?


    Thanks


    Ed

  • Hi Ed,


    can you stop Conquest, put the files in 'incoming' and then start it again, and see if the same files are loaded correctly? I suspect that the mechanism I use under Windows to test that a file is fully written does not work under Linux. If the above test gives no errors, that is confirmed. NoDICOMCheck would pass more files, so if the mechanism fails it would indeed produce more corrupt files.


    The mechanism that I use is to attempt to open the file in "append mode". This should fail if the file is open elsewhere.
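
    For illustration, that check in isolation might look like this (a minimal sketch, function name hypothetical); the suspicion above is that under Linux this open succeeds even while another machine is still writing the file:

    Code
    #include <stdio.h>

    /* Sketch of the completeness test: try to open the file in append mode;
       if that fails, assume another process still has it open for writing. */
    int file_looks_complete(const char *path)
    { FILE *f = fopen(path, "at");
      if (!f) return 0;   /* still locked elsewhere (Windows behaviour) */
      fclose(f);
      return 1;           /* assumed fully written */
    }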


    Marcel

  • Looks like that is the issue. A large collection of imports worked when dgate was left stopped until the copy was complete.


    I guess it might be my setup that is causing the problem:

    • Conquest is running on linux
    • MAGDevice0 is a CIFS mount
    • Mounted share is provided by Synology NAS
    • Share is also accessed on Windows PCs; this is how the DICOM files will be copied into 'incoming'
  • Hi,


    this is the code:


    Code
    FILE *f = fopen(TempPath, "at");   /* append-mode test: is the file still being written? */
    if (f)
    { fclose(f);
      if (!AddImageFile (TempPath, NewPatid, PDU))
      { DICOMDataObject DDO;          /* load failed: run the rejected-image converter (2100) */
        lua_setvar(PDU, "Filename", TempPath);
        int rc = CallImportConverterN(&DDO, 2100, NULL, NULL, NULL, NULL, PDU, NULL, NULL);
      }
      unlink(TempPath);
      if (Thread) Progress.printf("Process=%d, Total=%d, Current=%d", Thread, 100, count++);
    }


    If that is acceptable, you can add a delay in here somewhere. Or you can call the load, see if it fails, and then delay.
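
    For instance, the 'see if it fails and then delay' idea might look roughly like this (an untested sketch, not the actual patch; the 5-second value is just an assumption, and sleep() needs <unistd.h>):

    Code
    FILE *f = fopen(TempPath, "at");
    if (f)
    { fclose(f);
      if (!AddImageFile(TempPath, NewPatid, PDU))     /* first attempt failed: file may still be copying */
      { sleep(5);                                     /* assumed delay before a single retry */
        if (!AddImageFile(TempPath, NewPatid, PDU))   /* still failing: treat as rejected */
        { DICOMDataObject DDO;
          lua_setvar(PDU, "Filename", TempPath);
          CallImportConverterN(&DDO, 2100, NULL, NULL, NULL, NULL, PDU, NULL, NULL);
        }
      }
      unlink(TempPath);
      if (Thread) Progress.printf("Process=%d, Total=%d, Current=%d", Thread, 100, count++);
    }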


    I'm thinking that maybe all the invalid group order messages related to .bmp and Thumbs.db files that were inadvertently copied with the DICOM (blimmin' K-PACS!).


    In which case, checking the load and then delaying/repeating if it fails would be good. I think this will work for the mid-size files, e.g. DX, MG, but the delay wouldn't be enough for a large file such as a BTO or enhanced CT. For the most part, we wouldn't be dropping those files with this mechanism, but on a new server I'd like to make things as robust as they can be!


    Is it worth looping around this, or maybe I need to investigate another mechanism of some description - drop the files in a different folder, then trigger a Lua script by going to a URL or something.


    What do you think?

  • Hi Ed,


    any of the mechanisms would suffer the same limitation; they could pick up 'live' files. Assuming such an error is rare, you have to play with the timing. I think the incoming folder is scanned every second. If you delay, let's say, 5 seconds on any failure and do not delete the file, it would retry automatically.


    Edit: Oh yes, of course this would retry indefinitely. I guess you could check the file time to see if the retry should stop.
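
    For illustration, a file-time check along those lines might look like this (a sketch only; the 200-second cutoff is just an example value):

    Code
    #include <sys/stat.h>
    #include <time.h>

    /* Sketch: stop retrying a file once it has not been modified for a while. */
    int retry_should_stop(const char *path)
    { struct stat st;
      if (stat(path, &st) != 0) return 1;          /* file vanished: stop retrying */
      return (time(NULL) - st.st_mtime) > 200;     /* stale and still failing: give up */
    }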


    Creating a small loop would allow multiple retries easily, but any invalid file would be held up as well.


    Marcel

  • Hi Marcel


    I made the changes suggested to line 3093 onward of src/dgate/src/dgate.cpp, re-ran ./maklinux, and made sure the dgate file in /usr/lib/cgi-bin/newweb/ had a new datestamp.


    When I retried copying a QA's worth of mammo DICOM images, all but about 6 failed. Only one of the successful ones appears to have been one that previously failed (and it was the previous line in the shell output).


    How would I know that the changes I have made have gone through to the dgate in use?


    As an alternative approach, I copied all the files to a folder in MAGDevice0/data then when the copy was complete, I 'moved' them into the incoming folder. As you might expect, the move was instantaneous and all the files imported into conquest without error.


    Ed

  • Hi Ed,


    the changes I proposed were in the Windows version of the algorithm; the Linux one, with opendir, is just a bit below. If you change the version string (just under the update history), you know that your changes made it through. The 6 that failed should print:


    failed: some file
    imported: same file


    Marcel

  • Does this look like an appropriate edit?


  • It appears to work!


    At first, it didn't, and about two thirds of a collection of mammo images gave '-FAILED: Error on Load'. But then they started working and carried on working. Subsequent dumps of lots of mammo images, or of individual images, have all worked with no error.


    What I am seeing though is an initial message in the shell: 'Added file: /path/image.dcm' as soon as an image copy is started, then a repeat of the message 5 seconds later. This happens even if the image is moved locally into incoming, or if dgate is started with the file already written into incoming, or if there is a backlog so the image is definitely written before it is considered for import.


    I don't understand the initial behaviour. But the other behaviour looks as though the 5 second sleep delays every import, even if the image is properly there.

  • Hi,


    I just checked again, and I see my mistake. The importconverter 2100 is for rejected images, so the test was reversed. Try this:


    Marcel


  • Thanks Marcel


    That now works - objects that have copied across are imported without delay; objects that fail the first time are imported after 5 seconds, according to the sleep set in the code.


    I'm having less consistent results with very large objects. I have tried copying in a half-gig BTO, which takes about 45 seconds to transfer. At a 5-second sleep, I had a long list of -FAILED: Error on Load, but then, once it had copied, instead of loading it the error messages moved on to the next object. I copied two large BTOs and three standard mammos in that test, and only the three standard mammos were imported.


    I then tried an 8-second delay with just the one file: same result, i.e. a list of -FAILED as it copied, then nothing; the object was deleted and nothing was imported.


    I thought that maybe, whilst it appeared to be looping, it would only import on the first or second attempt. So I set the sleep to 90 seconds, and got the initial -FAILED followed 90 seconds later by a successful import.


    To test my hypothesis further, I then set a 40-second delay and tried again with a similar BTO. This time, I got three -FAILED messages followed by a successful import!


    So I don't understand what is happening.

  • Hi,


    I guess it is this: as the import and copying are independent, the delay incurred by a running import will give the other file more time to finish copying. Maybe it retries because the unlink fails while the file is being written.


    I guess the ultimate solution would be something like this:


    if not on holdoff list:
      import
      if fail -->
        sleep
        import
        if fail -->
          add image to holdoff list
    else
      if image on holdoff list longer than 200s
        remove from holdoff list
        import
        fail --> report fail


    The holdoff list could be implemented like into_UpdateCache in dbsql.cpp, with time(NULL) added to time how long it has been there. It is a bit complicated, though.
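
    Purely as an illustrative sketch (not based on the actual into_UpdateCache code; all names are made up), a minimal holdoff list keyed on filename could look something like this:

    Code
    #include <map>
    #include <string>
    #include <time.h>

    /* Hypothetical holdoff list: filename -> time it was first put on hold. */
    static std::map<std::string, time_t> holdoff;

    bool on_holdoff(const char *file)
    { return holdoff.find(file) != holdoff.end();
    }

    void add_holdoff(const char *file)
    { holdoff[file] = time(NULL);
    }

    /* True once the file has been on the list longer than 200 seconds. */
    bool holdoff_expired(const char *file)
    { std::map<std::string, time_t>::iterator it = holdoff.find(file);
      return it != holdoff.end() && time(NULL) - it->second > 200;
    }

    void remove_holdoff(const char *file)
    { holdoff.erase(file);
    }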


    A simpler solution is to make the delay very long, or to make it a loop of 10 retries with a shorter delay.
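
    The loop variant might look roughly like this (an illustrative, untested sketch; 10 attempts with a 5-second delay are just example values):

    Code
    int loaded = 0;
    for (int i=0; i<10; i++)                        /* up to 10 attempts */
    { if (AddImageFile(TempPath, NewPatid, PDU)) { loaded = 1; break; }
      sleep(5);                                     /* wait before the next attempt */
    }
    if (!loaded)                                    /* still failing: treat as rejected */
    { DICOMDataObject DDO;
      lua_setvar(PDU, "Filename", TempPath);
      CallImportConverterN(&DDO, 2100, NULL, NULL, NULL, NULL, PDU, NULL, NULL);
    }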


    Marcel

  • Thanks again.


    Might you have sample code for the 10-times loop? I'm thinking that in use, an occasional 15- or 20-second delay to the import would be OK, and if I could loop it, then if we do try to import a 1 GB+ file in this way, it should work too.


    Otherwise I'll just implement a procedure for large files (i.e. copy to another folder, then do a 'local' move).


    Is it possible to declare a variable in dicom.ini for the sleep parameter, so that you can set a sensible default of maybe 5 seconds in dgate.cpp, but the value could be tweaked without recompiling it?


    Thanks


    Ed

  • Hi,


    try this one for the retry. Untested.


    Marcel


  • Lovely.


    Small modification: there is a comma in the for loop that should be a semi-colon after i=0:

    Code
    for (int i=0; i<10; i++)


    Other than that, I tested it as-is on small and large files and it was all peachy :-) I have now compiled it with a loop going to 30 instead of 10, and tried importing files that are about 1 GB - with a following wind, the 30 loops of 5 seconds were just enough and they imported fine. I might increase this again, because as long as it isn't infinite I can't see any harm in doing lots of loops. And keeping the sleep at 5 seconds means that smaller objects that only take a second or so to transfer won't get delayed for too long.
