Conquest version 1.4.19c with PostgreSQL: invalid byte sequence for encoding "UTF8": 0xb2

  • I can run this query directly in the PostgreSQL SQL tool and the data inserts fine. I also checked that the conquest database in PostgreSQL has its encoding set to UTF8. However, Conquest throws the error below when it receives this data via the Conquest DICOM server. Please check this and let us know how to fix it. The Series Description value contains superscript (exponent) characters.


    [CONQUESTSRV1] ***Failed PGSQLExec : INSERT INTO DICOMSeries (SeriesInst, SeriesNumb, SeriesDate, SeriesTime, SeriesDesc, Modality, PatientPos, Manufactur, ModelName, BodyPartEx, ProtocolNa, StationNam, Institutio, FrameOfRef, SeriesPat, StudyInsta, AccessTime) VALUES ('1.2.840.113619.2.80.2406025060.23062.1552663959.28.13.2', '551', '20190315', '103003', 'ADC (10^-6 mm²/s):Mar 15 2019 10-32-39 CDT', 'MR', 'HFS', 'TEST SYSTEMS', 'TEST', 'HEAD', 'Brain MS', 'TESTMR', 'Test MRI', '1.2.840.113619.2.408.5554020.6883172.15624.1552656627.957', 'O0000678', '2.16.5.5.100.1.0.0.0.20145514094327', 1553066728)

    [CONQUESTSRV1] ***Error: ERROR: invalid byte sequence for encoding "UTF8": 0xb2

    [CONQUESTSRV1] ***Unable to DB.Add()


    Thanks

    Suresh

  • Hi. 0xb2 indicates that the high bit is set on one of the characters, i.e. the scanner is using some non-ASCII encoding. Conquest doesn't modify any of the text. What encoding is shown in the image header? Marcel
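To make the "high bit set" remark concrete, here is a minimal sketch in C that locates the first non-ASCII byte in a string. The helper name `first_non_ascii` is hypothetical and is not part of Conquest.

```c
#include <stddef.h>

// Hypothetical helper (not Conquest code): returns the offset of the first
// byte with the high bit set, or -1 if the string is pure 7-bit ASCII.
long first_non_ascii(const char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        // Cast to unsigned char so the test also works where char is signed.
        if ((unsigned char)s[i] & 0x80)
            return (long)i;
    }
    return -1;
}
```

Called on the raw Series Description bytes, a result other than -1 means the value is not plain ASCII and its encoding depends on what the scanner declared in the header.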

  • Thanks for the info:


    It is the superscript 2, which is encoded as the single byte 0xb2 in ISO_IR 100 (Latin 1), but not in UTF8, where it is the two-byte sequence 0xc2 0xb2. I.e. SQL should not have been set up for UTF8. There is currently no form of character translation in Conquest; this is one of its weak spots. Options:


    1) change the character encoding in PostgreSQL - not sure where and how to do this

    2) add a Lua script to replace offending characters (the question is then: with what?)

    3) add character replacement to the Conquest source code.

    Note that Conquest is self-contained in low-level C++ and is designed to compile with quite old compilers, which restricts what external code can be added. For reference I found this on Stack Exchange:


    Code
    #include <stddef.h>

    // Converts an ISO-8859-1 string to UTF-8 in place.
    // Returns 0 on success, or the required buffer size if max_size is too small.
    size_t iso8859_1_to_utf8(char *content, size_t max_size)
    {
        char *src, *dst;

        // First pass: see if there's enough space for the new bytes.
        for (src = dst = content; *src; src++, dst++) {
            if (*src & 0x80) {
                // If the high bit is set in the ISO-8859-1 representation, then
                // the UTF-8 representation requires two bytes (one more than usual).
                ++dst;
            }
        }

        if ((size_t)(dst - content) + 1 > max_size) {
            // Inform caller of the space required.
            return (size_t)(dst - content) + 1;
        }

        // Second pass: expand backwards so unread input is never overwritten.
        // src starts on the terminating NUL, so the terminator is copied first.
        while (dst > src) {
            if (*src & 0x80) {
                *dst-- = 0x80 | (*src & 0x3f);                     // trailing byte
                *dst-- = 0xc0 | (*((unsigned char *)src--) >> 6);  // leading byte
            } else {
                *dst-- = *src--;
            }
        }
        return 0; // SUCCESS
    }

    Regards,


    Marcel
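As a companion to the in-place routine quoted above, the following is a minimal sketch of the same Latin-1 to UTF-8 expansion written into a separate output buffer, which may be easier to check by eye. The function name `latin1_to_utf8` and its return convention are assumptions made for illustration; this is not the Conquest implementation.

```c
#include <stddef.h>

// Sketch only (hypothetical name and interface, not Conquest code).
// Converts a NUL-terminated ISO-8859-1 string to UTF-8 in dst.
// Always returns the number of bytes required, terminator included.
// If that exceeds dst_size, dst is left untouched.
size_t latin1_to_utf8(const char *src, char *dst, size_t dst_size)
{
    const unsigned char *in = (const unsigned char *)src;
    size_t needed = 1; // room for the terminating NUL
    size_t i, o = 0;

    // Every byte >= 0x80 becomes two bytes in UTF-8; ASCII stays one byte.
    for (i = 0; in[i] != 0; i++)
        needed += (in[i] & 0x80) ? 2 : 1;

    if (needed > dst_size)
        return needed; // caller can retry with a larger buffer

    for (i = 0; in[i] != 0; i++) {
        if (in[i] & 0x80) {
            dst[o++] = (char)(0xC0 | (in[i] >> 6));   // leading byte
            dst[o++] = (char)(0x80 | (in[i] & 0x3F)); // trailing byte
        } else {
            dst[o++] = (char)in[i];
        }
    }
    dst[o] = '\0';
    return needed;
}
```

For the byte from this thread, 0xb2 becomes the pair 0xc2 0xb2, which is the UTF-8 form of the superscript 2 that PostgreSQL accepts in a UTF8 database.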

  • Hi,


    I will not mess with character sets in 1.4.19; it is way too big a change, and few users have run into this issue.


    I just read up on it, and character encoding in DICOM is quite complex. As a first step I added experimental and incomplete UTF-8 handling in 1.5.0alpha; you have to add UTF8ToDB=1 in dicom.ini. It then assumes all text items are ISO_IR 100 and converts them to UTF-8 without checking 0008,0005 (Specific Character Set); that is again a much bigger change.



    Download source and binaries here: https://github.com/marcelvanhe…15f05002a03dec9b8cde6b87a


    The dgate(64).exe of 1.5.0alpha should work as a drop-in replacement in your server without a re-install.


    Marcel
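For readers wondering what checking 0008,0005 could look like, here is a rough sketch in C of the decision only. It is not what 1.5.0alpha does (per the post above, that release skips this check), all names are hypothetical, and it assumes the tag value has already been read, trimmed of padding, and holds a single value.

```c
#include <string.h>

// Hypothetical names for illustration; not part of Conquest.
enum text_action {
    TEXT_PASS_THROUGH,   // bytes are already valid for a UTF8 database
    TEXT_LATIN1_TO_UTF8, // bytes are ISO-8859-1 and need expanding
    TEXT_UNSUPPORTED     // some other repertoire; needs its own converter
};

// Decides how to treat text values based on Specific Character Set (0008,0005).
// Assumes a single, already trimmed value; multi-valued ISO 2022 cases are
// deliberately not handled in this sketch.
enum text_action action_for_charset(const char *specific_charset)
{
    if (specific_charset == NULL || specific_charset[0] == '\0')
        return TEXT_PASS_THROUGH;   // default repertoire is plain ASCII
    if (strcmp(specific_charset, "ISO_IR 192") == 0)
        return TEXT_PASS_THROUGH;   // ISO_IR 192 means UTF-8
    if (strcmp(specific_charset, "ISO_IR 100") == 0)
        return TEXT_LATIN1_TO_UTF8; // ISO_IR 100 means Latin 1
    return TEXT_UNSUPPORTED;
}
```

Even this small sketch leaves out most of the standard's cases, which gives a sense of why full character set support is a larger change than the UTF8ToDB switch.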
