Issues converting a huge database

  • Hello, I'm in the process of converting (and importing) a very large database of patients (about 1.5 TB).
    Goal: to get the old, messy database (actually scavenged from several sources) from its chaotic .dcm file structure (chaotic because of the multiple sources) into a tidy Conquest directory structure of NKI-compressed v2 files.
    Our strategy is:
    1) We put all the files, with their chaotic file structure (file syntax), on an HDD. We point conquest2 at that location and regenerate its database from those files. The files keep their awful file syntax for now, untouched, but are indexed in conquest2's database.
    2) We tell conquest2 to send the queried studies to conquest1, which tidies the file syntax.


    We stumbled upon a handful of issues (mostly regarding minimal cleanup of our messy database) and finally managed to regenerate it in conquest2. However, when we reached the step of copying the database from conquest2 to conquest1 (which are actually on the same PC), we found that it takes a criminal amount of time: we could move a maximum of 2-3k images at a time (out of millions) because of the infamous "allocation" error.
    What we basically need is a fast way to copy an entire database from one Conquest to another, the goal being v2 compression and tidy filename syntax.
    Is our way the only one? If not, is there a way to copy from one server to another in batches of more than ~2-3k images? How could we circumvent the allocation error and copy the entire database in one click :D ?
    Thank you.

  • Hi,


    The problem seems to be a TCP/IP timeout, because it takes so long to inspect all the files before sending.


    I would suggest creating a batch file with dgate --movepatient:local,server1,patientID commands and running it on server2. At least this would be restartable in case of error, and it avoids the timeout problem.
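

    For example, the batch file could simply be a list of lines like these, one per patient (the patient IDs below are only placeholders):


    rem run on server2: move each listed patient from the local server to server1
    dgate --movepatient:local,server1,PAT0001
    dgate --movepatient:local,server1,PAT0002
    dgate --movepatient:local,server1,PAT0003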


    Marcel

  • Hi


    To get going, you could also set "EnableReadAheadThread" to 0 in dicom.ini. This avoids a lot of the preprocessing on which it seems to time out, so the copying starts at once. I will also try to optimize the preprocessing speed.
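

    In dicom.ini that is the single entry:


    EnableReadAheadThread = 0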


    But a batch file is preferable: without one, if an error occurs, you would not know where to restart.


    Marcel

  • Thank you so much for the prompt solutions.
    In the meantime I sent small batches of studies (about 2k studies per batch), which were reasonable and did not produce any error. However, one batch contained more images and produced the error. To my surprise, after about a minute the server started sending the studies anyway; server1 received them, but the window that displays the "Slice x out of y" progress did not work.
    It did, however, send all the studies.
    I was wondering if I could just use the big batch, ignore the error and let it run in a "pseudo-silent mode". Would I only be missing the "Slice x out of y" output, or does the problem go deeper than that?
    Thank you.

  • Hi,


    There are 3 programs (threads) at work: the C-MOVE sender (query/move page), the server that sends, and the server that receives. It is indeed possible that the link with the C-MOVE sender is broken (no "slice xx out of xx" is shown) but that the sender and receiver are still communicating. It all depends on the timeouts.


    Just from the memory size, though, it is impossible to send all the data at once. The query is stored in memory, and that takes thousands of characters per image. Trying to send a batch of 50,000 images, the sending system needed 100 MB to store all that information.


    Marcel

  • Hello,
    We've tried setting "EnableReadAheadThread" to 0, but it did not seem to work (we tried it on a small batch and still got the lag after the query was displayed).
    We were wondering if there is a way to extend the TCP/IP timeouts and obtain a working solution that way. We tried setting the dicom.ini entry TCPIPTimeOut to more than 300 (we experimented with 15000), but that too didn't seem to work on a larger batch (the smallest batch that triggered the error). Are there other ways to circumvent it?
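    (For reference, the dicom.ini entry we experimented with was:)
    TCPIPTimeOut = 15000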


    Quote

    ...the sending system needed 100 MB to store all that information.

    Is that really 100 GB?
    We haven't tried the "dgate --movepatient:local,server1,patientID" solution yet, as we don't really know how to implement it (we basically need to copy all studies (images) older than a specific study date from server2 to server1). But would getting this to work allow us to copy zillions of patients in a single batch?
    Another thing: we would be interested in setting up a Conquest server that queries some virtual servers for a specific patientID and retrieves its latest, say, 3 CTs, based on the fact that that patient had a CT study in our clinic that day.
    Thanks again, a gazillion times :)
    Radu

  • Hi,


    it is 100 MB: about 2000 characters are needed per image to be transmitted (filenames, UIDs, etc.). So if you try to move 1 million images, you would need 2 GB of memory, which won't work as it does not fit into memory on a 32-bit machine.


    The movepatient trick will not do what you want, but a movestudy will be workable.


    The following command lists the date and UID of all studies (unwanted fields are formatted with %0.0s, i.e., not printed at all):


    dgate "--studyfinder:local||%s,%0.0s%0.0s%0.0s%s" > studies.txt


    The file studies.txt will contain lines like this:


    20070812,1.2.826.0.1.3680043.2.135.733552.44047153.7.1243911268.234.67


    Import this file into Excel, sort it on the date (the part before the comma), and use Excel or some macro editor to extract the UIDs into a batch file with lines like this:


    dgate --movestudy:local,server2,1.2.826.0.1.3680043.2.135.733552.44047153.7.1243911268.234.67
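

    If Excel is not at hand, here is a small sketch of a helper batch file that builds movestudies.bat with plain Windows commands. It assumes studies.txt has the date,UID layout shown above and that you first delete the lines with dates you do not want to move (typed directly at the command prompt, %%u becomes %u):


    rem sort studies.txt on date (the date is at the start of each line)
    sort studies.txt /o studies_sorted.txt
    rem append one movestudy command per UID (the second comma-separated field)
    for /f "tokens=2 delims=," %%u in (studies_sorted.txt) do (echo dgate --movestudy:local,server2,%%u) >> movestudies.bat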


    The batch file will move the data study by study. Alternatively, you can create a batch file like the one below, with one line per date (range); this moves data based on the study date. If any single move is too big, though, you will run into the same problem.


    dgate --movestudies:local,server2,20010101-20011231
    dgate --movestudies:local,server2,20020101


    Marcel

  • Thank you for the prompt answers.
    We actually script-generated a batch file with one dgate64 --movestudies: command for each day in our study's date range. That way the queries stayed small, which made everything run nice and smoothly (we do have a 64-bit system, by the way). It worked like a charm. Thanks again for the elegant suggestion.
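    For reference, the generated file was simply a long list of lines like these (the dates and the destination name shown are placeholders for our actual range and server):
    dgate64 --movestudies:local,server1,20070101
    dgate64 --movestudies:local,server1,20070102
    dgate64 --movestudies:local,server1,20070103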
    We are now facing a delicate problem. Running some SQL queries plus a PHP script to validate the patientID (we use our national PNC as the patientID), we have sadly found that about 10-12% of our studies have a wrong PNC. This is unacceptable, as our final goal is to have a Conquest router fetch the latest 3 exams from the archive, which requires an exact patientID match. I made a PHP script to export the mistaken IDs into a list. Hopefully at some point we will be able to correct those PNCs by looking into hospital archives, etc.
    The trick is that we would like to automate correcting the wrong PNCs once we know the correct ones (we're talking about 10k wrong PNCs, so it's a job worth automating). Moreover, for any given patient it is likely that only a minority of their studies (generally 1 out of many) has a bad PNC, so we were looking for a way to correct only that study.
    I've been looking at the -h output of dgate and saw there are a couple of options for us. My best guess would be to write a script that runs an SQL query (with some joins) and then generates a batch file of dgate commands. However, I'm not yet sure which particular command to focus on: --modifypatid:patid,file would be an obvious first choice, applied to the files that belong to the bad study. That would imply some cumbersome MySQL scripting and a huge workspace. I was wondering if the 'merge' commands could do the trick better, but I'm not sure how to use them. Any suggestions? Also, any insight on how to fetch the latest 3 studies of a patient from virtual servers?
    Thank you a lot.

  • Hi,


    If you have the list of invalid patient IDs (and optionally the studies they are in), modifypatientid is the right choice. It will change the patient ID, change UIDs to avoid UID clashes, enter the new file into the server, and delete the original file. The scripts would have to work at two levels: an image lister (dgate --imagelister) to list all files of the incorrect studies and generate a batch file; that batch file is then run to change all of them. Such scripts can be found on the forum (http://www.image-systems.biz/f…5&hilit=imagelister#p6565). Detecting which patient IDs are wrong is beyond the scope of the server.
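

    So, roughly, the first-level script runs the image lister for each bad study, and the generated second-level batch file then contains one line per file in the --modifypatid:patid,file form you mentioned, for example (the corrected PNC and the file paths below are placeholders; the real paths come from the image lister output):


    rem one line per file of the bad study: rewrite it with the corrected PNC
    dgate --modifypatid:2750101123456,data\2750101999999\0000567890.v2
    dgate --modifypatid:2750101123456,data\2750101999999\0000567891.v2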


    The virtual server will work, but it only collects data on request. To get the latest studies (or to prefetch in a virtual server context), you would need to use the "get" script command, which has no option to access a preset number of studies (yet) but does filter on study date and modality. For example, this command would prefetch all CT studies of the last year when receiving a new CT of the same patient:


    ImportConverter0 = ifequal "%m", "CT"; get patient modality CT now-365+000 from PACS


    The presence of the "modality" and/or "now" items forces the server to collect all SERIES with matching modality and date for the incoming patient. The get will, by default, execute 10 minutes after the ImportConverter runs, to avoid executing it once for each incoming image.


    Marcel
