MongoDB and its application (a case study of MongoDB in healthcare)

Today I am sharing my experience with MongoDB and how it can be used in healthcare, along with a comparison of MongoDB with CouchDB.

These days, product development tends to demand more than we planned initially in terms of scalability, flexibility and performance. With those growing demands in mind, we should pick our tools and technologies carefully before we start development.

I needed to create a massive and flexible platform for the healthcare industry that could serve India, China, the US and other countries, so I analyzed various tools and technologies to find the right fit for my demands.

Some of the objectives of my application:

  1. Scalable – with the ability for quick turnaround
  2. Constraint-free – my application should be highly customizable / configurable
  3. High performance and availability
  4. Compatible with desktop, mobile and other devices
  5. Cloud friendly, and more…

Why NoSQL or Document Stores and not only RDBMS for healthcare?

Healthcare information is not just about the number of records or entities. It involves highly diverse data from different specialties (such as cardiology, neurology, etc.) and has to cover clinical care, medication, labs, history, demographics, reports and more. We should also understand that all of this data changes every now and then. For example, a healthcare standards organization may ask healthcare providers to introduce, modify or remove a procedure in any given specialty at any time. Hence the platform design should accommodate all these demands without compromise.

Need for Dynamic Content and Forms:

Healthcare, like any platform solution, needs configurable or dynamically built forms. In healthcare in particular, we should aim to have all forms and content built dynamically. We can build an administrative portal for creating the forms and content. This helps us add, update or remove procedures or protocols for a healthcare entity, as discussed in the previous paragraph. It has to deal with various types of data elements and components, and it also calls for a database without schema constraints (schema-less).
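To make the idea of a dynamically built form concrete, here is a minimal sketch in Java of a form definition held as a plain nested document, plus a check of a submitted record against it. Everything in it is hypothetical: the class DynamicFormSketch, the keys specialty, fields, name, type and required, and the sample values are illustrations only, not part of the platform described here or of any MongoDB API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a form definition kept as a plain document (nested maps
// and lists), the way it could be stored in a schema-less document database.
class DynamicFormSketch {

    // Builds one field descriptor; the key names are illustrative assumptions.
    static Map<String, Object> field(String name, String type, boolean required) {
        Map<String, Object> f = new LinkedHashMap<String, Object>();
        f.put("name", name);
        f.put("type", type);
        f.put("required", required);
        return f;
    }

    // Builds a form definition for one specialty from its field descriptors.
    static Map<String, Object> form(String specialty, List<Map<String, Object>> fields) {
        Map<String, Object> form = new LinkedHashMap<String, Object>();
        form.put("specialty", specialty);
        form.put("fields", fields);
        return form;
    }

    // Returns the names of required fields that the submitted record lacks.
    @SuppressWarnings("unchecked")
    static List<String> missingRequired(Map<String, Object> form, Map<String, Object> record) {
        List<String> missing = new ArrayList<String>();
        List<Map<String, Object>> fields = (List<Map<String, Object>>) form.get("fields");
        for (Map<String, Object> f : fields) {
            String name = (String) f.get("name");
            boolean required = Boolean.TRUE.equals(f.get("required"));
            if (required && !record.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Because the definition is only data, adding or removing a field means changing a stored document rather than altering a table. For instance, if a form marks heartRate as required, a record that contains only notes would be reported as missing heartRate.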
Considering all these aspects, a healthcare solution cannot be built using only an RDBMS or traditional methodologies. We can mix and match tools for the corresponding needs. A document-oriented database can store patient records, healthcare records, medication records and lab records, since they change quite often and differ from one healthcare provider to another. We also need a database that is highly configurable and free of schema constraints in order to build dynamic content and forms. There is a large set of document stores / NoSQL databases available. For this case, we picked MongoDB because it is a document-oriented store that does not impose a fixed schema.

Please note that MongoDB need not be considered the complete database solution for the whole platform. I would recommend using other technologies alongside it, such as Vertica (a columnar database) for reporting and BI on the same platform, and Hadoop for free-text analysis in internal BI tools.

I am going to talk only about MongoDB today. But why MongoDB, and not other document stores / NoSQL databases? There are a few reasons to consider MongoDB.

Directly uses BSON / JSON Format

It is easy for developers to use MongoDB because it works with JSON-like objects (stored as BSON). This reduces the data transformation needed in the controller layer. For example, if we use an RDBMS or some other database in the back end with an RIA in the front end (most RIAs consume and produce JSON), we need to convert the JSON to result sets or POJOs before persisting it. With a document store, we can store the JSON data coming from the UI directly, provided both ends are properly designed.

Rich Driver Support

MongoDB drivers are available for most middleware and server-side scripting languages, which you can refer here. The drivers provide good coverage of MongoDB's operations.

Performance

MongoDB scores well when compared against its counterparts. MongoDB has its own native drivers, whereas most of its counterparts offer only a REST interface; MongoDB offers REST access as well as direct protocol access through its drivers. Native drivers have some advantages over accessing the database only through REST. MongoDB also uses a preallocation strategy to store data, which may further improve performance considerably when there is a lot of traffic to and from the database.

I am glad to share some of the test results that I gathered with my healthcare platform in mind.

Test Case Details: MongoDB vs CouchDB (check another blog here)

The metrics are as follows.

Assume that one patient document has four different entities such as demographic, medication, lab and more. Across the test runs, I changed the number of entities per parent entity.
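The shape of such a test document can be sketched in Java as a nested map with one list per entity type. This is only an illustration of the structure described above, not the code used for the tests; the class PatientDocumentSketch, the keys name, demographic, medication, lab and history, and the placeholder values are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one patient document with a configurable number of
// sub-records per entity type (for example 100 / 1000 / 500 / 100).
class PatientDocumentSketch {

    // Creates `count` placeholder sub-records for one entity type.
    static List<Map<String, Object>> entries(String entity, int count) {
        List<Map<String, Object>> list = new ArrayList<Map<String, Object>>();
        for (int i = 0; i < count; i++) {
            Map<String, Object> entry = new LinkedHashMap<String, Object>();
            entry.put("id", entity + "-" + i);
            entry.put("value", "sample");
            list.add(entry);
        }
        return list;
    }

    // Builds the parent document; "name" stands for the parent-level attribute.
    static Map<String, Object> patient(String name, int demographic, int medication,
                                       int lab, int history) {
        Map<String, Object> doc = new LinkedHashMap<String, Object>();
        doc.put("name", name);
        doc.put("demographic", entries("demographic", demographic));
        doc.put("medication", entries("medication", medication));
        doc.put("lab", entries("lab", lab));
        doc.put("history", entries("history", history));
        return doc;
    }
}
```

Calling PatientDocumentSketch.patient("Test Patient", 100, 1000, 500, 100) would give a single document with 1,700 embedded sub-records, which helps explain why the size per saved record grows quickly when the entity counts are raised.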

Download / View: MongoDB_VS_CouchDB_TEST_RESULTS

Test results: CouchDB (Couch4j library)

Size is in MB. The timing columns in the source are, in order: time taken to save, time taken to get 1 attribute (parent : name), and time taken to create. A dash marks a cell with no recorded value; where fewer than three timings appear, the source does not show which one is missing.

No. of records  Demographic  Medication  Lab   Other  Size (MB)  Timings (as recorded)
1 as 1          100          100         100   100    1.8        665, 644, 450
100 as 1        100          100         100   100    180.1      58501, 43595, 18394
950 as 50       100          100         100   100    1700       561915, 150221
10000 as 1      100          100         100   100    -          -
1 as 1          100          1000        500   100    12         3885, 1266
25 as 1         100          1000        500   100    297.9      96821, 18730
30 as 1         100          1000        500   100    -          -
40 as 1         100          1000        500   100    -          -
475 as 1        100          1000        500   100    5500       1541609, 1463199, 333686
4500 as 1       100          1000        500   100    -          -
Test results: MongoDB

The timing columns in the source are, in order: time taken to store, time taken to get 1 attribute (parent : name), and time taken to create 1 object. "Max records (32-bit)" is the maximum number of records that can be stored on a 32-bit system (2.5 GB limitation); it appears only on the first row of each group. A dash marks a cell with no recorded value; where fewer than three timings appear, the source does not show which one is missing.

No. of records  Demographic  Medication  Lab   History  Max records (32-bit)  Size   Timings (as recorded)   # Data files
1 as 1          100          100         100   100      1000                  64     389, 249, 652           1 + 1
100 as 1        100          100         100   100      -                     190    5460, 17565, 11276      2 + 1
950 as 50       100          100         100   100      -                     1930   68161, 138568           5 + 1
10000 as 1      100          100         100   100      -                     11900  757029, 1342676         7 + 1
1 as 1          100          1000        500   100      100                   208    693, 1381               2 + 1
25 as 1         100          1000        500   100      -                     448    9814, 20550             3 + 1
30 as 1         100          1000        500   100      -                     448    17196, 24230            3 + 1
40 as 1         100          1000        500   100      -                     448    15259, 33704            3 + 1
475 as 1        100          1000        500   100      -                     5900   192082, 383862          7 + 1
4500 as 1       100          1000        500   100      -                     31900  1766936, 3516905        20 + 1

Note: This case study is based purely on my own scenarios and test cases. Results may vary depending on your scenarios. I am sharing the results I got at that time for my cases only; please do not treat this as a complete reference.

Performance Score based on test results above: 

If you look at the score card above, it is evident that MongoDB comes out ahead on performance.

Memory Score based on test results above:

You may notice that MongoDB's data files are much larger than CouchDB's. This is because of the preallocation mechanism it uses to store data. MongoDB allocates its data files in advance and fills them with zeros, so that it avoids the time needed to allocate disk space on each and every write, which again gives a slight performance benefit. Nowadays we have the luxury of using more storage, as it is widely available and not too costly.
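The effect of preallocation on file size can be worked through with a little arithmetic. The sketch below assumes the default behaviour of MongoDB's memory-mapped storage of that era, where the first data file is 64 MB, each subsequent file doubles in size, and files are capped at 2048 MB. It also assumes that "N + 1" in the "# Data files" column means N data files plus one namespace file, which the results above do not state explicitly. The class and method names are hypothetical.

```java
// Hypothetical helper: total preallocated size of the data files, in MB,
// under the assumed defaults (64 MB first file, doubling, 2048 MB cap).
class PreallocationSketch {

    static long dataFilesMb(int dataFiles) {
        long total = 0;
        long size = 64; // assumed size of the first data file, in MB
        for (int i = 0; i < dataFiles; i++) {
            total += size;
            size = Math.min(size * 2, 2048); // double until the assumed cap
        }
        return total;
    }
}
```

Under these assumptions, one data file gives 64 MB, three give 448 MB and five give 1984 MB, which is close to several of the sizes recorded for MongoDB above (64, 448 and 1930). The file size therefore reflects how many files have been preallocated rather than how much data has actually been written.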

Compatibility (32 bit system and 64 bit systems):

In this respect, CouchDB scores higher than MongoDB on 32-bit systems, because on a 32-bit system MongoDB cannot store more than approximately 2 GB of data. You can see more details here.

Compatibility with reporting tools and technologies:

MongoDB has adapters / drivers for frameworks beyond plain data access. It has adapters for BIRT reporting and for Pentaho (ETL and reporting). Since drivers are provided for many languages, we can also write our own adapters to fit our needs.

Platform (PaaS) availability:

There is a large set of PaaS providers for MongoDB, as listed here. They relieve developers, administrators and the business of the burden of maintaining the databases on their own, so scaling the database becomes a piece of cake for users.

Support:

10gen is behind MongoDB. They have very good premium support options. In terms of groups and public support, there is a long list of communities available to help us.

Overall:

If we reassess all aspects of what a healthcare platform needs, it is pretty clear that we should use a tool like MongoDB as the database that stores clinical care, medication, lab and history data and that manages the dynamic content and forms as well. This greatly improved our development turnaround time and helped make our platform as configurable and scalable as possible.

Important Notice: This study and the information given are based purely on my own analysis and assessment. The conclusions may change depending on the respective business model and needs. Kindly do a detailed analysis before finalizing any tool for your system.

JBoss AS 7 as a service (JBoss AS 7 in silent [background] mode)

In many cases we need to run JBoss AS 7 in silent mode in our Linux environment. It is very simple to make JBoss AS 7 run silently in the background. Let us move beyond the run.sh way of starting the server, as I am demonstrating a very simple idea here.

Environment

OS: RHEL 5.4 64 Bit

Assumptions:

JBoss AS 7 is installed under /opt/jboss

The JDK is installed under /usr/java/jdk1.6.0_30

Expectation:

To start and stop JBoss like a service (e.g. service jboss start / service jboss stop).

Steps:

1. Create the file /etc/profile.d/java.sh with the following contents

export JAVA_HOME=/usr/java/jdk1.6.0_30
export PATH=$JAVA_HOME/bin:$PATH

In this step, we are setting up the installed JDK that will be used to run our JBoss instance.

2. Create the file /etc/profile.d/jboss.sh with the following contents

export JBOSS_HOME=/opt/jboss
export PATH=$JBOSS_HOME/bin:$PATH
export MODULEPATH=

In this step, we set JBOSS_HOME and add its bin directory to the PATH. There is a reason for setting 'MODULEPATH=' to empty: otherwise JBoss AS 7 may fail to start and throw exceptions because it cannot load its modules.

3. Create the file /etc/init.d/jboss with the following contents

#!/bin/sh
### BEGIN INIT INFO
# Provides: jboss
# Required-Start: $local_fs $remote_fs $network $syslog
# Required-Stop: $local_fs $remote_fs $network $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start/Stop JBoss AS v7.0.0
### END INIT INFO
#
#source some script files in order to set and export environmental variables
#as well as add the appropriate executables to $PATH
[ -r /etc/profile.d/java.sh ] && . /etc/profile.d/java.sh
[ -r /etc/profile.d/jboss.sh ] && . /etc/profile.d/jboss.sh

start(){
    echo "Starting JBoss 7"
    # save the current firewall rules, then stop iptables
    sh /etc/init.d/iptables save
    sh /etc/init.d/iptables stop
    # start the standalone server in the background, bound to all interfaces
    sh ${JBOSS_HOME}/bin/standalone.sh -b 0.0.0.0 >/dev/null 2>/dev/null &
}

stop(){
    echo "Stopping JBoss 7"
    # ask the running server to shut down through the management CLI
    sh ${JBOSS_HOME}/bin/jboss-admin.sh --connect command=:shutdown
}

restart(){
    stop
    # give stuff some time to stop before we restart
    sleep 60
    # protect against any services that can't stop before we restart
    su -l jboss -c 'killall java'
    start
}

case "$1" in
    start)
        # echo "Starting JBoss AS 7.0.0"
        # sudo -u jboss sh ${JBOSS_HOME}/bin/standalone.sh -b 0.0.0.0 > /dev/null
        start
        ;;
    stop)
        # echo "Stopping JBoss AS 7.0.0"
        # sudo -u jboss sh ${JBOSS_HOME}/bin/jboss-admin.sh --connect command=:shutdown
        stop
        ;;
    restart)
        # echo "Restarting JBoss AS 7.0.0"
        restart
        ;;
    *)
        echo "Usage: /etc/init.d/jboss {start|stop|restart}"
        exit 1
        ;;
esac

exit 0

4. After creating these three files, execute the following commands

# chmod a+x /etc/init.d/jboss

# chmod a+x /etc/profile.d/jboss.sh

# chmod a+x /etc/profile.d/java.sh

5. Now log in as the superuser

# su

and execute

# service jboss start

This should start your JBoss server in the background.

Conclusion:

That's all; you have made JBoss AS 7 run in silent (background) mode.

Datasource configuration setup for JBoss AS 7 with an example for PostgreSQL

Our applications always need a datasource set up on the server side to manage all our database connections.

We need to understand the modules available in JBoss before we start the setup. The modules folder is located at JBOSS_HOME\modules, under which you can see a package-like directory structure starting with com, org, etc. We need to copy the desired JAR files there, creating the right directory hierarchy for them.

In my instance,

1. Deploy Driver / Module of DB

I copied my postgresql-8.4-701.jdbc3.jar file to c:\jboss\modules\org\postgresql\main and created module.xml with the following content

<module xmlns="urn:jboss:module:1.0" name="org.postgresql">
    <resources>
        <resource-root path="postgresql-8.4-701.jdbc3.jar"/>
    </resources>
    <dependencies>
        <module name="javax.api"/>
        <module name="javax.transaction.api"/>
    </dependencies>
</module>

Then restart JBoss. You should see the PostgreSQL driver deployed and a postgresql-8.4-701.jdbc3.jar.index file created in the same directory where you copied the JAR file.

2. Configure DB Driver

After the driver module has been deployed successfully, it is time to edit your standalone.xml or domain.xml by adding the following piece of XML to it:

<driver name="org.postgresql" module="org.postgresql">
    <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
</driver>

Copy this under the subsystem > datasources element, inside its drivers element. The driver is now registered for the standalone startup, which lets you add a datasource to the server. This can be achieved in many ways: through the web admin console, by editing standalone.xml / domain.xml directly, or through the CLI.

I will explain how to add the datasource by editing the configuration file and how to test it from the CLI and from sample Java code (contact me if you need steps to deploy by other means).

3. Configure Datasource in standalone.xml / domain.xml

Edit the standalone.xml present in the c:\jboss\standalone\configuration folder and add the following lines under the subsystem > datasources element:

<datasource jndi-name="java:jboss/datasources/Test" pool-name="java:jboss/datasources/Test_Pool" enabled="true" jta="true" use-java-context="true" use-ccm="true">
    <connection-url>jdbc:postgresql://192.168.1.192:5444/testdb</connection-url>
    <driver>org.postgresql</driver>
    <security>
        <user-name>testuser</user-name>
        <password>testpassword</password>
    </security>
</datasource>

Note: Make sure the datasource JNDI name has the prefix java:/ or java:jboss/, or else your datasource cannot be referenced.
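The prefix rule in the note can be expressed as a small check in Java, for example to validate a name before writing it into the configuration. This is only a sketch of the rule as stated here; the class JndiNameCheck and its method are hypothetical and not part of JBoss.

```java
// Hypothetical helper that applies the prefix rule from the note above.
class JndiNameCheck {

    // Returns true when the name starts with java:/ or java:jboss/.
    static boolean hasAllowedPrefix(String jndiName) {
        return jndiName != null
                && (jndiName.startsWith("java:/") || jndiName.startsWith("java:jboss/"));
    }
}
```

With this check, java:jboss/datasources/Test passes, while a name such as jdbc/Test does not.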

With that, you are done on the configuration side. Start / restart your server; you can see in the console that the datasource has started.

4. Testing Datasource

4.1 Testing the datasource in the CLI

Connect the CLI to your local server as follows

C:\jboss\bin>jboss-admin.bat
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
Connected to standalone controller at localhost:9999
[standalone@localhost:9999 /]

Then execute the following command,

[standalone@localhost:9999 /] /subsystem=datasources/data-source=java\:jboss\/datasources\/Test:test-connection-in-pool
{
    "outcome" => "success",
    "result" => [true]
}

4.2 Testing the datasource in Java code

Note: You cannot test the datasource from a standalone Java program. Put the code inside a WAR / web application, deploy it to the JBoss instance where the datasource is deployed, and try it there.

In my case, I created a servlet called TestServlet and added the following lines of code to its GET method

// Requires imports: javax.naming.Context, javax.naming.InitialContext, javax.sql.DataSource
DataSource ds = null;
Context ctx = null;

try {
    String strDSName = "java:jboss/datasources/Test";
    ctx = new InitialContext();
    // look up the datasource by the JNDI name configured in standalone.xml
    ds = (javax.sql.DataSource) ctx.lookup(strDSName);
    resp.getWriter().print("Success getting DS : " + ds.getClass());
} catch (Exception e) {
    resp.getWriter().print("Error getting DS : " + e);
}

Try calling the servlet via its context / servlet path; it should print the success message.

Thanks.

Kousik Rajendran.