MongoDB and Its Applications (A Case Study of MongoDB in Healthcare)

Today I am sharing my experience with MongoDB: how it can be used in healthcare, and how it compares with CouchDB.

These days, product development demands more than we plan for initially, in terms of scalability, flexibility and performance. Given these growing demands, we should choose our tools and technologies carefully before we start development.

I needed to build a massive, flexible platform for the healthcare industry that could serve countries such as India, China, the US and others, so I analyzed various tools and technologies to find the right fit for my needs.

Some of the objectives of my application:

Scalable – with ability for quick turnaround
Constraint-Free – my application should be highly customizable / configurable
High Performance and availability
Compatible with Desktop, Mobile and other devices
Cloud Friendly and more…

Why NoSQL or Document Stores and not only RDBMS for healthcare?

Healthcare information is not just about the number of records or entities. It involves highly diverse data from different specialties (cardiology, neurology, etc.), and it must cover clinical care, medication, labs, history, demographics, reports and more. We should also recognize that any of this data can change at any time. For example, a healthcare standards organization may require providers to introduce, modify or remove a procedure in a given specialty at any time. The platform design must accommodate all of these demands without compromise.

Need for Dynamic Content and Forms:

Healthcare, like most platform solutions, needs configurable or dynamically built forms. In healthcare especially, we should plan for all forms and content to be dynamic. We can build an administrative portal for creating the forms and content, which makes it easy to add, update or remove the procedures and protocols of a healthcare entity, as discussed in the previous paragraph. These forms must handle many types of data elements and components, so this requirement also calls for a database without fixed constraints (schema-less).
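To make the dynamic-form idea concrete, here is a minimal Java sketch built around a hypothetical DynamicForm class (it is not part of MongoDB or any library). The form definition is itself data, and a captured record is just a map of field names to values. Adding or removing a field changes only the definition, not a table schema, which is the property a schema-less document store such as MongoDB lets you carry through to storage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a form definition kept as data, plus validation of a
 * captured record (a plain map) against the current definition.
 */
public class DynamicForm {
    private final String name;
    // Field name -> expected Java type, kept in insertion order for rendering.
    private final Map<String, Class<?>> fields = new LinkedHashMap<>();

    public DynamicForm(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    /** Adds a field to the definition; no schema migration is involved. */
    public DynamicForm addField(String field, Class<?> type) {
        fields.put(field, type);
        return this;
    }

    /** Removes a field from the definition; existing records are untouched. */
    public DynamicForm removeField(String field) {
        fields.remove(field);
        return this;
    }

    /** Returns a list of problems; an empty list means the record fits the current form. */
    public List<String> validate(Map<String, Object> record) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> f : fields.entrySet()) {
            Object value = record.get(f.getKey());
            if (value == null) {
                problems.add("missing " + f.getKey());
            } else if (!f.getValue().isInstance(value)) {
                problems.add("wrong type for " + f.getKey());
            }
        }
        return problems;
    }
}
```

For example, an administrator could add an "ecgDone" field to a cardiology intake form at runtime; records captured before the change simply report it as missing until they are updated.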

Considering all these aspects, a healthcare solution cannot be built using only an RDBMS or traditional methodologies; we can mix and match tools for the corresponding needs. A document-oriented database can store patient records, healthcare records, medication records and lab records, because they change often and differ from one healthcare provider to another. We also need a highly configurable database without constraints to support dynamic content and forms. There is a large set of document stores / NoSQL databases to choose from, but for this case we picked MongoDB because it is a document-oriented store that does not impose a fixed schema.

Please note that MongoDB should not be considered a complete database solution for the whole platform. I would recommend other technologies alongside it, such as Vertica (a columnar database) for reporting and BI on the same platform, and Hadoop for free-text analysis in internal BI tools.

Today I will talk only about MongoDB. But why MongoDB, and not another document store / NoSQL database? There are a few reasons.

Directly uses BSON / JSON Format

MongoDB is easy for developers to use because it works with JSON documents (stored internally as BSON, a binary form of JSON). This removes most of the data transformation work in the controller layer. For example, if you use an RDBMS in the back end and a rich internet application (RIA) in the front end (most RIAs consume and produce JSON), you need to convert the JSON into result sets or POJOs before persisting it. With a document store, you can store the JSON data from the UI directly, provided both ends are designed for it.

Rich Driver Support

MongoDB drivers are available for most middleware and server-side scripting languages, which you can refer to here. The drivers implement MongoDB's features very well.

Performance

MongoDB scores well against its counterparts. Most of its counterparts offer only REST access, whereas MongoDB offers both REST access and direct protocol access through its native drivers. Native drivers have some advantages over REST-only access, such as avoiding the HTTP overhead on every request. MongoDB also uses a preallocation strategy to store data, which can improve performance considerably when there is a lot of traffic to and from the database.

I am glad to share some of the test results I obtained for my healthcare platform.

Test Case Details: MongoDB vs CouchDB (see my other blog post here)

The metrics are as follows.

Assume that one patient document has four different entities: demographic, medication, lab and history (the CouchDB table calls the last one "other columns"). Across the test runs, I varied the number of records and the size of each child entity per parent entity.
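The shape of a test document can be sketched in Java as nested maps. This is only an illustration: the PatientDocumentGenerator class, the attribute names and the reading of the table numbers as attributes per entity are my assumptions, since the original test harness is not shown. The "name" field stands for the single parent attribute read back in the "get 1 attribute (parent : name)" test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a test patient document: one parent record with
 * four child entities, each holding a configurable number of attributes.
 */
public class PatientDocumentGenerator {

    /** Builds an entity with {@code count} attributes named prefix_attr0 .. prefix_attr(count-1). */
    static Map<String, Object> entity(String prefix, int count) {
        Map<String, Object> e = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            e.put(prefix + "_attr" + i, "value" + i);
        }
        return e;
    }

    /** Builds one patient document with the given attribute count per child entity. */
    public static Map<String, Object> patient(String name, int demographic, int medication, int lab, int history) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", name); // parent attribute used by the "get 1 attribute" test
        doc.put("demographic", entity("demographic", demographic));
        doc.put("medication", entity("medication", medication));
        doc.put("lab", entity("lab", lab));
        doc.put("history", entity("history", history));
        return doc;
    }

    /** Counts the attributes across the four child entities. */
    @SuppressWarnings("unchecked")
    public static int childAttributeCount(Map<String, Object> doc) {
        int total = 0;
        for (String key : new String[] {"demographic", "medication", "lab", "history"}) {
            total += ((Map<String, Object>) doc.get(key)).size();
        }
        return total;
    }
}
```

Under this reading, a document for the 100 / 1000 / 500 / 100 rows carries 1,700 child attributes plus the parent name.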

Download / View: MongoDB_VS_CouchDB_TEST_RESULTS

Test results of CouchDB (Couch4j library):

| No. of records | Demographic | Medication | Lab | Other columns | Size (MB) | Time taken to save | Time taken to get 1 attribute (parent: name) | Time taken to create |
|---|---|---|---|---|---|---|---|---|
| 1 as 1 | 100 | 100 | 100 | 100 | 1.8 | 665 | 644 | 450 |
| 100 as 1 | 100 | 100 | 100 | 100 | 180.1 | 58501 | 43595 | 18394 |
| 950 as 50 | 100 | 100 | 100 | 100 | 1700 | 561915 | – | 150221 |
| 10000 as 1 | 100 | 100 | 100 | 100 | – | – | – | – |
| 1 as 1 | 100 | 1000 | 500 | 100 | 12 | 3885 | – | 1266 |
| 25 as 1 | 100 | 1000 | 500 | 100 | 297.9 | 96821 | – | 18730 |
| 30 as 1 | 100 | 1000 | 500 | 100 | – | – | – | – |
| 40 as 1 | 100 | 1000 | 500 | 100 | – | – | – | – |
| 475 as 1 | 100 | 1000 | 500 | 100 | 5500 | 1541609 | 1463199 | 333686 |
| 4500 as 1 | 100 | 1000 | 500 | 100 | – | – | – | – |

Test results of MongoDB:

| No. of records | Demographic | Medication | Lab | History | Max no. of records storable on 32-bit (2.5 GB limitation) | Size | Time taken to store | Time taken to get 1 attribute (parent: name) | Time taken to create 1 object | # Data files |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 as 1 | 100 | 100 | 100 | 100 | 1000 | 64 | 389 | 249 | 652 | 1 + 1 |
| 100 as 1 | 100 | 100 | 100 | 100 | – | 190 | 5460 | 17565 | 11276 | 2 + 1 |
| 950 as 50 | 100 | 100 | 100 | 100 | – | 1930 | 68161 | – | 138568 | 5 + 1 |
| 10000 as 1 | 100 | 100 | 100 | 100 | – | 11900 | 757029 | – | 1342676 | 7 + 1 |
| 1 as 1 | 100 | 1000 | 500 | 100 | 100 | 208 | 693 | – | 1381 | 2 + 1 |
| 25 as 1 | 100 | 1000 | 500 | 100 | – | 448 | 9814 | – | 20550 | 3 + 1 |
| 30 as 1 | 100 | 1000 | 500 | 100 | – | 448 | 17196 | – | 24230 | 3 + 1 |
| 40 as 1 | 100 | 1000 | 500 | 100 | – | 448 | 15259 | – | 33704 | 3 + 1 |
| 475 as 1 | 100 | 1000 | 500 | 100 | – | 5900 | 192082 | – | 383862 | 7 + 1 |
| 4500 as 1 | 100 | 1000 | 500 | 100 | – | 31900 | 1766936 | – | 3516905 | 20 + 1 |

Note: This case study is based purely on my own scenarios and test cases, and results may vary for yours. I am sharing the results I obtained at the time for my cases only, so please do not treat them as a complete reference.

Performance Score based on test results above:

From the score card above, MongoDB clearly wins on performance: its save and single-attribute read times are lower than CouchDB's in every test where both databases were measured.
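To put numbers on the comparison, this small Java program divides CouchDB's "time taken to save" by MongoDB's "time taken to store" for the six rows where both tables report a value. The values are copied from the two tables above; the time unit is not stated there. It compares only the save/store columns, not the create or read columns, and the SaveTimeRatios class name is just for illustration.

```java
/**
 * Ratio of CouchDB save time to MongoDB store time for the rows where both
 * tables report a value. A ratio above 1 means MongoDB was faster.
 */
public class SaveTimeRatios {
    // {row label, CouchDB "time taken to save", MongoDB "time taken to store"}
    public static final Object[][] ROWS = {
        {"1 as 1 (100/100/100/100)", 665, 389},
        {"100 as 1 (100/100/100/100)", 58501, 5460},
        {"950 as 50 (100/100/100/100)", 561915, 68161},
        {"1 as 1 (100/1000/500/100)", 3885, 693},
        {"25 as 1 (100/1000/500/100)", 96821, 9814},
        {"475 as 1 (100/1000/500/100)", 1541609, 192082},
    };

    public static double ratio(int couchSave, int mongoStore) {
        return (double) couchSave / mongoStore;
    }

    public static void main(String[] args) {
        for (Object[] row : ROWS) {
            double r = ratio((Integer) row[1], (Integer) row[2]);
            System.out.printf("%-30s CouchDB/MongoDB = %.1fx%n", row[0], r);
        }
    }
}
```

Running it prints ratios from about 1.7x (the first "1 as 1" row) up to about 10.7x ("100 as 1"), so MongoDB saved faster in every row where both were measured.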

Memory Score based on test results above:

You may notice that the data files are much larger for MongoDB than for CouchDB. This is because of MongoDB's preallocation mechanism: it allocates data file space ahead of time and fills it with zeros, so that it does not have to allocate space on every write, which also earns it some performance credit. Nowadays disk space and memory are plentiful and not too costly, so this trade-off is usually acceptable.
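The sizes in the MongoDB table are easier to read with the preallocation scheme in mind. As far as I know, MongoDB versions of this era (the MMAPv1 storage engine) preallocated data files starting at 64 MB and doubling each time, up to 2 GB per file. The hypothetical PreallocationModel class below models only that rule and ignores the namespace file (the "+ 1" in the table), the journal and options such as smallfiles.

```java
/**
 * Hypothetical model of MMAPv1-era data file preallocation: the first data
 * file is 64 MB and each following file doubles in size, capped at 2048 MB.
 */
public class PreallocationModel {
    static final long FIRST_FILE_MB = 64;
    static final long MAX_FILE_MB = 2048;

    /** Size in MB of data file number {@code index} (0-based). */
    public static long fileSizeMb(int index) {
        long size = FIRST_FILE_MB;
        for (int i = 0; i < index && size < MAX_FILE_MB; i++) {
            size *= 2;
        }
        return Math.min(size, MAX_FILE_MB);
    }

    /** Total MB allocated on disk once {@code files} data files exist. */
    public static long totalMb(int files) {
        long total = 0;
        for (int i = 0; i < files; i++) {
            total += fileSizeMb(i);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int files = 1; files <= 7; files++) {
            System.out.println(files + " data file(s): " + totalMb(files) + " MB");
        }
    }
}
```

With this model, one data file is 64 MB and three data files add up to 448 MB, which match the "1 as 1" row with 1 + 1 files and the rows with 3 + 1 files in the MongoDB table. Other rows deviate from it, so treat it as an approximation only.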

Compatibility (32-bit and 64-bit systems):

Here CouchDB scores higher than MongoDB on 32-bit systems, because on a 32-bit system MongoDB cannot store more than approximately 2 GB of data. You can see more details here.

Compatibility with reporting tools and technologies:

Beyond its data access drivers, MongoDB has adapters for other frameworks, for example BIRT for reporting and Pentaho for ETL and reporting. And since drivers are available for so many languages, we can write our own adapters to fit our needs.

Platform (PaaS) availability:

A large number of PaaS providers offer MongoDB, as listed here. They relieve developers, administrators and the business of maintaining the databases themselves, so scaling the database becomes a piece of cake.

Support:

10gen, the company behind MongoDB, offers very good premium support options. For community support, there are many groups and forums available.

Overall:

If we re-assess all the needs of a healthcare platform, it is clear that we should use a tool like MongoDB as the database for clinical care, medication, lab and history data, and for managing dynamic content and forms. This greatly improved our development turnaround time and made our platform as configurable and scalable as possible.

Important Notice: This study and the information given here are based purely on my own analysis and assessment, and they may differ depending on your business model and needs. Please do a thorough analysis before finalizing any tool for your system.

Datasource configuration setup for JBoss AS 7 with an example of PostgreSQL

Our applications usually need a datasource set up on the server side to manage all of their database connections.

We need to understand the modules available in JBoss before we even start the setup. The modules folder is located at JBOSS_HOME\modules, and it contains a package-like directory structure starting with com, org, etc. We need to copy our JAR files into the right place in this hierarchy, creating the directories where needed.
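The directory convention can be captured in a few lines of Java. This hypothetical helper (not a JBoss API) turns a module name such as org.postgresql into the folder that holds module.xml and the JAR, assuming the default slot "main", which is the folder used in this post.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Sketch of how a JBoss AS 7 module name maps onto the modules directory:
 * each dot-separated part becomes a directory level, followed by the slot
 * directory ("main" by default) that holds module.xml and the JARs.
 */
public class ModulePath {
    public static Path moduleDir(Path modulesRoot, String moduleName, String slot) {
        Path dir = modulesRoot;
        for (String part : moduleName.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(slot);
    }

    /** Same as above, using the default "main" slot. */
    public static Path moduleDir(Path modulesRoot, String moduleName) {
        return moduleDir(modulesRoot, moduleName, "main");
    }

    public static void main(String[] args) {
        Path root = Paths.get("jboss", "modules");
        // Prints jboss/modules/org/postgresql/main (with the platform's separator)
        System.out.println(moduleDir(root, "org.postgresql"));
        System.out.println(moduleDir(root, "org.postgresql").resolve("module.xml"));
    }
}
```

On Windows, with c:\jboss\modules as the root, org.postgresql resolves to c:\jboss\modules\org\postgresql\main.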

In my case:

1. Deploy Driver / Module of DB

I copied my postgresql-8.4-701.jdbc3.jar file to c:\jboss\modules\org\postgresql\main and created a module.xml file in the same folder with the following content:

<module xmlns="urn:jboss:module:1.0" name="org.postgresql">
  <resources>
    <resource-root path="postgresql-8.4-701.jdbc3.jar"/>
  </resources>
  <dependencies>
    <module name="javax.api"/>
    <module name="javax.transaction.api"/>
  </dependencies>
</module>

Then restart JBoss. You should see the PostgreSQL driver deployed, and a postgresql-8.4-701.jdbc3.jar.index file created in the same directory where you copied your JAR file.

2. Configure DB Driver

After your driver module has been deployed successfully, it is time to edit your standalone.xml or domain.xml by adding the following piece of XML:

<driver name="org.postgresql" module="org.postgresql">
  <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
</driver>

Add this inside the drivers element under subsystem > datasources. This registers the driver when the standalone server starts, which lets you add a datasource that uses it. A datasource can be added in several ways: through the web admin console, by editing standalone.xml / domain.xml directly, or through the CLI.

I will explain how to add the datasource by editing the configuration file, and how to test it with the CLI and with sample Java code (contact me if you need the steps for the other methods).

3. Configure Datasource in standalone.xml / domain.xml

Edit the standalone.xml file in the c:\jboss\standalone\configuration folder and add the following lines under the subsystem > datasources element:

<datasource jndi-name="java:jboss/datasources/Test" pool-name="java:jboss/datasources/Test_Pool" enabled="true" jta="true" use-java-context="true" use-ccm="true">
  <connection-url>jdbc:postgresql://192.168.1.192:5444/testdb</connection-url>
  <driver>org.postgresql</driver>
  <security>
    <user-name>testuser</user-name>
    <password>testpassword</password>
  </security>
</datasource>

Note: Make sure the JNDI name starts with java:/ or java:jboss/; otherwise your datasource cannot be referenced.

That completes the configuration. Start or restart your server, and you should see in the console that the datasource has started.
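The prefix rule can be expressed as a tiny Java check, for example to validate JNDI names in a deployment script before they reach the server. JndiNameCheck is a hypothetical helper, not part of JBoss (the server performs its own validation); the only rule it encodes is that the jndi-name must start with java:/ or java:jboss/ (with forward slashes).

```java
/**
 * Hypothetical pre-deployment check: a JBoss AS 7 datasource JNDI name
 * should start with "java:/" or "java:jboss/".
 */
public class JndiNameCheck {
    public static boolean isBindable(String jndiName) {
        return jndiName != null
            && (jndiName.startsWith("java:/") || jndiName.startsWith("java:jboss/"));
    }

    public static void main(String[] args) {
        System.out.println(isBindable("java:jboss/datasources/Test")); // true
        System.out.println(isBindable("datasources/Test"));            // false
    }
}
```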

4. Testing Datasource

4.1 Testing the datasource in the CLI

Connect the CLI to your local server as follows:

C:\jboss\bin>jboss-admin.bat
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
Connected to standalone controller at localhost:9999
[standalone@localhost:9999 /]

Then execute the following command:

[standalone@localhost:9999 /] /subsystem=datasources/data-source=java\:jboss\/datasources\/Test:test-connection-in-pool
{
"outcome" => "success",
"result" => [true]
}

4.2 Testing the datasource in Java code

Note: You cannot test the datasource from a standalone Java program, because the JNDI name is only bound inside the server. Put the code inside a WAR (web application), deploy it to the JBoss instance where you configured the datasource, and try it there.

In my case, I created a servlet called TestServlet and added the following code to its GET method (doGet):

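import java.io.IOException;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

public class TestServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // JNDI name of the datasource configured in standalone.xml
        String strDSName = "java:jboss/datasources/Test";
        try {
            Context ctx = new InitialContext();
            DataSource ds = (DataSource) ctx.lookup(strDSName);
            resp.getWriter().print("Success getting DS : " + ds.getClass());
        } catch (Exception e) {
            // Lookup failures (e.g. wrong JNDI name) end up here
            resp.getWriter().print("Error getting DS : " + e);
        }
    }
}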

Call the servlet through its context path / servlet path; it should print the success message.

Thanks.

Kousik Rajendran.
