
Shortcomings of database operation code in openPDC

Developer
Dec 16, 2010 at 11:16 PM
Edited Dec 16, 2010 at 11:21 PM

Hello:

We are from WSU and are developing a simulation platform based on openPDC. Configuring openPDC manually is not acceptable for our application, so we developed a program that parses a load flow file and configures openPDC automatically. During development, we found that openPDC has some shortcomings in its database operations. I hope these shortcomings can be overcome in future versions, making openPDC more efficient at database operations.

1. Unit testing is impossible.

In our application we must build our own database access layer (DAL), but the problem is that if we base our DAL on the code in "openPDCManager.Web.Data.CommonFunctions.cs", our DAL cannot be unit tested.

That is because openPDC uses the "ConfigurationFile" class to read the configuration file and retrieve the database connection string. During unit testing, however, "ConfigurationFile" detects the current application type as "unknown" and refuses to load the configuration file. This shortcoming makes automated unit testing impossible, so we have to test manually.

2. Low efficiency when inserting large volumes of data.

Although the new version of openPDC has changed some code to let multiple function calls share the same database connection, more improvement is still needed.

Example-1. Some tables in openPDC have an auto-generated primary key. When inserting records into such tables, the code in the "CommonFunctions" class always performs two database operations: first an insert, and then a query for the newly-added record by a unique name. The same result can be achieved with a single "ExecuteScalar" call, cutting the number of database operations in half.
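
For illustration, here is a minimal sketch of the single-call pattern, assuming SQL Server syntax; the "Device" table and its columns here are simplified stand-ins, not the exact openPDC schema:

using System.Data.SqlClient;

static int InsertDevice(SqlConnection connection, string acronym, string name)
{
    // INSERT and key retrieval in one round-trip: SCOPE_IDENTITY() returns
    // the identity value generated by this session's insert.
    const string sql =
        "INSERT INTO Device (Acronym, Name) VALUES (@acronym, @name); " +
        "SELECT CAST(SCOPE_IDENTITY() AS int);";

    using (SqlCommand command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@acronym", acronym);
        command.Parameters.AddWithValue("@name", name);

        // One call both inserts the record and returns its new primary key.
        return (int)command.ExecuteScalar();
    }
}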

Example-2. In the implementation of "CommonFunctions.GetDeviceByAcronym", the code actually loads all the records from the remote server into a local dataset, filters that local dataset, and returns the wanted device. This way, many unwanted records are also transported from server to client, and filtering the local dataset is effectively a linear search.

In my opinion, a much more efficient method is to embed the query condition in the SQL and send that SQL to the remote database server. The database server has many optimized mechanisms for executing such a query efficiently, and it sends only the wanted result back to the client. I cannot see any reason not to do it this way.
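
A hedged sketch of that approach (again with illustrative table and column names), where the parameterized condition is evaluated on the server:

using System.Data;
using System.Data.SqlClient;

static DataRow GetDeviceByAcronym(SqlConnection connection, string acronym)
{
    const string sql = "SELECT * FROM Device WHERE Acronym = @acronym";

    using (SqlDataAdapter adapter = new SqlDataAdapter(sql, connection))
    {
        adapter.SelectCommand.Parameters.AddWithValue("@acronym", acronym);

        DataTable result = new DataTable();
        adapter.Fill(result);   // only the matching record crosses the wire

        return result.Rows.Count > 0 ? result.Rows[0] : null;
    }
}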

Example-3. Before inserting each measurement, there is always some code that queries the "SignalType" table to find the proper signal types for a PMU, voltage phasor, or current phasor. The problem is that, since the contents of SignalType are constant, querying it repeatedly is a waste of time.

I think a better method is to query "SignalType" once and cache the result; when inserting a measurement, just look up the proper signal type in the cache instead of querying the remote server. I know the code in "CommonFunctions.cs" is mainly used by a web service, and state sharing in WCF is constrained by InstanceContextMode, but there are still many ways to share state among multiple web service calls.
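
A minimal sketch of the idea, assuming illustrative column names for "SignalType"; the table is read once and every later lookup hits the in-memory map:

using System.Collections.Generic;
using System.Data.SqlClient;

static class SignalTypeCache
{
    private static readonly object s_lock = new object();
    private static Dictionary<string, int> s_idsByAcronym;

    public static int GetId(SqlConnection connection, string acronym)
    {
        lock (s_lock)
        {
            if (s_idsByAcronym == null)
            {
                // One query, ever: SignalType contents are constant.
                s_idsByAcronym = new Dictionary<string, int>();

                using (SqlCommand command = new SqlCommand(
                    "SELECT Acronym, ID FROM SignalType", connection))
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        s_idsByAcronym[reader.GetString(0)] = reader.GetInt32(1);
                }
            }

            return s_idsByAcronym[acronym];
        }
    }
}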

The optimizations above are actually already in use in our DAL. We ran an experiment on a two-area system: for that small system, we inserted 8 nodes, 105 devices, 196 phasors, and 686 measurements into the database. Using the API provided by "Web.Data.CommonFunctions" took 5.99 seconds, while our optimized API took only 1.31 seconds. I believe our optimized API will save much more time on a large system.

I hope the advice above helps improve the efficiency of openPDC. Thank you.

Developer
Dec 17, 2010 at 10:52 AM

Hi Checka,

All of your suggestions are important and will improve efficiency. Most of this code was started at TVA as a very basic, simple system (without the input device wizard and many of the other features you see now) with a SQL Server back end, and it then grew into a large, complicated open source system that also supports other databases such as MySQL and MS Access. It was also originally developed as a Silverlight-based web system whose data layer (including the WCF service) ran only on the server, so "Example-2" was not a big issue then because the client received only filtered data back. Likewise, adding a device was supported only through manual form-based entry, so the "Example-3" code was written to add one device at a time. The issue in "Example-1" was due to the fact that either MySQL or Access did not like the single-call approach, but I will test this again just to make sure. You might have noticed a few other strange tactics in there, like retrieving a boolean value from the database as an object and then converting it to a boolean; this was done to support MySQL. As the system grew, this code was not reworked a whole lot. I will go ahead and implement your suggestions and test them against all three database types in the next few days.

We really appreciate your input; please keep it coming. Thank you,

Mehul Thakkar

Developer
Dec 17, 2010 at 5:30 PM

Hi mthakkar:

Thank you for your reply. Historical reasons are never welcome, but they can never be avoided in the software development process, so I fully understand what you said.

I would still like to discuss a few issues with you:

Example-1. Before I posted the message, I guessed that it was a compatibility concern that caused the author to drop "ExecuteScalar"; your reply confirms my guess. I am not very familiar with MySQL, but I believe it should also support returning the auto-generated primary key when inserting a record; otherwise, I cannot understand how it achieves performance comparable to SQL Server and Oracle. I don't know whether Access has the same feature.

In my opinion, it is not a good solution to drop an advanced feature of the popular database engines (SQL Server or Oracle) just for compatibility with a seldom-used engine like Access. A better solution may be to design, for each kind of database engine, a specific API that exploits the advantages of that engine. That may cost us too much effort, which is why so many ORM frameworks have appeared. openPDC is an open source project, and I don't know whether there are restrictions on using MS Entity Framework. If you want an open source ORM, you could try SubSonic (http://subsonicproject.com/).
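
To sketch what such engine-specific code might look like (the enum and SQL strings below are illustrative, not existing openPDC code), each engine gets its own key-retrieval statement behind one API:

using System;

enum DatabaseType { SqlServer, MySql, Access }

static string GetIdentitySql(DatabaseType type)
{
    switch (type)
    {
        case DatabaseType.SqlServer:
            return "SELECT SCOPE_IDENTITY()";
        case DatabaseType.MySql:
            return "SELECT LAST_INSERT_ID()";
        case DatabaseType.Access:
            // Jet 4.0 supports @@IDENTITY, but cannot batch it with the
            // INSERT in one command; it must run as a second command on
            // the same connection.
            return "SELECT @@IDENTITY";
        default:
            throw new NotSupportedException(type.ToString());
    }
}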

Example-2. If you host the Silverlight server and the database server on the same computer, transporting unwanted records may not be a big problem. Still, although inter-process communication does not cost as much as cross-machine communication, it still incurs some communication overhead.

Even if data transport is not a big problem, another issue is that the code in "CommonFunctions" filters a local dataset to find the wanted record, and that local filtering is simply a big loop: its time cost is O(N), where N is the number of records returned, including the unwanted ones. Take "GetDeviceByAcronym" as an example: if you embed the query condition in the SQL and let the database engine execute it, and your DBA builds an index on the "Acronym" column, the lookup becomes an indexed seek (O(log N) at worst, effectively constant in practice). So I still think that part is worth improving.

Example-3. I know the code in "CommonFunctions" is designed for manual operation through the UI rather than batch, automated operations. Because of WCF, I guess the author just wanted to keep every function call stateless, which keeps the code simple. But manual operation through the UI can also benefit from caching. For example, if a user of openPDCManager inserts three devices one by one without caching, all three save operations will be equally slow; with a cache, only the first save will be slow, and the following operations will be fast, which makes for a much more comfortable user experience. And as I mentioned in my first post, caching is feasible in WCF.
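
To illustrate the point, a hypothetical sketch (service and member names invented for this example): even with WCF's default per-call instancing, a static member is created once per host process and therefore survives across calls:

using System.Collections.Concurrent;
using System.ServiceModel;

[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class PhasorDataService
{
    // Static members outlive per-call service instances, so this cache
    // is shared by every request handled by the host process.
    private static readonly ConcurrentDictionary<string, int> s_signalTypeIds =
        new ConcurrentDictionary<string, int>();

    public int GetSignalTypeId(string acronym)
    {
        // First call per acronym hits the database; later calls hit the cache.
        return s_signalTypeIds.GetOrAdd(acronym, LoadSignalTypeIdFromDatabase);
    }

    private static int LoadSignalTypeIdFromDatabase(string acronym)
    {
        // ... query the SignalType table here ...
        return 0; // placeholder for the sketch
    }
}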

I will be very glad if any of my suggestions are helpful for openPDC. Further discussion is welcome.

Thank you.

Developer
Dec 21, 2010 at 11:16 PM

Recently we ran another experiment.

We inserted 114 nodes, 1676 devices, 3352 phasors, and 11732 measurements into the database.

Using the API provided by "openPDCManager.Web.Data" took 649.005 seconds, more than 10 minutes.

Using our optimized API took only 10.235 seconds, roughly 1/63 of the previous time.


Developer
Dec 30, 2010 at 6:30 PM

Just FYI, I have implemented your suggestions 2 and 3 in changeset #61459. Suggestion 1 introduces DBMS-specific code, so I will have to skip that one for now.

Regarding unit testing, I am sure there are ways to automatically copy and rename the config file into your test project before running any tests.
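
For example, a hypothetical MSTest setup along these lines (the source path is illustrative) could copy the real config over the test host's config before any test runs:

using System;
using System.IO;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class TestSetup
{
    [AssemblyInitialize]
    public static void CopyConfigFile(TestContext context)
    {
        // Copy the application's config next to the test host and rename it
        // to the file name the runtime expects, so the connection string is
        // available during tests.
        string source = @"..\..\..\openPDCManager.Web\Web.config"; // illustrative path
        string target = AppDomain.CurrentDomain.SetupInformation.ConfigurationFile;

        File.Copy(source, target, true);
    }
}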

Using Entity Framework to generate DBMS-specific libraries is not feasible given our current priorities and available resources. Also, with Entity Framework (before version 4 came out), database schema changes and other customizations were cumbersome.