This project has moved. For the latest updates, please go here.

bugs in openPDC

Developer
Mar 13, 2010 at 8:16 PM

Hello:
 I am a research associate in the School of Electrical Engineering and Computer Science, Washington State University. Our research group, led by Professor Anjan Bose, is performing a software system based on openPDC. For doing my job well, I have read most of openPDC's source codes. I have found that openPDC is a wonderful and powerful tool, and I highly admire your achievements. But I also find some problems in the source code, which I will write below in this letter, and I will be very glad if my discovery can be helpful to improve openPDC.

1. openPDC has some problem in supporting IPV6.
    I have download a openPDC version, whose name is "openPDC-41754", and intall it on my laptop. My operating system is Windows7. at first, it runs well, but after a couple of days, I found that openPDCConsole cannot connect with openPDC. After debugging, I found that the bug is from "TVA.Communication.Transport.CreateEndPoint". In its implementation, when the parameter "hostNameOrAddress" is empty, and the operating system supports IPV6, this function will return a "new IPEndPoint (IPAddress.IPv6Any, port);", which caused the connection failure. After I modified codes, making it always returns "new IPEndPoint(IPAddress.Any, port);", openPDCConsole can connect openPDC successfully. 

It is strange that this problem didn't happen the first time I run openPDC, but can repeat if I don't modify the codes. I guess the installation of some windows update is the real cause of this problem.

2. there is race condition in "PhasorDataConcentratorBase"
I know that openPDC's concentrator is based on a "Producer/Consumer" Pattern, that is, "SortMeasurements" will put the newly accepted measurements into a queue, and another thread "m_publicationThread" will invoke "PublishFrames" to pop up frames from the queue and publish them out. Since concentrator has used multi-thread mechanism, synchronization methods to avoid race condition is very important. "ConcentratorBase" class is very careful in this point, in its "Producer" side, "AssignMeasurementToFrame" first lock the frame before putting measurements into it, and in its "Consumer" side, "PublishFrames" also locks the frame first before toggling its "Published" flag to stop receiving any more measurements.

But for its child class "PhasorDataConcentratorBase", it seems that the author forgot the synchronization. In its overriden "AssignMeasurementToFrame" and "PublishFrame", the author forgot to lock the frame before operation, so there is possibility that the data frame is under publishing when it is still being inserted with new measurements at the same time, which is a serious race condition.

3. there is some bug when generating CRC number for IEEE 1344 protocol.
in "TVA.PhasorProtocols.Ieee1344.ConfigurationFrame.Initialize" method, I add some codes like below:
... ...
    // there is some bug in this un-marshing process, CRC check is valid when doing on the initial buffer
                        // ################ cheka debug ####################//
                        bool isValid = CommonFrameHeader.ChecksumIsValid(binaryImage, startIndex, length);
                        System.Diagnostics.Debug.Assert(isValid);
                        // ################ end cheka debug ####################//

                        // Parse out header, body and footer images
                        startIndex += ParseHeaderImage(binaryImage, startIndex, length);
                        startIndex += ParseBodyImage(binaryImage, startIndex, length - startIndex);
                        startIndex += ParseFooterImage(binaryImage, startIndex, length - startIndex);

                        // but when do CRC check to the configuration frame parsed out, the result is invalid
                        // ################ cheka debug ####################//
                        isValid = CommonFrameHeader.ChecksumIsValid(this.BinaryImage, 0, this.BinaryLength);
                        System.Diagnostics.Debug.Assert(isValid);
                        // ################ cheka debug ##############
... ...
the added codes first do CRC check on the data buffer passed in as parameter, the check is successful. And then after a configuration frame is parsed out from the input buffer, I do CRC check on this frame's "BinaryImage", the check fails. there must be a bug.

4. Asynchronous Sending Problems
in "TVA.PhasorProtocols.PhasorDataConcentratorBase.PublishFrame", the author uses "MulticastAsync", which is implemented with asynchronous IO method, to publish out frames. Asynchronous IO, in my opinion, is nothing more than a multi-threaded mechanism, except that
   (a)  the multiple threads are managed by OS which we don't need to pay much attention.
   (b)  the threads exist in a thread-pool, which saves a lot of system resource.


since asynchronous IO still belongs to multi-threaded technology, so there is still risk to cause race condition.  The classic pattern to use asynchronous IO is first posting an asynchronous IO operation, and in its completion event handler, the author launch another new asynchronous IO operation. In this pattern, there is always only one IO thread working on an individual device at any time, which avoids race condition.


But the implementation of "TVA.PhasorProtocols.PhasorDataConcentratorBase.PublishFrame", in my opinion, violates this pattern. in its implementation, as long as one frame is available, a background thread will be launched to send it out, so there will be multiple threads writing to the same socket simultaneously, which will results in serious race condition to this socket's sending buffer.

5. Extension Problem
The project we are performing includes a distributed state estimation, in which each substation performs local state estimation, and then sends its result to control center to perform an overall state estimation for the whole grid.


To reuse the functions provided by openPDC (such as time-alignment), I should derive from the base class such as "DataFrameBase", "HeaderFrameBase" to define our own format to describe the state estimation result. Though deriving and implementation is not a hard job, but to integrate the new data format with openPDC tool, such as "MultiProtocolFrameParser" is not so convenient.

I have to modify the definition of "PhasorProtocol" to add a new item, and then modify "MultiProtocolFrameParser.InitializeFrameParser" to create and initialize my own parser.  To add a new data format, I should modify the source code, which violates the "Open/Closed Principle", which states that "software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification".

And in our project, the local state estimation result is sent to controll center by a communication middleware called "GridStat", a new transport channel not included in the defintion of Enum of "TransportProtocol". To integrate "GridStat" with "MultiProtocolFrameParser", I have to modify the source code once again.  So I think it needs an improvement to this problem. A better solution maybe makes parser and transport channels also as plug-ins, and integrate them into openPDC by configuration, just like the way for input adapters, action adapters and output adapters.

 I would be very honored if any discussion about the problems listed above is needed. Thank you.

   Chuanlin Zhao

cheka008@gmail.com
   2010-3-13

Developer
Mar 15, 2010 at 3:20 PM

Hi, Chuanlin,

For the issue 3 " there is some bug when generating CRC number for IEEE 1344 protocol."

In your second CRC check, if you modify your code as following:

                        isValid = CommonFrameHeader.ChecksumIsValid(binaryImage, 0, length);
                        System.Diagnostics.Debug.Assert(isValid);

You will still have isValid return true if the first CRC check is true.

Thanks,

Jian Zuo

 

Coordinator
Mar 15, 2010 at 6:46 PM

Hi Chuanlin,

We really appreciate you taking the time out to thoroughly understand the inner working of openPDC and providing us your valuable feedback. To address your concerns in points 1 and 4:

1) By default, openPDC will use IPv6 on OS with IPv6 stack installed and enabled. This will work fine if you are trying to connection to the openPDC service using openPDC Console locally on a IPv6 enabled OS or from any other IPv6 enabled OS, but will not work from any other OS that does not have IPv6 stack. The settings that I would like for you to check are ConfigurationString under remotingServer section and ConnectionString under remotingClient section of openPDC.exe.config and openPDCConsole.exe.config files respectively. The default value for ConfigurationString is Port=8500 and for ConnectionString is Server=localhost:8500, which will work fine on IPv6 enabled OS when connecting locally. You can however force the use of IPv4 on IPv6 enabled OS by change ConfigurationString to Port=8500;Interface=0.0.0.0 and ConnectionString to Server=127.0.0.1:8500;Interface=0.0.0.0 which will enable you to connect to openPDC service using openPDC Console from any OS (one with IPv6 stack and one without).

2) The IServer.MulticastAsync method used by TVA.PhasorProtocols.PhasorDataConcentratorBase.PublishFrame uses Socket.BeginSend which internally uses Windows I/O Completion Ports whereby all asynchronous operations against a completion port (socket or file) are queued and processed in a FIFO order. So even though the operations are being executed asynchronously, they are still processed serially at the transport layer (given that all async operations are issued by the same thread) eliminating race condition involving the socket. That being said, it is possible that there might be a race condition if the buffer used for storing the data to be transmitted is shared across all async calls, but with TVA.PhasorProtocols.PhasorDataConcentratorBase.PublishFrame the buffer being used is local (see TVA.PhasorProtocols.ChannelFrameBase) and is safe from any race conditions.

- Pinal

Developer
Mar 16, 2010 at 4:18 AM

Hi Pinalpatel:

    I am very glad to receive your reply.

    I will try your method to solve the IPV6 problem. But I still think there is some other problems. The strangest thing is that openPDCConsole can connect with openPDC successfully at first, but a few days later, it cannot. I am sure that I didn't modify any source code or configuration during that period of time, so I guess there is conflict between openPDC and some windows updates.

    Another thing I am very interested in is the "IO Completion Port" problem. I have used IOCP before. When I switched to C#, I also try to find how to use IOCP in c#. But all the information I get from the web told me that the only way to use IOCP in .NET is to call the managed codes written in c++, and I find no evidence to relae the asynchronous socket API in c# with IOCP, even in MSDN.

    Would you please tell me where you find the information that the asynchronous socket API in c# were internally implemented with IOCP, and I want to know more technical details. If it is true, I am afraid some parts of my program need to be re-written, to achieve a higher concurrency.

    Thank you.

Coordinator
Mar 16, 2010 at 6:29 PM
Edited Mar 16, 2010 at 6:31 PM

Hi Chuanlin,

I haven't seen anything officially published by Microsoft, but here's a couple of link about asynchronous I/O in .NET and completion ports:

- Wintellect Slidedeck - These guys train Microsoft new hires, so very credible.
- Dr. Dobb's Article - Internal .NET implementation has changed somewhat, but the overall idea stays the same.

Another thing you can do for yourself is to disassemble mscorlib using .NET Reflector and see how Socket class is implemented; you will see that all asynchronous socket operations make a call to Socket.BindToCompletionPort() internal method.

- Pinal

Coordinator
Mar 16, 2010 at 7:41 PM
Edited Mar 16, 2010 at 8:00 PM

First, we are very honored to have one of Anjan’s students reviewing the code. We have a tremendous respect for Dr. Bose, Dr. Bakken as well as Dr. Venkatasubramanian (and WSU as a whole) all of whom have had an instrumental influence on NASPI and synchrophasor technologies as a whole. We also would like to thank you for taking time to evaluate the system - in the end people like yourself will be the one’s adding the most value to the system by finding and helping to correct issues.

 1. openPDC has some problem in supporting IPV6.

We have successfully deployed the openPDC in both IPv6 and IPv4 environments - Pinal has already weighed in on this issue (above). If his suggested configuration settings do not resolve the issue for you please let us know.  Thanks!

 2. there is race condition in "PhasorDataConcentratorBase"

Very observant! I am the author so I should explain my behavioral “deviation” :) The base class is designed around the concept of a “frame” of measurements (implemented as an abstract interface, “IFrame”, containing a “dictionary” of measurement values). This frame becomes the object that holds all time-aligned measurements. As such this dictionary is used for thread synchronization so that the base class can safely stop sorting measurements into the frame before offering it to the consumer for publication. This allows the consumer to use the frame without worry about locking and further modification to the frame by other threads.

The area of code you noticed in “PhasorDataConcentratorBase” is a derived class designed to construct a frame of synchrophasor data in a standard protocol from these constituent measurements. The important thing to note is that although the synchrophasor frame implements the “IFrame” interface as is required by “ConcentratorBase” - it doesn’t actually use this dictionary of measurements. It instead handles assignment of measurements to its frame as a “special case” since the measurements are destined for “special locations “within the frame - not simply maintained as a keyed-list. As a result, I chose not to lock the unused dictionary as an optimization - especially given that processing is already limited to 1 / fps (typically 1/30th or 1/60th of a second). By not using the dictionary the race condition will be avoided even if its not locked in the “AssignMeasurementToFrame” override. It’s rather impressive that you actually were able to dig deep enough to find this however! :)

 

3. there is some bug when generating CRC number for IEEE 1344 protocol.

The phasor protocol library is written to be completely “bi-directional” - that is, not only can it “parse” data from the synchrophasor protocols but it can also “generate” data in this format as well. In your above example you are using the buffer as provided for “parsing” to validate the CRC - you are then also using the “generation” properties (i.e., this.BinaryImage and this.BinaryLength) to create an image and then validate the same CRC. However, simply because the protocols were designed to be bi-directional, not all of them have been thoroughly tested. Typically we only use either IEEE C37.118 or BPA PDCstream in “generate” mode - looks like you found an issue here! The problem is that the generation method in the “ConfigurationFrame” of IEEE 1344 was coded using CRC16 and the parsing is using CRC-CCITT (which is done correctly) - this has been corrected as of change number 42489. Thanks for finding this!

 

4. Asynchronous Sending Problems

Hoping that Pinal’s answer to this query (above) has resolved this issue. To summarize, only thing happening asynchronously here is egress from “PublishFrame” method.

 

5. Extension Problem

Reusability of the synchrophasor parsing classes is open and accomplished through deriving from Stream which means data can from any source. However, like you mention the class “MultiProtcolFrameParser” is a closed by using the protocol enumeration. My only defense is that this particular class exists only to simplify consumption of phasor protocols by combining typical network layer and transports into one class using enumerations - as well as to provide a good example on how to use these libraries. Using this class is not required to use the phasor protocols, but it does mean the consumer needs manage their own transport layer as well as creating the protocol specific parsing classes to do otherwise.

That being said I am excited that you are considering integrating GridStat as a communications layer into the openPDC. Since GridStat is not really a phasor protocol in itself (i.e., it can “carry” phasor data but is not a standard phasor protocol) - I would not recommend extending the phasor protocol library to accommodate the GridStat system. Instead, I recommend writing one or more “extensions” to the openPDC. Extending the openPDC is handled by directly extending “InputAdapterBase” (to map incoming data to measurements), “ActionAdapterBase” (to time-align and process desired measurements) or “OutputAdapterBase” (to queue measurements for archival) - see Creating a Custom Adapter.

You could simply write a “GridStatInputAdapter” to allow the openPDC to receive time-series style data from GridStat as well as a “GridStateOutpuAdapter” to allow the openPDC to publish time-series data to GridStat. I think this would accomplish the “integration” mission and both adapters could live in a single assembly (e.g., GridStatAdapters.dll). The openPDC Synchrophasor source code includes a “MySQLAdapters” project which can be used as a template of how to write an InputAdapter and an OutputAdapter that would operate in a similar fashion.

 

Thanks!

J. Ritchie Carroll

Developer
Mar 19, 2010 at 1:29 AM

Hello Pinal:

 first I am sorry to reply to you so late. there is some problems with my laptop, which make me unable to access the internet for a couple of days.

 Thank you very much for the information you provided. Based on those information, I will re-write some of my codes, to post multiple asynchronous IO operation at the same time, to achieve a higher concurrency.

Developer
Mar 19, 2010 at 1:53 AM

first, I am sorry to reply to you so late. My laptop is broken, making me unable to receive e-mails for a couple days.

and thank you very much for your praise. We should thank you and the whole openPDC team to make openPDC such a wonderful achievement, as well as make it open source, to help a lot more people.

But for "PhasorDataConcentratorBase", I still think that there is race condition. I have noticed that the derived class doesn't use the dictionary of the frame, so I agree with you that there is no race condition on that dictionary, so there is no need to lock and unlock the dictionary.  But the frame itself is also shared by multiple threads:

1. if there are multiple input adapters receiving data from different devices, each input adaper will run on its own thread, and try to put its newly-received measurements into that frame, so there is race condition among multiple "writers".

2. there is another thread to call "PublishFrame". "PublishFrame" will read the content of a frame and publish it out, so there is race condition among the only one "reader" and multiple "writers".

so, the race condition doesn't occur on that dictionary, but on the other collections within the frame, such as the "PhasorValueCollection", "DigitalValueCollection", "AnalogValueCollection". these collections are all shared among multiple threads, so synchronization is still needed, in my opinion.

I am trying to extend openPDC. In our system, the data for openPDC to process is not PMU measurements, but substation level SE result. And the data will be transmitted by GridStat middleware. I hope my work can make contributions to openPDC later.

Thanks.

Coordinator
Mar 21, 2010 at 4:29 AM
Edited Mar 21, 2010 at 5:05 AM

I see what you are saying - I assume your concern lies in the "AssignMeasurementToFrame" of the "PhasorDataConcentratorBase" where there is no synchronization applied to local collections and/or at a higher layer because of multi-thread access. As a matter of the "threading canon tenets", you are correct in your thinking: access to member collections by multiple threads should be synchronized. In this case however, because of the nature of the incoming data and its intended use I made an exception based on two points: (1) the collection lengths will not be modified dynamically - a primary concern with cross-thread synchronization on collections, and (2) the relationships to the measurements and their destinations in the frame (i.e., the locations in the collections) are basically one-to-one - that is, only one measurement will ever be assigned to a given location in the frame (i.e., collection index) - hence the concern of the wrong value ending up in the frame is alleviated.

Regardless, I will carefully look through this code again and make sure that this will continue to hold true in all scenarios. If I choose not to lock as an optimization I will clearly document this in the code as intentional (which I should have done in the first place).

Thanks!
Ritchie

Coordinator
Apr 6, 2010 at 3:22 PM

I have checked in some changes that should address your concerns - I would appreciate it if you can take a look again sometime. Specifically, I did some testing related to locking the frame measurements in "AssignMeasurementToFrame" in the "PhasorDataConcentratorBase" and it did not seem to adversely affect performance - so this lock should elimate the potential race condition you thought may exist. Additionally, "MultiProtocolFrameParser" is now marked as "sealed" - so it should be obvious that this particular class was not designed to extend.

Thanks again for your research into the openPDC.

Ritchie

Developer
Apr 6, 2010 at 10:30 PM

Hi Ritchie:

I have checked out the source codes and noticed your changes.

I am very honored that my advice has been taken such seriously, and I am also very happy that my advice can be helpful to openPDC.

My colleague, Xing Liu, Dr. Venkatasubramanian's student, has paid a visit to TVA to accept the training that how to use openPDC. After he came back, he gave a lot of praise to your team. I hope that I will have the same opportunity later on.

Thank you very much.

Developer
Jun 8, 2010 at 10:23 PM

Hi pinalpatel:

    Last time I report that I met a problem that openPDCConsole cannot connect to openPDC which binds and listens on a IPV6 IPEndpoint. Though I tried the methods in your reply, it still cannot work, and I just modify the source codes and let openPDC always listen on IPV4 addresses. Recently I re-install OS on my laptop, and I found the real reason of that problem.

    I have told you before that the most strange thing is that openPDC just works well at first, and suddenly one day, openPDC cannot be connected. This is because I modify the "hosts" file under "C:\Windows\System32\drivers\etc" and insert a line "127.0.0.1       localhost" at the bottom.

    And this newly added line affects the method "Dns.GetHostEntry(hostNameOrAddress)", which is used by "TVA.Communication.Transport.CreateEndPoint". After I changed the "hosts" file, since "Socket.OSSupportsIPv6" is still true on Windows7, then openPDC will bind and listen on "IPAddress.IPv6Any". But when openPDCConsole wanted to connect the address "localhost", DNS will return "127.0.01", which is a IPV4 address, not the IPV6 version "::1", so cause the connection failure.

   It seems that this problem is rather rare, because Windows7 forbids the user to modify the "hosts" file. Since I have some experience to develop programs under Linux, so I just modify that file as I always do under Linux, and then cause that strange problem.

  Anyway, thank you for your instruction, and I hope my experience can help to avoid similiar problems that may happen on other people.