Hello everybody,
If you aren't familiar with me, I'll start with a little introduction. I am a Principal Technical Support Analyst for Infor, and I have been supporting Infor Process Automation (IPA) for about 6 years now. As part of my daily activities I come across process flows designed and developed by a wide range of people, so I see many different techniques used to achieve the same tasks. Recently I have been coming across a number of flows that use JavaScript to build output files, so I figured I would come here, try to reach a few of the people who do this, and offer what is hopefully some sound advice that will benefit someone.
If you have a flow that uses JavaScript variables to build an output file, you have come to the right post. If your flow contains a loop or deals with more than a thousand rows of data, please keep reading:
outputFileData += field1 + "," + field2 + "," + field3 + "," + field4;
<<<< This is an extremely dangerous flow technique that can severely impact the health of your system.
Why?
The why is easy: outputFileData is a string, and strings are immutable. This means every time I assign it a new value in my loop, I create a new object in memory to hold its new value and throw the old copy away. The JVM running the LPA Grid node must then find, mark, and delete every object I throw away. To test this on my server, I put the following in an Assign node:
var mystring = new Array(1025).join("X"); // a string of 1024 X's
for (var i = 1; i < 99999; i++) {
    outputFileData += mystring; // outputFileData starts out as an empty string
}
This loop creates roughly 100,000 versions of outputFileData (99,998, to be exact, since i runs from 1 to 99,998), each one 1,024 characters longer than the last; by the end the string holds about 102 million characters. I ran this flow on my test system as a baseline, and it ran for 5 minutes and 30 seconds. More importantly, when I ran 8 of them at the same time (my test server has 8 logical processors), each of the 8 flows took around 27 minutes to complete.
TECH NOTE: IPA workunits have CPU affinity, meaning a flow runs on a single processor. So if running 1 takes 5 minutes, shouldn't running 8 result in 8 workunits that each take 5 minutes? It did not, because a single flow took my CPU usage up to 60%, and running 8 of them pegged my CPU at 100%. The JVM/LPA node's garbage collector is not single threaded: the garbage collection threads run across all available CPUs, and a single workunit generated so much garbage that the system could barely keep up. As soon as I ran more than 1 at the same time, my workunits were competing with the garbage collection threads, which slowed everything down.
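JavaScript does not let you inspect the identity of a string object directly, but the immutability described above can be seen indirectly. This is a small plain-JavaScript sketch (runnable in Node.js or a browser console, not specific to IPA); the variable names are only for illustration. It shows that += leaves the old value untouched and points the variable at a new string, and that a string cannot be edited in place.

```javascript
// Strings are immutable: "+=" never modifies the existing string.
// It builds a new string and points the variable at it.
var original = "abc";
var alias = original;   // a second reference to the same string value

original += "d";        // a new string "abcd" is created here

console.log(original);  // abcd
console.log(alias);     // abc  (the old value was not changed)

// Writing to a character position does not change a string either:
// it is silently ignored in sloppy mode and throws in strict mode.
try {
  alias[0] = "X";
} catch (e) {
  // strict mode: TypeError, cannot assign to a read-only index
}
console.log(alias);     // abc
```

In the loop above, every += on outputFileData works the same way, except that each new string is about 1 KB longer than the one it replaces, and the replaced one becomes garbage.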
The fix?
To fix a flow like this, where possible I recommend loading the data from a file and using a Data Iterator with Input Mode=File to process it line by line. Inside the Data Iterator, use an Assign node to split each line into single variables: [var1,var2,var3] = DataIterator_outputData.split(","). I then use a Message Builder to build my output data, and every 100 to 1000 writes to the Message Builder I output/append to a file and clear the Message Builder.
This works because:
- the Message Builder uses a StringBuilder, which makes it mutable. That means I don't create a new object and throw the old one out every time I add to it. This is a system saver, so please use this technique whenever possible. And if you are building output files, periodically appending the data is great because it also keeps pinned/live memory low.
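In IPA these steps are nodes configured in the designer, not JavaScript, but the data flow can be modeled in plain JavaScript to make the batching logic concrete. This is a sketch under stated assumptions: buildOutput, appendToFile, and FLUSH_EVERY are hypothetical names, the array stands in for the Message Builder, and the row transformation is just an example.

```javascript
// Plain-JavaScript model of: Data Iterator (file mode) -> Assign (split)
// -> Message Builder -> periodic File Access append.
// appendToFile is a hypothetical stand-in for the File Access node.
var FLUSH_EVERY = 500; // the post suggests every 100 to 1000 rows

function buildOutput(inputLines, appendToFile) {
  var buffer = []; // plays the role of the Message Builder

  for (var n = 0; n < inputLines.length; n++) {
    // Equivalent of [var1,var2,var3] = DataIterator_outputData.split(",")
    var fields = inputLines[n].split(",");
    var var1 = fields[0], var2 = fields[1], var3 = fields[2];

    // Build one output row (the reordering is just an example transform)
    buffer.push(var3 + "|" + var2 + "|" + var1 + "\n");

    // Every FLUSH_EVERY rows: write the batch out and clear the buffer,
    // so live memory never holds more than one batch
    if (buffer.length === FLUSH_EVERY) {
      appendToFile(buffer.join(""));
      buffer.length = 0;
    }
  }

  // Write whatever is left after the last full batch
  if (buffer.length > 0) {
    appendToFile(buffer.join(""));
  }
}

// Usage: 1,234 input lines are written in 3 appends (500 + 500 + 234 rows)
var input = [];
for (var k = 0; k < 1234; k++) {
  input.push("a" + k + ",b" + k + ",c" + k);
}
var writes = [];
buildOutput(input, function (chunk) { writes.push(chunk); });
console.log(writes.length); // 3
```

The final flush after the loop matters: without it, any rows left over after the last full batch would never reach the file.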
If you are in a scenario where you cannot use this technique and must use JavaScript, then:
Create a StringBuilder class in your JavaScript.
In an Assign node at the start of your flow, put the following:
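// Initializes a new instance of the StringBuilder class
// and appends the given value if supplied
function StringBuilder(value)
{
    this.strings = new Array("");
    this.append(value);
}
// Appends each given value to the end of this instance.
// Accepts any number of arguments; null/undefined are skipped,
// but 0 and other falsy values are kept.
StringBuilder.prototype.append = function ()
{
    for (var n = 0; n < arguments.length; n++)
    {
        var value = arguments[n];
        if (value !== undefined && value !== null)
        {
            this.strings.push(String(value));
        }
    }
    return this; // allows chaining: sb.append(a).append(b)
}
// Clears the string buffer (keeps only the initial "" element)
StringBuilder.prototype.clear = function ()
{
    this.strings.length = 1;
}
// Converts this instance to a String.
StringBuilder.prototype.toString = function ()
{
    return this.strings.join("");
}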
Now that the class is loaded in your flow, instead of outputFileData += field1 + "," + field2, do this:
var outputFileData = new StringBuilder();                  // create the variable
outputFileData.append(field1 + "," + field2 + ",text\n");  // add one row; only the short row string is concatenated
outputFileData.toString();                                 // write out the full contents (e.g. in your File Access node)
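The StringBuilder class above is a thin wrapper around an array: append pushes pieces onto it and toString joins them once. If you would rather not define a class, the same idea can be written inline with a plain array. This is an alternative sketch, not the class itself; the parts and csvText names are hypothetical.

```javascript
// Collect the pieces in an array and build the final string only once.
var parts = [];
parts.push("id,name\n");              // header row
for (var r = 1; r <= 3; r++) {
  parts.push(r + ",row" + r + "\n");  // each push adds one small row string
}
var csvText = parts.join("");         // the one large string is built here
console.log(csvText);
// id,name
// 1,row1
// 2,row2
// 3,row3

parts.length = 0;                     // empties the array, much like clear()
```

The class version is still handy when several Assign nodes need to share the same append/clear/toString calls.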
Here is my previous test rewritten to use the StringBuilder:
var mystring = new Array(1025).join("X"); // a string of 1024 X's
var outputFileData = new StringBuilder();
for (var i = 1; i < 99999; i++) {
    outputFileData.append(mystring);
}
var OutputStringData = outputFileData.toString();
The previous result was 5 minutes 30 seconds for a single flow by itself, and 27 minutes per flow with 8 running simultaneously.
The result of this test using the StringBuilder:
Single flow running by itself: 7 seconds
Eight flows running simultaneously: 20-40 seconds
My CPU never went above 30%, the LPA node/JVM looked much healthier, and everything ran much faster. This is the kind of performance you should get with DataIterator (file mode) > Assign to split > MessageBuilder to build the output; the two should be very close, and your system administrator will thank you. I am still creating a large object in pinned memory with this technique, so I prefer the DataIterator > MessageBuilder option, which lets me clear out my data while periodically appending to a file. But this is significantly better than "+=".
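If you want to try the comparison yourself outside IPA, here is a scaled-down sketch that mirrors the test loop (including i starting at 1, so it runs ITERATIONS - 1 times). It is an assumption-laden approximation: ITERATIONS and repeatChar are hypothetical, the sizes are reduced so it finishes quickly, and it runs in whatever JavaScript engine you use, not on the LPA node. Some engines optimize repeated concatenation internally, so the gap you see may be much smaller than the LPA numbers above.

```javascript
// Scaled-down comparison of "+=" versus collecting pieces and joining once.
// Timings are engine-specific; only the outputs are guaranteed to match.
function repeatChar(ch, count) {
  return new Array(count + 1).join(ch); // works in older engines without String.repeat
}

var chunk = repeatChar("X", 1024);
var ITERATIONS = 5000;

// Approach 1: "+=" on one growing string (a new string every iteration)
var t0 = Date.now();
var concatResult = "";
for (var i = 1; i < ITERATIONS; i++) {
  concatResult += chunk;
}
var concatMs = Date.now() - t0;

// Approach 2: push pieces into an array, build the string once at the end
var t1 = Date.now();
var pieces = [];
for (var j = 1; j < ITERATIONS; j++) {
  pieces.push(chunk);
}
var joinResult = pieces.join("");
var joinMs = Date.now() - t1;

console.log("+=  : " + concatMs + " ms, length " + concatResult.length);
console.log("join: " + joinMs + " ms, length " + joinResult.length);
```

Both approaches produce the same 5,118,976-character string (4,999 × 1,024); the difference is how much intermediate garbage is created along the way.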