Optimizing XSLT to boost your BizTalk map performance
Who hasn’t had this discussion when working with BizTalk? Should you use the built-in BizTalk mapper or go with XSLT instead? For me, it all depends on the situation I am in. If there are a lot of BizTalk developers, I would say it’s a no-brainer: go with XSLT. But what if I have to teach people at a client to work with BizTalk in the first place, people who will only occasionally develop or fix bugs? Then it’s almost impossible to use XSLT. The learning curve for BizTalk itself is pretty steep, and introducing yet another technology will not help. And BizTalk’s mapper is often one of the reasons such a client chooses the product. It just makes life so much easier compared to other products out there.
Maps generated in BizTalk are inefficient
If you use the BizTalk built-in mapper, it will generate the XSLT for you, which is awesome. It’s a powerful tool, which can save you a lot of time if you are developing transformations. It can even solve more complex problems pretty easily. And if some problems are too complex, you can use the scripting functoids to use custom C# code or XSLT.
But there’s a catch. The generated XSLT tries to solve your problem in a generic way. Although it does the best it can, it often ends up with a lot of unnecessary for-each loops, if-statements and scripts. This isn’t a problem when you are only handling small or moderately complex files, but when files get bigger and more complex, performance drops drastically and your map ends up using a lot of CPU power. Power you would rather use for other processes running in your BizTalk environment.
Boost the performance of your BizTalk maps with XSLT
However, we can’t blame BizTalk for this behavior. It can only help you in a generic way, since it doesn’t know exactly what problem you are trying to solve. In itself it’s pretty amazing that it generates the XSLT for you. The only way to circumvent this behavior is to take full control and write your own XSLT, which can be time-consuming, especially for people who are new to XSLT and still have to get used to it.
But here’s a tip (if you didn’t already know it): first build the basis of your map in the BizTalk mapper, then validate the map and use the generated XSLT as your starting point. You can find the link to the XSL file in the output window.
Optimizing XSLT performance
This of course will not ensure your map will perform any faster. It will only give you BizTalk’s interpretation of your map, in XSLT. It’s a starting point, which saves you a lot of typing. Step 2 is to optimize the XSLT to improve the performance.
First, get rid of all the unnecessary or inefficient for-each loops. You will notice that the generated XSLT sometimes generates a loop in a loop or it will loop over several different fields. By getting rid of unnecessary loops, you will boost performance instantly.
It doesn’t take a genius to see that the code below isn’t that efficient and could use some refactoring.
<xsl:for-each select="Headers/Header | Lines/Line">
  <xsl:for-each select="Item">
    <Item>
      <ItemNo>
        <xsl:value-of select="ItemNo" />
      </ItemNo>
      <xsl:variable name="var:v1" select="ItemNo" />
      <xsl:variable name="var:v2" select="userCSharp:InitCumulativeMax(0)" />
      <xsl:for-each select="//Discount">
        <xsl:if test="LinkedItemNo = $var:v1">
          <xsl:variable name="var:v3" select="userCSharp:AddToCumulativeMax(0,string(Amount),&quot;1&quot;)" />
        </xsl:if>
      </xsl:for-each>
      <xsl:variable name="var:v4" select="userCSharp:GetCumulativeMax(0)" />
      <Discount>
        <xsl:value-of select="$var:v4" />
      </Discount>
    </Item>
  </xsl:for-each>
</xsl:for-each>
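To give an idea of where such a refactoring could end up, here is a minimal sketch, not the actual rewritten map. It assumes that the root element is called Order, that the items only live under Lines/Line/Item, and that the goal is the highest Amount of the Discount elements linked to each item. The root name, the Items wrapper and the key name discountsByItem are made up for illustration, and the output for items without any discount may differ from what the generated script returns.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Index every Discount by the item it is linked to (hypothetical key name) -->
  <xsl:key name="discountsByItem" match="Discount" use="LinkedItemNo" />

  <xsl:template match="/">
    <Items>
      <!-- One explicit loop instead of a loop in a loop over a union.
           Assumption: the root element is called Order. -->
      <xsl:for-each select="Order/Lines/Line/Item">
        <Item>
          <ItemNo>
            <xsl:value-of select="ItemNo" />
          </ItemNo>
          <Discount>
            <!-- Highest amount: sort the linked discounts descending, keep the first -->
            <xsl:for-each select="key('discountsByItem', ItemNo)">
              <xsl:sort select="Amount" data-type="number" order="descending" />
              <xsl:if test="position() = 1">
                <xsl:value-of select="Amount" />
              </xsl:if>
            </xsl:for-each>
          </Discount>
        </Item>
      </xsl:for-each>
    </Items>
  </xsl:template>
</xsl:stylesheet>
```

The key lookup replaces the scan over //Discount for every item, and the three script calls disappear completely.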
XSLT is stateless: it does not remember the result of an XPath query, so it will evaluate the same query time and time again. This means that many evaluations, or needlessly expensive ones, can have a performance impact. There are a few well-known rules of thumb floating around the internet, and Microsoft has some recommendations as well. I will try to explain them below.
Avoid using “//item” too often
The // operator searches the entire document for matching nodes, so the bigger and more diverse your XML document gets, the more inefficient this is. Use more explicit paths instead.
<!-- DON'T USE THIS -->
<xsl:value-of select="//item/itemno" />

<!-- USE THIS -->
<xsl:value-of select="items/item/itemno" />
Don’t evaluate the same node-set more than once, but save it in a variable instead
Especially inside for-each loops, you shouldn’t evaluate the same node-set on every iteration. This can have a huge performance impact. Each evaluation walks the entire node-set again, so instead of a loop that takes n steps you end up with one that takes n times n.
Think of what you are evaluating and where you are performing the evaluation.
<!-- DON'T USE THIS -->
<xsl:for-each select="//item[headerid='1234']">
  <nodecount><xsl:value-of select="count(//item[headerid='1234'])" /></nodecount>
</xsl:for-each>

<!-- USE THIS -->
<xsl:variable name="myNodeset" select="//item[headerid='1234']" />
<xsl:variable name="myNodesetCount" select="count($myNodeset)" />
<xsl:for-each select="$myNodeset">
  <nodecount><xsl:value-of select="$myNodesetCount" /></nodecount>
</xsl:for-each>
Attributes are faster than elements
So if you are in charge of creating the XML specification, try and use attributes where possible.
Scripts downgrade performance
Maybe you can get rid of some scripts in the generated XSLT. E.g. the CumulativeSum script that BizTalk generates can also be achieved by using the built-in sum() function.
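As a minimal sketch of that idea, the stylesheet below totals all line amounts with sum() and no script at all. The input structure (Order/Lines/Line/Amount) and the Total output element are assumptions for illustration.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <Total>
      <!-- sum() converts every selected node to a number and adds them up.
           The predicate skips empty or non-numeric amounts, which would
           otherwise turn the whole result into NaN. -->
      <xsl:value-of select="sum(Order/Lines/Line/Amount[number(.) = number(.)])" />
    </Total>
  </xsl:template>
</xsl:stylesheet>
```

The predicate works because NaN is never equal to itself, so only values that really are numbers take part in the sum.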
As do calls to templates (xsl:call-template)
If you are just doing something once, don’t use the xsl:call-template and perform the logic inline instead.
Compare directly by name, instead of using “local-name()”
I sometimes tend to use the “local-name()” XSLT function to avoid using the namespace in the xpath, but be aware that it’s slower than using the name directly.
<!-- DON'T USE THIS -->
<xsl:value-of select="*[local-name()='item']" />

<!-- USE THIS -->
<xsl:value-of select="ns1:item" />
Avoid complex patterns in template rules. Instead, use <xsl:choose> within the rule.
The rule is located in the match attribute of the <xsl:template>. The template will be executed if a rule is matched. Avoid complex rules and just put your logic within the template. Even selecting a node instead of the root of the document has a performance impact.
<!-- DON'T USE THIS -->
<xsl:template match="//item[headerid='1234']">
</xsl:template>

<!-- USE THIS -->
<xsl:template match="/rootnode">
  <xsl:variable name="myItem" select="items/item[headerid='1234']" />
</xsl:template>
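One way to keep the match pattern simple is to move the condition into the template body with xsl:choose. The sketch below assumes an items/item input with headerid and itemno children. The result, special and regular output elements are made up for illustration.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <result>
      <!-- Explicit path, so only the item elements are visited -->
      <xsl:apply-templates select="items/item" />
    </result>
  </xsl:template>

  <!-- Simple pattern; the condition is tested inside the rule -->
  <xsl:template match="item">
    <xsl:choose>
      <xsl:when test="headerid = '1234'">
        <special><xsl:value-of select="itemno" /></special>
      </xsl:when>
      <xsl:otherwise>
        <regular><xsl:value-of select="itemno" /></regular>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```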
So the best way to go is to just inline most of your templates:
<!-- DON'T USE THIS -->
<xsl:template match="//item">
  <your logic ... />
</xsl:template>

<!-- USE THIS -->
<xsl:template match="/">
  (...)
  <xsl:for-each select="item">
    <your logic ... />
  </xsl:for-each>
</xsl:template>
Avoid <xsl:number> if you can. Use position()
Sometimes you can’t avoid using <xsl:number>, but avoid it if you can. If you just want the current iteration of the node within the nodeset, use position(). It’s way faster.
<!-- DON'T USE THIS -->
<xsl:for-each select="//item">
  <pos>
    <xsl:number />
  </pos>
</xsl:for-each>

<!-- USE THIS -->
<xsl:for-each select="//item">
  <pos>
    <xsl:value-of select="position()" />
  </pos>
</xsl:for-each>
Use <xsl:key>, for example to solve grouping problems
We can’t even solve the whole grouping-thing within BizTalk’s mapper, so we need to add XSLT to the map if we want to achieve grouping. Read more on how to do grouping in Sandro’s Mapping Patterns and Best Practices book. You can also find his code samples over here: https://code.msdn.microsoft.com/windowsdesktop/Muenchian-Grouping-and-790347d2
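For readers who have not seen the technique before, here is a minimal sketch of Muenchian grouping with xsl:key. It is not taken from the book or the linked samples. It assumes an items/item input where each item has headerid and itemno children. The key name itemsByHeader and the groups/group output elements are made up for illustration.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Index all items by their headerid (hypothetical key name) -->
  <xsl:key name="itemsByHeader" match="item" use="headerid" />

  <xsl:template match="/">
    <groups>
      <!-- Keep only the first item of every headerid: an item qualifies
           when it is the first node the key returns for its own headerid -->
      <xsl:for-each select="items/item[generate-id() = generate-id(key('itemsByHeader', headerid)[1])]">
        <group headerid="{headerid}">
          <!-- The key returns all items of this group without rescanning the document -->
          <xsl:for-each select="key('itemsByHeader', headerid)">
            <itemno><xsl:value-of select="itemno" /></itemno>
          </xsl:for-each>
        </group>
      </xsl:for-each>
    </groups>
  </xsl:template>
</xsl:stylesheet>
```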
Be careful when using the preceding[-sibling] and following[-sibling] axes
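Using these axes often indicates an algorithm with n-squared performance, and that’s quite a performance impact. For example, a test like not(preceding-sibling::item[headerid = current()/headerid]) scans all earlier siblings again for every single item.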
Don’t sort the same node-set more than once. If necessary, save it as a result tree fragment and access it using the “node-set” extension function
Sorting in itself is an intensive process, whether you are using XSLT or C#. Try to sort only once, or make sure the data is already sorted at the source.
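A minimal sketch of the sort-once approach: the sorted copy is stored in a variable as a result tree fragment and then reused through msxsl:node-set, the extension function available in the Microsoft XSLT processors. The items/item/itemno input and the output element names are assumptions for illustration.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="/">
    <!-- Sort once; the variable holds a result tree fragment with copies of the items -->
    <xsl:variable name="sortedFragment">
      <xsl:for-each select="items/item">
        <xsl:sort select="itemno" data-type="number" />
        <xsl:copy-of select="." />
      </xsl:for-each>
    </xsl:variable>

    <!-- Turn the fragment into a node-set so XPath can be used on it -->
    <xsl:variable name="sortedItems" select="msxsl:node-set($sortedFragment)/item" />

    <result>
      <!-- Every use below reads the already sorted copy; nothing is sorted again -->
      <lowest><xsl:value-of select="$sortedItems[1]/itemno" /></lowest>
      <highest><xsl:value-of select="$sortedItems[last()]/itemno" /></highest>
      <all>
        <xsl:for-each select="$sortedItems">
          <itemno><xsl:value-of select="itemno" /></itemno>
        </xsl:for-each>
      </all>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Keep in mind that the variable contains copies, so from these nodes you can no longer navigate to parents or siblings in the original document.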
So if you want to increase performance of your BizTalk maps, you should choose to go the XSLT route and take the things I’ve just mentioned into account. I’ve recently had a large file that took about 2 hours to process, but with some XSLT-tweaking I was able to bring it down to just 7 minutes. Now, that’s some impressive performance increase and a good way to satisfy your boss 🙂
And while we are on the subject of mapping… I just want to mention Sandro’s Mapping Patterns and Best Practices book again. It is a must-read for everyone developing in BizTalk, and it will help even more advanced BizTalk developers solve difficult issues within maps.
Hopefully this helps some of you out there who need to speed up their transformations! And I am sure Google will have some more tips as well!
Great read Rob!
Any experiences on tools for benchmarking XSLT?
Hi Jeroen, thanks! I haven’t looked into that. The C# Stopwatch() helped me a lot with benchmarking several transforms 🙂 But it’s a good thing to look into next indeed! Could be really helpful.
So can we improve this solution for transformations that need to deal with a large number of nillable elements?
Hi Edwardo. It totally depends on what you are trying to do here and what BizTalk’s mapper generated.
If it’s a straightforward mapping, BizTalk may already generate XSLT that’s rather OK. With more complex mappings, writing your own XSLT gets more and more interesting. Maybe you can iterate through all elements that aren’t marked as “nil”, for instance.