Efficient grouping and debatching of big files using BizTalk 2006

Wednesday, July 16th, 2008

I’ve seen people struggle, both on the forums and in consulting work, when it comes to finding a good way of grouping and transforming the content of a file before debatching it. Say, for example, we have a text file like the one below.

0001;Test row, id 0001, category 10;10
0002;Test row, id 0002, category 10;10
0003;Test row, id 0003, category 10;10
0004;Test row, id 0004, category 20;20
0005;Test row, id 0005, category 20;20
0006;Test row, id 0006, category 20;20
0007;Test row, id 0007, category 20;20
0008;Test row, id 0008, category 10;10
0009;Test row, id 0009, category 10;10
0010;Test row, id 0010, category 30;30

Notice how the ten rows belong to three different categories (10, 20 and 30). In my experience this kind of file is a common batch export from legacy systems, and it usually contains far more than ten rows (in my last project the sizes ranged from 5 MB to 25 MB).

The problem

The problem is that the receiving system expects the data to be in separate groups, grouped by the categories the rows belong to. The expected message might look something like the one below for category 10 (notice how all rows within the group are from category 10).

<ns1:Group numberOfRows="5" xmlns:ns1="http://Blah/Group">
  <Row>
    <ID>0001</ID>
    <Text>Test row, id 0001, category 10</Text>
    <Category>10</Category>
  </Row>
  <Row>
    <ID>0002</ID>
    <Text>Test row, id 0002, category 10</Text>
    <Category>10</Category>
  </Row>
  <Row>
    <ID>0003</ID>
    <Text>Test row, id 0003, category 10</Text>
    <Category>10</Category>
  </Row>
  <Row>
    <ID>0008</ID>
    <Text>Test row, id 0008, category 10</Text>
    <Category>10</Category>
  </Row>
  <Row>
    <ID>0009</ID>
    <Text>Test row, id 0009, category 10</Text>
    <Category>10</Category>
  </Row>
</ns1:Group>

The problem now is that we need to find an efficient way of first grouping the incoming flat file message and then debatching it using those groups. Our ultimate goal is to have separate messages that each group all rows belonging to the same category, and then to send these messages to the receiving system. How would you solve this?

I’ve seen loads of different solutions involving orchestrations, databases and so on, but the main problem they all had in common is that they loaded too much of the message into memory and finally hit an OutOfMemoryException.

The solution

The way to solve this is to use pure messaging, since one of the new features in BizTalk 2006 is the large message transformation engine.

Large message transformation. In previous versions of BizTalk Server, mapping of documents always occurred in-memory. While in-memory mapping provides the best performance, it can quickly consume resources when large documents are mapped. In BizTalk Server 2006, large messages will be mapped by the new large message transformation engine, which buffers message data to the file system, keeping the memory consumption flat.

So the idea is to read the incoming flat file, use the Flat File Disassembler to transform the message to its XML representation (steps 1, 2 and 3 in the figure below) and then use XSLT to transform it into groups (steps 4 and 5). We will then use the XML Disassembler to split those groups into separate messages, each containing all the rows within a category (steps 6 and 7).

GroupingFlow2

Steps 1, 2 and 3 are straightforward and pure configuration. Steps 4 and 5 require some custom XSLT, which I’ll describe in more detail in the section below. Steps 6 and 7 are discussed in the last section of the post.

Grouping

Let’s start by looking at a way to group the message. I will use some custom XSLT and a technique called the Muenchian method. A segment from the XML representation of the flat file message could look something like this.

<Rows xmlns="http://Blah/Incoming_FF">
    <Row xmlns="">
        <ID>0001</ID>
        <Text>Test row, id 0001, category 10</Text>
        <Category>10</Category>
    </Row>
    <Row xmlns="">
        <ID>0002</ID>
        <Text>Test row, id 0002, category 10</Text>
        <Category>10</Category>
    </Row>
...
[message cut for readability]

The XSLT we will use could look something like the one below. It’s fairly straightforward and I’ve tried to comment the important parts in the actual script. Basically it uses a key to find the unique categories and then (again using the key) selects the rows within each category to loop over and write to a group.

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:in="http://Blah/Incoming_FF"
                xmlns:ns2="http://Blah/Groups"
                xmlns:ns1="http://Blah/Group"
                exclude-result-prefixes="in">
    <!--Defining the key we are going to use: all Row elements indexed by their Category-->
    <xsl:key name="rows-by-category" match="Row" use="Category" />

    <xsl:template match="/in:Rows">
        <ns2:Groups>

        <!--Looping over the first Row of each unique category, one group per category-->
        <xsl:for-each select="Row[count(. | key('rows-by-category', Category)[1]) = 1]">

            <!--Creating a new group and setting numberOfRows-->
            <ns1:Group numberOfRows="{count(key('rows-by-category', Category))}">

            <!--Looping over all the rows within the specific category we're on-->
            <xsl:for-each select="key('rows-by-category', Category)">
                <Row>
                    <ID>
                        <xsl:value-of select="ID"/>
                    </ID>
                    <Text>
                        <xsl:value-of select="Text"/>
                    </Text>
                    <Category>
                        <xsl:value-of select="Category"/>
                    </Category>
                </Row>
            </xsl:for-each>
            </ns1:Group>
        </xsl:for-each>
        </ns2:Groups>
    </xsl:template>

</xsl:stylesheet>
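
For readers less used to XSLT keys, the grouping idea in the stylesheet above can be sketched outside BizTalk. The Python below is only an illustration and not part of the solution: the function name group_rows and the dictionary layout are made up for this sketch, and it works on the flat file rows directly instead of the XML. The ordered dictionary plays the role of the key: one lookup per category, with categories kept in the order they are first seen.

```python
from collections import OrderedDict

# The ten sample rows from the top of the post: id;text;category
SAMPLE = """\
0001;Test row, id 0001, category 10;10
0002;Test row, id 0002, category 10;10
0003;Test row, id 0003, category 10;10
0004;Test row, id 0004, category 20;20
0005;Test row, id 0005, category 20;20
0006;Test row, id 0006, category 20;20
0007;Test row, id 0007, category 20;20
0008;Test row, id 0008, category 10;10
0009;Test row, id 0009, category 10;10
0010;Test row, id 0010, category 30;30
"""


def group_rows(flat_text):
    """Group 'id;text;category' rows by category, in first-seen category order."""
    rows_by_category = OrderedDict()  # hypothetical stand-in for the xsl:key index
    for line in flat_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row_id, text, category = line.split(";")
        rows_by_category.setdefault(category, []).append(
            {"ID": row_id, "Text": text, "Category": category}
        )
    return rows_by_category


groups = group_rows(SAMPLE)
for category, rows in groups.items():
    # One line per group: category, number of rows, and the row ids in it
    print(category, len(rows), [row["ID"] for row in rows])
```

For the sample file this gives three groups (categories 10, 20 and 30) with five, four and one rows, which is the same shape the stylesheet writes as Group elements.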

Note: You have found all the XSLT and XML related features in Visual Studio – right?

OK, so the above XSLT will give us an XML structure that looks something like this.

<?xml version="1.0" encoding="utf-8"?>
<ns2:Groups xmlns:ns2="http://Blah/Groups" xmlns:ns1="http://Blah/Group">
    <ns1:Group numberOfRows="5">
        <Row>
            <ID>0001</ID>
            <Text>Test row, id 0001, category 10</Text>
            <Category>10</Category>
        </Row>
        <Row>
            <ID>0002</ID>
            <Text>Test row, id 0002, category 10</Text>
            <Category>10</Category>
        </Row>
        <Row>
            <ID>0003</ID>
            <Text>Test row, id 0003, category 10</Text>
            <Category>10</Category>
        </Row>
        <Row>
            <ID>0008</ID>
            <Text>Test row, id 0008, category 10</Text>
            <Category>10</Category>
        </Row>
        <Row>
            <ID>0009</ID>
            <Text>Test row, id 0009, category 10</Text>
            <Category>10</Category>
        </Row>
    </ns1:Group>
    <ns1:Group numberOfRows="4">
        <Row>
            <ID>0004</ID>
            <Text>Test row, id 0004, category 20</Text>
            <Category>20</Category>
        </Row>
...
[message cut for readability]

Finally! This we can debatch!

Debatching

Debatching the Groups message above is also rather straightforward and I won’t spend much time on it in this post. The best way to learn more about it is to have a look at the EnvelopeProcessing sample in the BizTalk SDK.
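
To make the outcome of this step concrete, here is a small Python sketch of what envelope debatching produces. It is an assumption-laden illustration, not how the XML Disassembler is implemented or configured: the function name debatch and the shortened sample document are made up, and only the namespaces are taken from the output shown earlier.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the grouped message shown earlier
GROUPS_XML = """<ns2:Groups xmlns:ns2="http://Blah/Groups" xmlns:ns1="http://Blah/Group">
  <ns1:Group numberOfRows="2">
    <Row><ID>0001</ID><Text>Test row, id 0001, category 10</Text><Category>10</Category></Row>
    <Row><ID>0002</ID><Text>Test row, id 0002, category 10</Text><Category>10</Category></Row>
  </ns1:Group>
  <ns1:Group numberOfRows="1">
    <Row><ID>0004</ID><Text>Test row, id 0004, category 20</Text><Category>20</Category></Row>
  </ns1:Group>
</ns2:Groups>"""

# ElementTree writes namespaced tags as {namespace}localname
GROUP_TAG = "{http://Blah/Group}Group"


def debatch(envelope_xml):
    """Return one serialized XML document per Group child of the envelope."""
    envelope = ET.fromstring(envelope_xml)
    return [
        ET.tostring(group, encoding="unicode").strip()
        for group in envelope.findall(GROUP_TAG)
    ]


messages = debatch(GROUPS_XML)
print(len(messages), "messages")
```

Each string in messages is a standalone Group document holding the rows of a single category, which corresponds to the separate messages the receiving system expects.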

The end result of the debatching is a set of single messages, each containing rows from one category only, just as the receiving system expects! Problem solved.

Issue #1 – slow transformations

The first time I put a solution like this into test and started testing with some realistically sized messages (> 1 MB) I really panicked: the mapping took forever. And I really mean forever. I sat there waiting for 2-3 hours (!) for a single file to be transformed. When I had tested the same XML file in Visual Studio the transformation took about 10 seconds, so I knew the XSLT itself wasn’t the problem. With some digging I found the TransformThreshold parameter.

TransformThreshold decides how big a message can be in memory before BizTalk starts buffering it to disk. The default value is 1 MB and one really has to be careful when changing this. Make sure you think hard about your solution and situation before changing the value – how much traffic do you receive, and how much of that can you afford to read into memory?

In my case I received a couple of big files spread out over a night, so setting the parameter to a large value wasn’t really a risk, and it solved the problem. The mapping finished in under 10 minutes, as I now allow a much bigger message to be read and transformed in memory before BizTalk switches over to the large message transformation engine and starts buffering to disk (which is always much slower).

Issue #2 – forced to temp storage

Looking at the model of the data flow again you probably see that I’m using the XML Disassembler to split the grouped files (step 5 to step 6).

GroupingFlow3

The only way I’ve found to make this work is to write the grouped XML message to file and then read that file into BizTalk again, debatching the message in that receive pipeline. Not the most elegant solution, but there really isn’t another out-of-the-box way of debatching messages (the XML Assembler can’t do it), and I don’t want to use an orchestration to execute a pipeline, as I want to keep the solution pure messaging for simplicity and performance reasons.

Finishing up

Have you solved similar cases differently? I’d be very interested in your experience! I also have a sample solution of this – just send me an email and I’ll make sure you get it.

Update

Also don’t miss this issue (pdf) of BizTalk Hotrod magazine. There is an article on “Muenchian Grouping and Sorting using Xslt” describing exactly the problem discussed above.

Writing BizTalk context properties to a message from a WCF service using behaviors

Thursday, April 24th, 2008

The new WCF adapter in BizTalk 2006 R2 offers a lot of new possibilities. One of those is to write data to the BizTalk message context properties directly from an exposed WCF service. A practical use of this technique could be to write the username from the Windows credentials of the calling client into the context of the BizTalk message. This is useful because that information is encrypted in messages received via the WCF adapter and can’t be read once inside BizTalk. I’ll try to demonstrate the technique in this post.

If you have used the SOAP adapter before, you might know that all you had to do was turn on Windows-based security for the exposed SOAP service, and the username was automatically promoted to the context of the incoming BizTalk message. That username could then be used for routing, for tracking which user called the service, or as a plain-text value when communicating with other connected systems. With the WCF adapter this is no longer true: when using the new WCF message security model the username and password are encrypted in the message, and once the message is received by BizTalk it’s too late to read them. Basically we have to read the username in the actual service and write it into our own context property (which doesn’t get encrypted).

One way of achieving this is to read the username in the service and then add it to the WCF message headers. All WCF message headers are by default written to the BizTalk message context property called InboundHeaders (in the http://schemas.microsoft.com/BizTalk/2006/1/Adapters/WCF-properties namespace). First we’ll create an EndpointBehavior that uses a MessageInspector to add the username to the message header. Finally we’ll create a BehaviorExtensionElement so we can use a WCF-Custom binding in BizTalk and configure it to add our new behavior.

Creating the new EndpointBehavior

To create the configurable behavior we’ll need the three classes we mentioned above.

  1. A class that implements the IDispatchMessageInspector interface to handle reading from and writing to the actual message.
  2. A class that implements the IEndpointBehavior interface to define what kind of endpoint behavior we’re creating and what it should do.
  3. A class that derives from the BehaviorExtensionElement abstract class to create the behavior and make it configurable.

using System;
using System.Collections.Generic;
using System.Text;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
using System.ServiceModel.Description;
using System.ServiceModel.Configuration;

namespace CustomWCFProperties.Behavior
{
    /// <summary>
    /// PromoteUserNameMessageInspector implements IDispatchMessageInspector and adds the name
    /// from the WindowsIdentity to a WCF header called WindowsUserName in the
    /// http://CustomWCFProperties.Schema namespace. BeforeSendReply only returns as we're
    /// not interested in handling the response.
    /// </summary>
    public class PromoteUserNameMessageInspector : IDispatchMessageInspector
    {
        #region IDispatchMessageInspector Members

        public object AfterReceiveRequest(ref System.ServiceModel.Channels.Message request, System.ServiceModel.IClientChannel channel, System.ServiceModel.InstanceContext instanceContext)
        {
            string windowsUserName = ServiceSecurityContext.Current.WindowsIdentity.Name;
            request.Headers.Add(MessageHeader.CreateHeader("WindowsUserName", "http://CustomWCFProperties.Schema", windowsUserName));
            return null;
        }

        public void BeforeSendReply(ref Message reply, object correlationState)
        {
            return;
        }

        #endregion
    }

    /// <summary>
    /// PromoteUserNameBehavior implements IEndpointBehavior and adds a message inspector to
    /// the dispatch behavior. Doesn't use any binding parameters, doesn't validate any
    /// configuration etc and can't be used in a client (only in a service).
    /// </summary>
    public class PromoteUserNameBehavior : IEndpointBehavior
    {
        #region IEndpointBehavior Members

        public void AddBindingParameters(ServiceEndpoint endpoint, System.ServiceModel.Channels.BindingParameterCollection bindingParameters)
        {
            return;
        }

        public void ApplyClientBehavior(ServiceEndpoint endpoint, System.ServiceModel.Dispatcher.ClientRuntime clientRuntime)
        {
            throw new Exception("The method or operation is not implemented.");
        }

        public void ApplyDispatchBehavior(ServiceEndpoint endpoint, System.ServiceModel.Dispatcher.EndpointDispatcher endpointDispatcher)
        {
            endpointDispatcher.DispatchRuntime.MessageInspectors.Add(new PromoteUserNameMessageInspector());
        }

        public void Validate(ServiceEndpoint endpoint)
        {
            return;
        }

        #endregion
    }

    /// <summary>
    /// Defines the behavior.
    /// </summary>
    public class PromoteUserNameBehaviorElement : BehaviorExtensionElement
    {
        protected override object CreateBehavior()
        {
            return new PromoteUserNameBehavior();
        }

        public override Type BehaviorType
        {
            get { return typeof(PromoteUserNameBehavior); }
        }
    }
}

Finally we have to sign the assembly using a strong key and add it to the GAC.

Configure the machine.config

As we need BizTalk and the WCF adapter to pick up the new behavior and make it possible to configure our receive port, we need to add the behavior element to machine.config. The easiest way of doing this is to use the new WCF Service Configuration Editor tool and point it to the machine.config file.

PromoteUserNameBehavior GAC

After the DLL has been added and the machine.config file has been saved, the line below should have been added to the <behaviorExtensions> element (that is, if you use the same strong name key as in the sample project I’ve linked here).

<add name="addCustomWCFProperties" type="CustomWCFProperties.Behavior.PromoteUserNameBehaviorElement, AddCustomWCFPropertiesBehavior, Version=1.0.0.0, Culture=neutral, PublicKeyToken=705e34637fdffc54" />

Create the BizTalk Receive Port and Receive Location

Next thing to do is to start the BizTalk WCF Service Publishing Wizard. Choose to publish a service endpoint and make sure you enable metadata and create a receive location. In this example we’ll next choose to "Publish schemas as WCF service" and then define our service by naming service operations and so on.

When you then browse to the URL you chose to publish your service to, you’ll see the nice example of how to instantiate the service you just defined.

WSDL code example

If we then send a request message to the service (you’ll find a client as part of the attached solution here) and inspect the message and its context properties in BizTalk, we’ll see that the username of the calling client is nowhere to be found.

Message No Username

Configure a WCF-Custom binding and add an endpoint behavior

To add the username to the message context we’ll need to add our newly created behavior to our service. We’ll do this by switching the service over to a WCF-Custom binding to enable configuration. We then need to add the URL in the address field, set the binding type to wsHttpBinding and add our addCustomWCFProperties behavior to the list of endpoint behaviors.

Add Endpoint behavior

NOTE: there is a limitation in the BizTalk WCF implementation: you can’t first create a WCF-Custom receive location that uses an in-process HTTP-based binding (like the wsHttpBinding used in this WCF-Custom endpoint) and then use the WCF Publishing Wizard to publish only a metadata endpoint.

Richard Seroter writes about it here and I found the same thing to be true.

"This error doesn’t have to do with mixing MEX endpoints and “regular” endpoints in the same IIS web site, but rather, creating MEX endpoints for in-process HTTP bindings seems to trigger this. Note that an IIS-hosted MEX endpoint CAN be created for IIS-hosted HTTP endpoints, but not for in-process hosted HTTP endpoints."

If you however choose a binding other than HTTP or (as in this case) publish the metadata first and then switch over to a custom binding, you’re OK.

If we then post another message to the service and inspect the message, we’ll see that the behavior actually added a header and that it’s part of our BizTalk context properties. The adapter is also smart enough to know that this header isn’t part of the original headers and therefore stores it in its own field within the context properties (you’ll find it as part of the InboundHeaders block as well).

Message Username

One problem remains – the actual value of the user is nested inside an XML node and the property isn’t promoted.

Extract and promote the value

To extract and promote the value we use an old-fashioned pipeline component with the following code in the Execute method (the complete project is part of the downloadable sample project).

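// Requires: using System.IO; using System.Xml.XPath;
// using Microsoft.BizTalk.Component.Interop; using Microsoft.BizTalk.Message.Interop;
public IBaseMessage Execute(IPipelineContext pc, IBaseMessage inmsg)
{
    // The custom header is stored in the context as an XML fragment.
    object header = inmsg.Context.Read("WindowsUserName", "http://CustomWCFProperties.Schema");

    // Read returns null when the property is missing, so check before parsing.
    if (header != null)
    {
        using (StringReader reader = new StringReader(header.ToString()))
        {
            XPathDocument document = new XPathDocument(reader);
            XPathNavigator navigator = document.CreateNavigator();

            // The string value of the root node is the text inside the header element.
            string value = navigator.SelectSingleNode("/").Value;

            // Write the plain value back and promote it.
            inmsg.Context.Promote("WindowsUserName", "http://CustomWCFProperties.Schema", value);
        }
    }

    return inmsg;
}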

All the component does is read the XML node that wraps the value and then read the actual value. Finally it writes the value back and promotes it. To be able to promote the value we also need a property schema deployed with a corresponding property name and namespace (WindowsUserName and http://CustomWCFProperties.Schema in this case).

The end result looks something like this.
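
The extraction step itself is small enough to try outside BizTalk. The Python sketch below is only an illustration of the idea, not the pipeline component: the function name extract_header_value is made up, and both the header XML shape and the user name CONTOSO\alice are assumptions about what the context property holds.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the header as stored in the context (hypothetical user name)
HEADER_XML = (
    '<WindowsUserName xmlns="http://CustomWCFProperties.Schema">'
    "CONTOSO\\alice"
    "</WindowsUserName>"
)


def extract_header_value(header_xml):
    """Return the text wrapped by the header element, or None if there is no header."""
    if header_xml is None:
        return None  # property missing from the context
    root = ET.fromstring(header_xml)
    # Concatenate all text under the root, like the string value of "/" in XPath
    return "".join(root.itertext())


user_name = extract_header_value(HEADER_XML)
print(user_name)
```

The plain string returned here is what the component would write back and promote, so that it can be used in routing filters.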

Message Promoted Username

The username is extracted and promoted, and is available for tracking or for use in a routing scenario, for example.

This technique could of course be used for all kinds of scenarios where you would like to add information to the context properties, and could potentially replace a lot of the classic scenarios for custom pipelines.

All kinds of comments are of course more than welcome!

Download the sample solution here.