Sunday, February 17, 2013

Memorizing method results with PostSharp (part 1)

Quite regularly, when I need some new piece of functionality and try to think about the problem in general, outside of its current context, I run into the need to cache the result of a specific method, so that the code stays fast if someone (even me) decides to use it extensively.

The problem

I mean, how many times have you seen code like this in your life:
    public class SomeClass
    {
        private static Dictionary<Tuple<Argument1Type, Argument2Type>, MyResultType> someExpensiveMethodsomeMethodResultCache = new Dictionary<Tuple<Argument1Type, Argument2Type>, MyResultType>();

        public MyResultType SomeExpensiveMethod(Argument1Type arg1, Argument2Type arg2)
        {
            var key = Tuple.Create(arg1, arg2);
            if (!someExpensiveMethodsomeMethodResultCache.ContainsKey(key))
            {
                MyResultType result = null;
                // fill result;
                [...]
                someExpensiveMethodsomeMethodResultCache.Add(key, result);
            }
            return someExpensiveMethodsomeMethodResultCache[key];
        }

    }

Now, obviously you could cache the result somewhere other than a static field, for example in an object cache, but the bottom line is that this is quite a lot of code and responsibility for a piece of functionality that I would much prefer to just 'attach' to the method, right?

The goal

As we all know, the .NET Framework has a mechanism for 'attaching' functionality to code elements: attributes.
What I'd really like to do is simply say:
    public class SomeClass
    {
        [CacheResult(CacheLocations.Static)]
        public MyResultType SomeExpensiveMethod(Argument1Type arg1, Argument2Type arg2)
        {
            MyResultType result = null;
            // fill result;
            [...]
            return result;
        }
    }
I think that reads much better, so let's try to create an attribute like that.

The solution

One major thing to understand here is that we want to interfere with the normal execution of the program and hijack it. Basically, if we already have a cached version of the result, we just return that and do not execute the method body again.
This is where a concept called aspects comes in.
If you are not familiar with aspects (aspect-oriented programming), you should read more about them on Wikipedia, or just google for it! It is a truly exceptional field and can do really amazing things, for example what we are doing here :)
A very nice tool for this kind of thing is PostSharp, which has a free license for 1 developer. It is a limited edition, but it gives you enough to accomplish what we want here.

A few words on PostSharp

It is useful to understand how PostSharp works and what it can do for you. PostSharp is basically a library that hooks into the build pipeline. Once your code is compiled to IL, PostSharp goes through the compiled code and looks for attributes that derive from its own aspect attributes. It then rewrites the method implementations according to what those attributes should do.
It can intercept method calls, property access, and a lot more. You can read about its capabilities on the SharpCrafters website.

OK, so now that we have a general idea of the tools we can use, let's jump into the unit tests.

Unit Tests

Generally when creating a unit test for a functionality, you also need to come up with the interfaces and classes that you will use to interact with the functionality.
So first we will need an attribute, let's say CacheResultAttribute.
    [AttributeUsage(AttributeTargets.Method, AllowMultiple=false)]
    public class CacheResultAttribute : Attribute
    {
    }
Now, for the sake of future requirements, when you may want to cache the results somewhere other than statically, let's allow the cache location to be specified with an enum called CacheLocations.
    public enum CacheLocations
    {
        Static,
    }
and also add this as a constructor parameter of the attribute:
        private readonly CacheLocations cacheLocation;
        public CacheResultAttribute(CacheLocations location)
        {
            cacheLocation = location;
        }
Now we can create a test class with a method that counts how many times it has been executed.
    [TestClass]
    public class CacheResultAttributeTest
    {
        public class TestClass
        {
            public int NrOfExecutions = 0;
            [CacheResult(CacheLocations.Static)]
            public int DoubleTheNumber(int i)
            {
                NrOfExecutions++;
                return i * 2;
            }
        }
    }
As you can see, I already decorated the DoubleTheNumber method with the attribute I just created above. I know public fields are not a good idea, but for the sake of keeping the test code short, it's not a big deal.
In our test we simply want to check that the DoubleTheNumber method body is not executed again for parameters that were already passed in. The test method is fairly simple:
        [TestMethod]
        public void TestResultCached()
        {
            var subject = new TestClass();
            Assert.AreEqual(0, subject.NrOfExecutions);
            var result = subject.DoubleTheNumber(3);
            // we dont care if the method is actually ok, so this is not needed
            //Assert.AreEqual(6, result);

            //but we want to check if the execution count went up
            Assert.AreEqual(1, subject.NrOfExecutions);

            // now lets see if it is called with other parameter
            result = subject.DoubleTheNumber(2);
            //again check that execution nr went up
            Assert.AreEqual(2, subject.NrOfExecutions);

            //now get the result for 3 again
            result = subject.DoubleTheNumber(3);
            // and make sure we didnt execute the method
            Assert.AreEqual(2, subject.NrOfExecutions);
        }
It doesn't even need an explanation.

Great, now we have a test, so let's run it. We get our first test result: "Result Message: Assert.AreEqual failed. Expected:<2>. Actual:<3>."
Of course, since we didn't implement anything yet, the 3rd call simply runs the method body a 3rd time. So let's have a look at how the caching will be done.

The implementation

The first thing you have to do is get and install PostSharp. Please follow the steps on the website if you don't have it yet.
Once you have PostSharp and a reference to its DLLs, we can start extending an attribute called OnMethodBoundaryAspect, which lives in the PostSharp DLLs. This aspect allows you to intercept method calls. The 3 methods we will override are:
bool CompileTimeValidate(MethodBase method)
void OnEntry(MethodExecutionArgs args)
void OnExit(MethodExecutionArgs args)
The first one is executed when PostSharp post-processes the compiled code. This means that the code here only runs at build time and has no influence at runtime, so it can be as heavy as you want. Obviously you will have a longer build, but it will not be visible to the users of the code.
The second and third methods are called when entering and exiting the method.
The MethodExecutionArgs class contains all the data that you will need at runtime to decide what to do. So let's jump in.
    [Serializable]  // required by PostSharp
    [AttributeUsage(AttributeTargets.Method, AllowMultiple=false)]
    public class CacheResultAttribute : PostSharp.Aspects.OnMethodBoundaryAspect
    {
        private readonly CacheLocations cacheLocation;
        public CacheResultAttribute(CacheLocations location)
        {
            cacheLocation = location;
        }

        public override bool CompileTimeValidate(System.Reflection.MethodBase method)
        {
            return base.CompileTimeValidate(method);
        }

        public override void OnEntry(PostSharp.Aspects.MethodExecutionArgs args)
        {
            base.OnEntry(args);
        }

        public override void OnExit(PostSharp.Aspects.MethodExecutionArgs args)
        {
            base.OnExit(args);
        }

    }
Let's start with the first one, CompileTimeValidate. We really don't need to do anything here, but just for the sake of understanding, let's not allow this attribute to be applied to void methods and constructors. That would be misleading anyway. So let's check for those cases, and if we hit one, write an error.
        public override bool CompileTimeValidate(System.Reflection.MethodBase method)
        {
            if (method == null)
            {
                PostSharp.Extensibility.Message error = new PostSharp.Extensibility.Message(MessageLocation.Explicit("CacheResultAttribute", 25, 0), SeverityType.Error, "AOP0001", "Method is null", "#", "CacheResultAttribute.cs", null);
                MessageSource.MessageSink.Write(error);
                return false;
            }
            if(method.IsConstructor)
            {
                PostSharp.Extensibility.Message error = new PostSharp.Extensibility.Message(MessageLocation.Explicit("CacheResultAttribute", 25, 0), SeverityType.Error, "AOP0001", "Attribute cannot be applied to constructors", "#", "CacheResultAttribute.cs", null);
                MessageSource.MessageSink.Write(error);
                return false;
            }
            if (!(method is MethodInfo))
            {
                PostSharp.Extensibility.Message error = new PostSharp.Extensibility.Message(MessageLocation.Explicit("CacheResultAttribute", 25, 0), SeverityType.Error, "AOP0001", string.Format("Attribute cannot be applied to method {0} because it cannot be cast to a MethodInfo", method.Name), "#", "CacheResultAttribute.cs", null);
                MessageSource.MessageSink.Write(error);
                return false;
            }
            if((method as MethodInfo).ReturnType == typeof(void))
            {
                PostSharp.Extensibility.Message error = new PostSharp.Extensibility.Message(MessageLocation.Explicit("CacheResultAttribute", 25, 0), SeverityType.Error, "AOP0001", string.Format("Attribute cannot be applied to method {0} because it has a void return type", method.Name), "#", "CacheResultAttribute.cs", null);
                MessageSource.MessageSink.Write(error);
                return false;
            }
            return base.CompileTimeValidate(method);
        }

Really not much of interest here. So let's get to the next one: what do we want to do in the OnEntry and OnExit methods?
Well, PostSharp allows you to hijack method execution and skip the method body by setting args.FlowBehavior = FlowBehavior.Return. This will be very handy, since we will do just that when we find a cached value.
We have all the arguments in the args.Arguments property, the method info in the args.Method property, and the instance (for non-static methods) in the args.Instance property.
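To picture this kind of flow control, here is a hand-written approximation of what a boundary aspect does around a method. It is only a sketch: CallContext, Flow and CannedResultHandler are hypothetical stand-ins invented for this example, not PostSharp types, and the code PostSharp actually generates looks different. In particular, skipping the exit handler on a short-circuit is an assumption of this sketch, so check the PostSharp documentation for the real behavior.

```csharp
using System;

// Hypothetical stand-ins for illustration only; these are NOT PostSharp types.
public enum Flow { Continue, Return }

public sealed class CallContext
{
    public object[] Arguments;
    public object ReturnValue;
    public Flow FlowBehavior = Flow.Continue;
}

public class CannedResultHandler
{
    // When set, OnEntry short-circuits the call and supplies this value.
    public object Canned;

    public void OnEntry(CallContext ctx)
    {
        if (Canned != null)
        {
            ctx.ReturnValue = Canned;
            ctx.FlowBehavior = Flow.Return;
        }
    }

    public void OnExit(CallContext ctx)
    {
        Console.WriteLine("OnExit saw " + ctx.ReturnValue);
    }
}

public class Calculator
{
    public static readonly CannedResultHandler Handler = new CannedResultHandler();

    // What you would write: public int DoubleTheNumber(int i) { return i * 2; }
    // Below is a manual approximation of the same method with the handler wrapped around it.
    public int DoubleTheNumber(int i)
    {
        var ctx = new CallContext { Arguments = new object[] { i } };
        Handler.OnEntry(ctx);
        if (ctx.FlowBehavior == Flow.Return)
            return (int)ctx.ReturnValue;   // the handler supplied the result, skip the body
        try
        {
            ctx.ReturnValue = i * 2;       // the original method body
            return (int)ctx.ReturnValue;
        }
        finally
        {
            Handler.OnExit(ctx);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var calc = new Calculator();
        Console.WriteLine(calc.DoubleTheNumber(3)); // body runs, prints 6
        Calculator.Handler.Canned = 100;
        Console.WriteLine(calc.DoubleTheNumber(3)); // body skipped, prints 100
    }
}
```

The second call never reaches the multiplication, which is exactly the hook a result cache needs.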
One convention we will use is that if we encounter a static method, we store its results under the type on which it is defined. Otherwise we store the results under the object instance the method was called on. Since System.Type does not use our attribute (we are just creating it), a type key and an instance key should never collide. So we will index our static cache by the following keys:
1) Object instance or type
2) MethodBase
3) arguments
With these assumptions we have the following code:
        private static readonly IDictionary<object, IDictionary<MethodBase, IDictionary<object, object>>> staticMethodCache = new Dictionary<object, IDictionary<MethodBase, IDictionary<object, object>>>();

        public override void OnEntry(PostSharp.Aspects.MethodExecutionArgs args)
        {
            if (this.cacheLocation != CacheLocations.Static)
                throw new NotImplementedException("Only static cache location is implemented for method return cache");
            var item = args.Instance ?? args.Method.ReflectedType;
            if (staticMethodCache.ContainsKey(item)
                && staticMethodCache[item] != null
                && staticMethodCache[item].ContainsKey(args.Method)
                && staticMethodCache[item][args.Method] != null)
            {
                object argsKey = args.Arguments.Count == 0 ? string.Empty : tupleCreator(args.Method.GetParameters().Select(x => x.ParameterType).ToArray(), args.Arguments.ToArray());
                if (staticMethodCache[item][args.Method].ContainsKey(argsKey))
                {
                    args.ReturnValue = staticMethodCache[item][args.Method][argsKey];
                    args.FlowBehavior = PostSharp.Aspects.FlowBehavior.Return;
                }
            }
            base.OnEntry(args);
        }
We simply check whether we have the value in the dictionary, and if so, we set the return value and break the flow.
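The same three-level lookup can also be written with TryGetValue, which avoids looking up each key twice. The following is a minimal standalone sketch, not the aspect itself: the NestedCacheLookup class and its TryGet and Store methods are hypothetical names, and a string stands in for MethodBase so that the sample needs no reflection.

```csharp
using System;
using System.Collections.Generic;

public static class NestedCacheLookup
{
    // Same shape as the cache in the post: owner -> method -> arguments key -> result.
    // A string replaces MethodBase here purely to keep the sample short (an assumption).
    private static readonly Dictionary<object, Dictionary<string, Dictionary<object, object>>> cache =
        new Dictionary<object, Dictionary<string, Dictionary<object, object>>>();

    public static bool TryGet(object owner, string method, object argsKey, out object value)
    {
        Dictionary<string, Dictionary<object, object>> perOwner;
        Dictionary<object, object> perMethod;
        value = null;
        // Each level is probed once; the chain stops at the first miss.
        return cache.TryGetValue(owner, out perOwner)
            && perOwner.TryGetValue(method, out perMethod)
            && perMethod.TryGetValue(argsKey, out value);
    }

    public static void Store(object owner, string method, object argsKey, object value)
    {
        Dictionary<string, Dictionary<object, object>> perOwner;
        if (!cache.TryGetValue(owner, out perOwner))
            cache[owner] = perOwner = new Dictionary<string, Dictionary<object, object>>();

        Dictionary<object, object> perMethod;
        if (!perOwner.TryGetValue(method, out perMethod))
            perOwner[method] = perMethod = new Dictionary<object, object>();

        perMethod[argsKey] = value;
    }

    public static void Main()
    {
        var owner = new object();
        object hit;
        Console.WriteLine(TryGet(owner, "DoubleTheNumber", Tuple.Create(3), out hit)); // False
        Store(owner, "DoubleTheNumber", Tuple.Create(3), 6);
        Console.WriteLine(TryGet(owner, "DoubleTheNumber", Tuple.Create(3), out hit)); // True
        Console.WriteLine(hit);                                                        // 6
    }
}
```

Note that, like the dictionaries in the post, this sketch is not thread-safe.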
Here we used a helper method, tupleCreator. It essentially creates a nested tuple from the arguments. For example, for arguments of types (int, string, Type) the result is of type Tuple<int, Tuple<string, Tuple<Type>>>.
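        /// <summary>
        /// Creates a nested tuple of the arguments.
        /// </summary>
        /// <param name="types">The declared parameter types.</param>
        /// <param name="arguments">The argument values, in the same order as the types.</param>
        /// <returns>The nested tuple, usable as a dictionary key.</returns>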
        private object tupleCreator(Type[] types, object[] arguments)
        {
            if (types == null)
                throw new ArgumentNullException("types");
            if (arguments == null)
                throw new ArgumentNullException("arguments");
            if (types.Length == 0)
                throw new ArgumentOutOfRangeException("types", "The specified type list needs at least 1 type");
            if (types.Length != arguments.Length)
                throw new ArgumentException("The specified argument count does not equal the type count", "arguments");
            var tupleCreator1 = typeof(Tuple).GetMethods(BindingFlags.Static | BindingFlags.Public).Where(x => x.Name == "Create" && x.GetGenericArguments().Length == 1).Single();
            var tupleCreator2 = typeof(Tuple).GetMethods(BindingFlags.Static | BindingFlags.Public).Where(x => x.Name == "Create" && x.GetGenericArguments().Length == 2).Single();

            var result = tupleCreator1.MakeGenericMethod(types[types.Length -1]).Invoke(null, new object[]{arguments[types.Length -1]});
            // wrap the remaining arguments from last to first; the index must count down
            for (int i = types.Length - 2; i > -1; i--)
            {
                result = tupleCreator2.MakeGenericMethod(types[i], result.GetType()).Invoke(null, new object[] { arguments[i], result });
            }
            return result;
        }

The method is quite straightforward. It could be improved by statically caching the two MethodInfo objects for the Tuple.Create factory methods. The number of calls to Invoke could also be reduced, since there are Tuple.Create overloads that take more arguments, but for now this will do.
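The reason a tuple works as a cache key at all is that System.Tuple compares by value: two separately created tuples with equal items are equal and hash the same. A small standalone check, where the TupleKeyDemo class name is just for this example:

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    public static void Main()
    {
        // Two separately built nested tuples of the kind tupleCreator produces.
        object key1 = Tuple.Create(3, Tuple.Create("a"));
        object key2 = Tuple.Create(3, Tuple.Create("a"));

        Console.WriteLine(ReferenceEquals(key1, key2));              // False: different objects
        Console.WriteLine(key1.Equals(key2));                        // True: compared item by item
        Console.WriteLine(key1.GetHashCode() == key2.GetHashCode()); // True

        var cache = new Dictionary<object, object>();
        cache[key1] = "cached";
        Console.WriteLine(cache.ContainsKey(key2));                  // True: key2 finds key1's entry

        // An object[] would not work as a key, because arrays compare by reference.
        object[] a1 = { 3, "a" };
        object[] a2 = { 3, "a" };
        Console.WriteLine(a1.Equals(a2));                            // False
    }
}
```

Keep in mind that the equality of the tuple is only as good as the Equals and GetHashCode of the argument types inside it.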
Now for the exit, we basically need to store the result in the dictionary, and we are done.
        public override void OnExit(PostSharp.Aspects.MethodExecutionArgs args)
        {
            if (this.cacheLocation != CacheLocations.Static)
                throw new NotImplementedException("Only static cache location is implemented for method return cache");
            var item = args.Instance ?? args.Method.ReflectedType;
            if (!staticMethodCache.ContainsKey(item))
            {
                staticMethodCache.Add(item, new Dictionary<MethodBase, IDictionary<object, object>>());
            }
            if (!staticMethodCache[item].ContainsKey(args.Method))
            {
                staticMethodCache[item].Add(args.Method, new Dictionary<object, object>());
            }
            object argsKey = args.Arguments.Count == 0 ? string.Empty : tupleCreator(args.Method.GetParameters().Select(x => x.ParameterType).ToArray(), args.Arguments.ToArray());
            staticMethodCache[item][args.Method].Add(argsKey, args.ReturnValue);
            base.OnExit(args);
        }
We create the dictionaries and insert the value. By the way, since Add is used, this would throw an exception if we somehow managed to get here again with the same parameters.
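The difference between Add and the indexer is easy to see in isolation. This is a minimal standalone sketch, and the AddVersusIndexer class name is just for this example:

```csharp
using System;
using System.Collections.Generic;

public static class AddVersusIndexer
{
    public static void Main()
    {
        var results = new Dictionary<object, object>();
        results.Add(Tuple.Create(3), 6);

        try
        {
            // Same key again: Dictionary.Add refuses duplicates.
            results.Add(Tuple.Create(3), 6);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key rejected");
        }

        // The indexer overwrites the existing entry instead of throwing.
        results[Tuple.Create(3)] = 6;
        Console.WriteLine(results.Count); // 1
    }
}
```

Which of the two is preferable is a design choice: Add surfaces an unexpected second store loudly, while the indexer quietly keeps the latest value.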

Well, now let's run our unit test and, woohoo, it passes. So with just a couple of lines of code, we managed to move the caching code out of the method itself and into an aspect that we can now apply anywhere.

Room for improvement

As you can see, there is only 1 caching implementation, the one for static caching. I use this type of caching a lot when I create type maps and other methods that operate mostly on types. You could implement other caching mechanisms and extend the above.

Problems

DO NOT USE THIS CODE IN PRODUCTION!!!
Why do I say that?

Well, let's take a look at our dictionaries. The first key is the object itself. That means that any time this aspect is hit, a reference to that object is stored in the dictionary, which prevents the object from being garbage collected. This is a massive issue if you want to use this attribute on instance methods. As long as you use it on static methods, there is no major memory loss, but with instance methods you will end up using more and more memory. We will discuss this issue in an upcoming post.

The same problem comes up with the arguments. The dictionaries hold references to them as well, which does the same harm as above. This, too, will be addressed in the next post.
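One possible direction for the instance problem, and not necessarily the one the follow-up post takes, is System.Runtime.CompilerServices.ConditionalWeakTable, which attaches a value to an object without keeping that object alive. A minimal sketch, where WeakInstanceCache and its For method are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class WeakInstanceCache
{
    // Keys are compared by reference and are not kept alive by the table,
    // so the per-instance results can be collected together with the instance.
    private static readonly ConditionalWeakTable<object, Dictionary<object, object>> perInstance =
        new ConditionalWeakTable<object, Dictionary<object, object>>();

    public static Dictionary<object, object> For(object instance)
    {
        // Creates the dictionary with its parameterless constructor on first use.
        return perInstance.GetOrCreateValue(instance);
    }

    public static void Main()
    {
        var subject = new object();
        For(subject)[Tuple.Create(3)] = 6;

        Console.WriteLine(For(subject)[Tuple.Create(3)]);               // 6
        Console.WriteLine(ReferenceEquals(For(subject), For(subject))); // True: one dictionary per instance
    }
}
```

This only covers the instance key. The cached arguments are still strongly referenced for as long as the instance lives, and results stored for static methods stay around for the lifetime of the application.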
Cheers for reading.

Continue reading on the second part of this post
