Java Quickly

Partial Rule Matching in Drools

Most RETE-based rule engines don't support partial rule matching, because tree branches are pruned as soon as a condition is violated (in either alpha or beta nodes).

We tried two approaches to implementing partial matching:

Single rule per risk profile approach:

  • ANDed conditions are defined as constraints in the LHS object patterns
  • ORed conditions are defined with the eval keyword as Java expressions
  • Helper classes were used to perform comparisons on BigInteger, BigDecimal, Date and String fields, preventing NullPointerExceptions when fact data is not supplied
  • Scoring:
      • Implemented by accumulating the condition scores in a globally defined ArrayList in the working memory, for example: (c1 && sums.add(new condition("cond1", 15)))
      • The whole risk profile is ANDed with another eval that invokes a proceed method; it takes the sums and a threshold and determines whether a risk should be inserted into the working memory
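
A rough plain-Java sketch of the pieces listed above, using hypothetical names: `NullSafe` stands in for the helper classes, `Condition` for the post's `condition` score holder, and `proceed` is assumed to pass when the summed scores reach the threshold. The post does not show its actual helpers or DRL, so treat this as an illustration only; in the real rules the list is a Drools global and the boolean expressions sit inside `eval( ... )`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class Scoring {
    private Scoring() {}

    // Assumed semantics of proceed: raise the risk when the summed scores
    // of the satisfied conditions reach the threshold.
    static boolean proceed(List<Condition> sums, int threshold) {
        int total = 0;
        for (Condition c : sums) {
            total += c.score;
        }
        return total >= threshold;
    }

    public static void main(String[] args) {
        List<Condition> sums = new ArrayList<>(); // a Drools global in the real rules
        BigDecimal amount = new BigDecimal("12000");
        String country = null;                    // fact field not supplied

        // Same shape as eval( c1 && sums.add(new condition("cond1", 15)) )
        boolean c1 = NullSafe.greaterThan(amount, new BigDecimal("10000"))
                && sums.add(new Condition("cond1", 15));
        boolean c2 = NullSafe.equalsIgnoreCase(country, "XX")
                && sums.add(new Condition("cond2", 20));

        System.out.println(c1 + " " + c2 + " " + proceed(sums, 10)); // true false true
    }
}

// Hypothetical helper: a missing value makes the comparison false
// instead of throwing a NullPointerException.
final class NullSafe {
    private NullSafe() {}

    static boolean greaterThan(BigDecimal value, BigDecimal limit) {
        return value != null && limit != null && value.compareTo(limit) > 0;
    }

    static boolean greaterThan(BigInteger value, BigInteger limit) {
        return value != null && limit != null && value.compareTo(limit) > 0;
    }

    static boolean before(Date value, Date limit) {
        return value != null && limit != null && value.before(limit);
    }

    static boolean equalsIgnoreCase(String value, String expected) {
        return value != null && value.equalsIgnoreCase(expected);
    }
}

// Hypothetical score holder (called "condition" in the post).
final class Condition {
    final String name;
    final int score;

    Condition(String name, int score) {
        this.name = name;
        this.score = score;
    }
}
```

The `&&` short-circuit is what makes the pattern work: `ArrayList.add` always returns `true`, so a score is recorded only when its condition holds, and a missing field simply yields `false`.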

Pros: Number of rules = no. of risk profiles + 1

Cons: extensive use of the eval keyword for scoring, which defeats the optimizations done by the Drools engine (alpha node sharing, indexing and early tree pruning)

Multiple rules per ORed or scored condition approach:

  • ANDed conditions are defined as constraints in the LHS object patterns
  • ORed conditions are defined in separate rules in the same activation group, with salience governing the order in which the conditions are tried
  • Scoring:
      • Each condition is defined in a separate rule with salience = -1 so that it never fires (the accumulate rule is the only one allowed to fire)
      • An accumulate rule is defined for each risk profile; it invokes proceed with the accumulated condition scores
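
The following plain-Java model (not Drools code) sketches the behaviour this approach relies on: within one activation group only the highest-salience matching rule fires, and the accumulate rule sums the scores of the matched condition rules before calling `proceed`. The record names, the sample conditions and the `>=` threshold test are assumptions for illustration; records require Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class MultiRuleModel {
    private MultiRuleModel() {}

    // One ORed alternative = one rule in the profile's activation group.
    record Alternative(String rule, int salience, Predicate<Map<String, Object>> when) {}

    // One scored condition = one rule that only contributes its score.
    record Scored(String rule, int score, Predicate<Map<String, Object>> when) {}

    // Activation group: of the matching rules, only the highest-salience one fires.
    static Optional<String> firstInGroup(List<Alternative> group, Map<String, Object> fact) {
        return group.stream()
                .filter(a -> a.when().test(fact))
                .max(Comparator.comparingInt(Alternative::salience))
                .map(Alternative::rule);
    }

    // Accumulate rule: sum the scores of the matched conditions, then apply the threshold.
    static boolean proceed(List<Scored> conditions, Map<String, Object> fact, int threshold) {
        int total = conditions.stream()
                .filter(c -> c.when().test(fact))
                .mapToInt(Scored::score)
                .sum();
        return total >= threshold;
    }

    public static void main(String[] args) {
        Map<String, Object> fact = Map.of("amount", 12_000, "country", "XX");

        List<Alternative> ored = List.of(
                new Alternative("profile1_or_a", 20, f -> (int) f.get("amount") > 50_000),
                new Alternative("profile1_or_b", 10, f -> "XX".equals(f.get("country"))));

        List<Scored> scored = List.of(
                new Scored("profile2_s1", 15, f -> (int) f.get("amount") > 10_000),
                new Scored("profile2_s2", 20, f -> "YY".equals(f.get("country"))),
                new Scored("profile2_s3", 10, f -> "XX".equals(f.get("country"))));

        System.out.println(firstInGroup(ored, fact).orElse("none")); // profile1_or_b
        System.out.println(proceed(scored, fact, 20));               // true (15 + 10 >= 20)
    }
}
```

Each alternative and each scored condition becomes its own rule, which is why the rule count below grows with the number of conditions rather than with the number of profiles.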

Pros: makes use of the Drools optimization techniques (alpha node sharing, indexing and early tree pruning)

Cons: number of rules ≈ no. of risk profiles × no. of ORed or scored conditions + no. of risk profiles + 1, so the rule count grows quickly with the number of conditions

For example, with 2500 risk profiles, where 500 of them have on average 4 ORed conditions and 500 have scored conditions (3 conditions each): 2000 + 4 × 500 + 500 × 3 + 500 + 1 = 6001 rules.
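
The arithmetic of the example can be checked with a few lines of Java. How the terms map to rule kinds (plain rules, one rule per ORed alternative, one rule per scored condition, one accumulate rule per scored profile, plus one extra rule) is an interpretation of the example, not something the post spells out; the method and parameter names are hypothetical.

```java
final class RuleCount {
    private RuleCount() {}

    // Approach 1: one rule per risk profile plus one extra rule.
    static int singleRulePerProfile(int profiles) {
        return profiles + 1;
    }

    // Approach 2, grouped to match the terms of the worked example (interpretation).
    static int multipleRules(int plainRules, int oredProfiles, int oredPerProfile,
                             int scoredProfiles, int scoredPerProfile) {
        return plainRules
                + oredProfiles * oredPerProfile     // one rule per ORed alternative
                + scoredProfiles * scoredPerProfile // one rule per scored condition
                + scoredProfiles                    // one accumulate rule per scored profile
                + 1;                                // the extra "+ 1" rule
    }

    public static void main(String[] args) {
        System.out.println(singleRulePerProfile(2500));          // 2501
        System.out.println(multipleRules(2000, 500, 4, 500, 3)); // 6001
    }
}
```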

In our POC most risk profiles contain ORed conditions and scoring, so the 2500 defined risk profiles generated about 7000 rules.

Performance comparison of both approaches

The tests were performed on a local standalone OC4J instance with a local Oracle XE database.

The binary packages of the single rule per risk profile approach were used, because the Java heap could not be increased beyond 1GB on Windows XP; the multiple rules approach therefore failed locally with Java heap space errors (java.lang.OutOfMemoryError).

Note:

This data was collected manually while monitoring several criteria: first-request response time, OC4J memory growth, and response time of subsequent requests.

Memory Consumption:

  • Single rule per risk profile approach:
    • Binary package size: ~75 MB
    • OC4J needed 430-450 MB to build a binary package of 2800 rules
    • Load time: ~32 s to ~1 minute
  • Multiple rules approach:
    • Binary package size: > 212 MB (sum of the single packages)
    • OC4J needed > 1 GB to build a DRL of 7000 rules
    • OC4J needed > 1 GB to build binary packages ranging from 20 to 90 MB (most of the time OC4J failed to load them)
    • Load time: 7 to 10 minutes

Single request performance:

  • First invocation average: 750-1000 ms
  • Subsequent requests average: 500-650 ms
  • Memory increased by about 20-30 MB

Six concurrent requests performance:

  • First invocation average: >5 s
  • Subsequent requests average: 1.5-2 s
  • Memory increased by about 50-60 MB

Fifty concurrent requests performance:

  • Memory increased by about 300 MB
  • First response arrival: ~4-6 s
  • Last response arrival: ~12-16 s
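
The figures above were collected by hand. A small harness like the following could automate the concurrent measurements; it is a sketch, assuming a hypothetical `request` callback that performs one rule-engine call, and it reports the first and last response arrival times. Heap readings from `Runtime` are only approximate because garbage collection can run at any time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentLoad {
    private ConcurrentLoad() {}

    // Approximate heap in use; GC can move this in either direction.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Fires `clients` requests at once and returns their arrival times
    // (milliseconds after the common start), sorted ascending.
    static long[] arrivalsMillis(int clients, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        CountDownLatch startGate = new CountDownLatch(1);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                futures.add(pool.submit(() -> {
                    startGate.await();        // all threads start together
                    request.run();
                    return System.nanoTime(); // arrival time of this response
                }));
            }
            long start = System.nanoTime();
            startGate.countDown();
            long[] arrivals = new long[clients];
            for (int i = 0; i < clients; i++) {
                arrivals[i] = (futures.get(i).get() - start) / 1_000_000;
            }
            Arrays.sort(arrivals);
            return arrivals;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one request to the rule engine.
        Runnable request = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int clients : new int[] {1, 6, 50}) {
            long heapBefore = usedHeapBytes();
            long[] arrivals = arrivalsMillis(clients, request);
            long heapDeltaMb = (usedHeapBytes() - heapBefore) / (1024 * 1024);
            System.out.printf("%d clients: first %d ms, last %d ms, heap delta ~%d MB%n",
                    clients, arrivals[0], arrivals[arrivals.length - 1], heapDeltaMb);
        }
    }
}
```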

July 9, 2009 Posted by wzedan | Uncategorized