Some ways to come up with less performant algorithms

Discussion in 'Learning Ruby and RGSSx' started by DoubleX, Mar 7, 2015.

  1. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Disclaimer:

    Performance isn't everything, it isn't even always the most important thing. All the other programming aspects should be taken care of as well. Scripters should have a clear understanding of the importance and priority of each programming aspect in each part of their codes to ensure a working balance. They should also take their own current scripting proficiency into account to ensure none of their codes will ever be out of their control.

    This post aims to incite some other scripters to share some causes behind some less performant algorithms(Mainly about time and memory usage).

    Let's start by showing how I came up with less performant algorithms lol

    Example 1

    Suppose I've to check if a battler's any new or removed states and the checking can be needed at any frame. The result is used by at most 1 method for at most once per frame and the usage timing can be different from that of changing states.

    1st algorithm - I write a battler method being called per frame, which checks if the current state array is the same as that in the last frame and returns the result:

    def battler_state_change? return false if @states == @last_states @last_states = @states.clone trueend
    Performance - The time complexity of this algorithm's at least O(n) and the memory has to read and probably write(when the state array's changed) an array.

    It seems acceptable for me at first, but after a while I reviewed this method and asked "Can it be better?". Fortunately, I came up with a more performant algorithm shortly afterwards.

    2nd algorithm - I first use a variable to mark that a battler's added or removed a state, then write a method to reset that variable and returns true if that variable's true:

    alias clear_states_example clear_statesdef clear_states  clear_states_example  @state_change = trueendalias add_new_state_example add_new_statedef add_new_state(state_id) add_new_state_example(state_id) @state_change = trueendalias erase_state_example erase_statedef erase_state(state_id) erase_state_example(state_id) @state_change = trueenddef battler_state_change? return false unless @state_change @state_change = false trueend
    Performance - The time complexity of this algorithm's O(1) and the memory has to read and probably write(when the state array's changed) a boolean value. Also, the method calls are likely increased due to aliasing.

    Performance comparison -The 2nd algorithm is clearly more performant than the 1st one, and the difference of their results can become non-trivial when there are lots of battlers each having lots of states. The fact that both can be run every frame significantly magnifies the difference, which may cause a less powerful machine to use a constant average fps drop to complain :D

    Although the 2nd algorithm can be more prone to compatibility issues(like when some scripts that have to be placed below my script have rewritten clear_state, add_new_state or erase_state), they shouldn't be too hard to solve in most cases.

    You may argue that the increased number of method calls might outweight what the 2nd algorithm gains, but it'd be true only if states are added and/or removed every several frames, which is extremely unlikely.

    How did I come up with the 1st algorithm - It's done by directly translating the requirements into algorithms without even analyzing those requirements beforehand. That direct translation causes me to think that it's necessary to store the state array in the last frame and compare that with that in the current frame as the latter can be changed in any frame.

    After analyzing those requirements for a while, however, I realized that if I can catch all the codes that changes the state array, I can just use an instance variable to mark that the state array's changed. It just have to be reset to its default value right after notifying the change to ensure correct notification.


    Example 2(Slightly modified version of the YSA-CATB Percentage Addon)

    Suppose I've to update the position of all the 5 atb bars(back, atb, charge, cooldown, percentage) of each enemy per frame to ensure they're all always placed directly below each enemy's sprite.

    1st algorithm - I use the enemy sprite x and y positions to set all that enemy's atb bars directly below that enemy's sprite, and ensure they won't overlap with the status window(mostly the original Yami's algorithm with quite some minor twists, added support to that script and some compatibility issues):

    (Both @type and @start_width are fixed once set; update_position is the only method changing the atb bar positions)

    #----------------------------------------------------------------------------|#  Rewrite method: update_position                                           |#----------------------------------------------------------------------------|def update_position dx = @battler.screen_x - @start_width / 2 dy = @battler.screen_y # Added to set the x and y offset of percentage text if @type == :percent dx += TEXT_X_OFFSET dy += TEXT_Y_OFFSET end # rect.x = dx dh = rect.height + 1 # Rewritten to ensure the percent bar won't overlap with the status window dh += 2 unless @type == :back || @type == :percent dy = [dy, Graphics.height - dh - 120].min dy += 1 unless @type == :back || @type == :percent # # Added to be compatible with DoubleX RMVXA Enemy MP/TP Bars Addon to YEA-BattleEngine and YEA-EnemyHPBars dy -= ENEMY_MP_GAUGE_HEIGHT + ENEMY_TP_GAUGE_HEIGHT if TP_MP_ADDON dy -= YEA::BATTLE::ENEMY_GAUGE_HEIGHT if $imported["YEA-EnemyHPBars"] # rect.y = dyend # update_position
    Performance - Right now it's just too complicated for me to analyze, but my data suggests that the above code's quite expensive.

    It' still great though as it does almost nothing redundant and the codes are nearly optimized(as it's mostly the Yami's original algorithm with some minor twists to boost the performance a bit further :) ).

    2nd algorithm - I first check if the enemy sprite x and y positions are changed, and use the 1st algorithm only if at least 1 of them changed:

    (Both @type and @start_width are fixed once set; update_position is the only method changing the atb bar positions)

    #----------------------------------------------------------------------------|# Rewrite method: update_position |#----------------------------------------------------------------------------|def update_position  dx = @battler.screen_x  dy = @battler.screen_y # Added to update the atb positions only if the battler's position's changed  return if @last_x == dx && @last_y == dy  @last_x = dx  @last_y = dy  dx -= @start_width / 2 # # Added to set the x and y offset of percentage text if @type == :percent dx += TEXT_X_OFFSET dy += TEXT_Y_OFFSET end # rect.x = dx dh = rect.height + 1 # Rewritten to ensure the percent bar won't overlap with the status window dh += 2 unless @type == :back || @type == :percent dy = [dy, Graphics.height - dh - 120].min dy += 1 unless @type == :back || @type == :percent # # Added to be compatible with DoubleX RMVXA Enemy MP/TP Bars Addon to YEA-BattleEngine and YEA-EnemyHPBars dy -= ENEMY_MP_GAUGE_HEIGHT + ENEMY_TP_GAUGE_HEIGHT if TP_MP_ADDON dy -= YEA::BATTLE::ENEMY_GAUGE_HEIGHT if $imported["YEA-EnemyHPBars"] # rect.y = dyend # update_position
    Performance - It's O(1) and several added memory access to some integer values + that of the 1st algorithm when the enemy sprite position changes, otherwise it's just O(1) and several added memory access to some integer values, unless the screen_x or screen_y readers are rewritten to some insane stuffs.

    Performance Comparison - In most cases, it'll perform significantly better than the 1st algorithm, as normally it's quite unlikely that an enemy sprite x and y positions will change frequently(let alone per frame), and the position checking's cheap while the 1st algorithm's expensive. The average fps boosted by at least 2 by switching from the 1st algorithm to the 2nd algorithm in my machine, which can be noticeable.

    How did I come up with the 1st algorithm - It's done by directly implementing my percentage addon to YSA-CATB and solving compatibility issues without asking "Can really great codes be even better?". This causes me to merely did some minor optimizations instead of rethinking the Yami's algorithm.

    As my percentage addon sucks hard(it still is) and causes a 30+ average fps drop(from roughly 119 to roughly 88) in my machine, I had to optimize every last bit of the addon. Then I realized that all the atb bar positions need to be changed only if the enemy sprite's position's changed, as the 1st algorithm won't change the formers at all unless the latter's changed. So I decided to call the 1st algorithm only if the latter's changed to reduce the amount of redundant code executions.


    Example 3(Slightly modified version of (DoubleX)Intercept Magic)

    Suppose I've to read some notetags used by that script from the database.

    1st algorithm - I check each line against all notetags to be read:

    #----------------------------------------------------------------------------|#  New method: load_notetags_intercept_magic                                 |#----------------------------------------------------------------------------|def load_notetags_intercept_magic  @intercept_tier = 0  @intercept_chance =100  @battler_intercept = {}  @intercept_mp = @intercept_tp = @hit_intercept_mp = @hit_intercept_tp = false  @note.split(/[\r\n]+/).each { |line|    if line =~ /<(?:INTERCEPT_TIER|intercept tier):[ ]*(\d+)>/i      @intercept_tier = $1.to_i if $1.to_i > @intercept_tier end    if line =~ /<(?:INTERCEPT_MP|intercept mp)>/i      @intercept_mp ||= @intercept_tier > 0 end    if line =~ /<(?:INTERCEPT_TP|intercept tp)>/i      @intercept_tp ||= @intercept_tier > 0 end    if line =~ /<(?:HIT_INTERCEPT_MP|hit intercept mp)>/i      @hit_intercept_mp ||= DoubleX_RMVXA::Intercept_Magic::Intercept_MP_Limit && @intercept_mp end    if line =~ /<(?:HIT_INTERCEPT_TP|hit intercept tp)>/i      @hit_intercept_tp ||= DoubleX_RMVXA::Intercept_Magic::Intercept_TP_Limit && @intercept_tp end    if line =~ /<(?:ALLY_INTERCEPT|ally intercept)>/i      @battler_intercept[:ally] = true end    if line =~ /<(?:ENEMY_INTERCEPT|enemy intercept)>/i      @battler_intercept[:enemy] = true end    if line =~ /<(?:INTERCEPT_CHANCE|intercept chance):[ ]*(\d+)>/i      @intercept_chance = $1.to_i / 100.0 if $1.to_i < @intercept_chance  end  }end # load_notetags_intercept_magic
    Performance - I don't know the performance of reading regular expressions(O(n)? O(n^2)? Just my wild guesses for its time complexity), but it's also determined by the number of lines of the item and that of the notetags to be read. The more notetags to be read, the more time the above algorithm needs. The memory usage depends on what and how many notetags are read, and right now it's too complex for me to analyze.

    I was satisfied. It works and theoretically it does what it just need to do.

    2nd algorithm - Similar to the 1st one, but once a notetag's read the method jumps right to the next line. Also, I check the most frequent notetag first:

    #----------------------------------------------------------------------------|# New method: load_notetags_intercept_magic |#----------------------------------------------------------------------------|def load_notetags_intercept_magic @intercept_tier = 0 @intercept_chance =100 @battler_intercept = {} @intercept_mp = @intercept_tp = @hit_intercept_mp = @hit_intercept_tp = false @note.split(/[\r\n]+/).each { |line| case line when /<(?:INTERCEPT_TIER|intercept tier):[ ]*(\d+)>/i @intercept_tier = $1.to_i if $1.to_i > @intercept_tier when /<(?:INTERCEPT_MP|intercept mp)>/i @intercept_mp ||= @intercept_tier > 0 when /<(?:INTERCEPT_TP|intercept tp)>/i @intercept_tp ||= @intercept_tier > 0 when /<(?:HIT_INTERCEPT_MP|hit intercept mp)>/i @hit_intercept_mp ||= DoubleX_RMVXA::Intercept_Magic::Intercept_MP_Limit && @intercept_mp when /<(?:HIT_INTERCEPT_TP|hit intercept tp)>/i @hit_intercept_tp ||= DoubleX_RMVXA::Intercept_Magic::Intercept_TP_Limit && @intercept_tp when /<(?:ALLY_INTERCEPT|ally intercept)>/i @battler_intercept[:ally] = true when /<(?:ENEMY_INTERCEPT|enemy intercept)>/i @battler_intercept[:enemy] = true when /<(?:INTERCEPT_CHANCE|intercept chance):[ ]*(\d+)>/i @intercept_chance = $1.to_i / 100.0 if $1.to_i < @intercept_chance end }end # load_notetags_intercept_magic
    Performance - Similar to the 1st one, but now it also depends on how many notetags are checked before a notetag's read.

    Performance Comparison - Unless there's no notetag at all or the only notetag being read is the least frequent one, the 2nd algorithm will perform at least slightly better than the 1st one, as now not every line needs to check against all notetags, they just have to check until a notetag's read. The performance difference can be non-trivial when there are lots of notetags to be read.

    How did I come up with the 1st algorithm - It's done by not thinking the user side of the story thoroughly enough. Theoretically, I've to check every line against every notetag as I've no way to ensure all users of my script will always put at most 1 notetag per line. But in practice, almost all users will just put at most 1 notetag per line unless the scripts ask otherwise. If you still feel unsafe, you can even ask users to put at most 1 notetag per line and nearly all users should have no problem with it. So when a notetag's read, I'm almost certain that there will be no more notetags in the same line, so continue checking against the rest is just redundant. It's better to jump to the next line instead. Checking the most frequent notetag first gives that script the highest chance to check against the least notetags, thus further boosting the performance.


    There may be(and extremely likely are) many, many other specific reasons and/or general practices behind less performant algorithms. Any specific example or general thought about this to share here?

    P.S.: I won't claim any of my 2nd algorithm is the best or even ideal, as I just don't know if there are even better ones. I'd be glad if you'd tell me some better algorithms(if any) for the above examples.
     
    Last edited by a moderator: Mar 21, 2015
    #1
    Zeriab and Another Fen like this.
  2. Evgenij

    Evgenij Veteran Veteran

    Messages:
    349
    Likes Received:
    100
    First Language:
    German
    Primarily Uses:
    N/A
    Example 1:

    What happens if you try to call battler_state_change? more than one time? You can use battle_state_change? only 1 time for it too work, what if there are more methods or classes which need to know about the state change during the same time?
     
    #2
  3. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    If you're(or anyone is) querying the correctness of the algorithms:

    1st algorithm:

    As the result's used at most once per frame, if @last_states == @states is false, then battler_state_change? sets @last_states as @states.clone and correctly returns true. It won't be used again in the same frame. Also:

    Case 1 - If after x frames, battler_state_change? is called but @states didn't change before calling that method at that frame, then unless some other scripts write @last_states somewhere(which is compatibility issues, although it can be a bit hard to solve in some cases), @last_states == @states will return true and battler_state_change? will correctly return false.

    Case 2 - If after x frames, battler_state_change? is called and @states changed before calling that method at that frame, then unless some other scripts write @last_states somewhere(which is compatibility issues, although it can be a bit hard to solve in some cases), @last_states == @states will return false and battler_state_change? will correctly return true.

    So the 1st algorithm is correct unless there are compatibility issues.

    2nd algorithm:

    As the result's used at most once per frame, if @state_change is true, then battler_state_change? sets @state_change as false and correctly returns true. It won't be used again in the same frame. Also:

    Case 1 - If after x frames, battler_state_change? is called but neither clear_states, add_new_state nor erase_state is called before calling the former at that frame:

    1. Unless some other scripts added more methods that can change the state array(which is also compatibility issues that are quite easy to fix in general), I'm sure the state array doesn't change.

    2. Unless some other scripts write @state_change somewhere(which is also compatibility issues, although it can be a bit hard to solve in some cases), @state_change will remain false and battler_state_change? will correctly return false.

    Case 2 - If after x frames, battler_state_change? is called and at least one of clear_state, add_new_state or erase_state is called before calling the former at that frame:

    1. Unless my aforementioned compatibility issues come into play(which is quite easy to solve in general), I'm sure the state array's changed.

    2. Unless some other scripts writes @state_change somewhere(which is also compatibility issues, although it can be a bit hard to solve in some cases), @state_change will be true and battler_state_change? will correctly return true.

    So the 2nd algorithm is correct unless there are compatibility issues.
    If battler_state_change? needs to be called more than once per frame(meaning at least some parts of the requirements change), then I can change both algorithms into the below(I don't know if there are better ones and I'd be glad to know them if that's the case):

    Expanded 1st algorithm - Similar to the original one, but a hash will be used to mark which method calls this method at a frame:

    def battler_state_change?(def_name) return false if @states == @last_states[def_name] @last_states[def_name] = @states.clone trueendP.S.: To support multiple calls per frame from a single method, use def_name_call_x as keys of @last_states and an instance variable counter to count the number of calls from that method. Of course the performance can become quite ugly lol
    Performance - Its time complexity is still O(n) per call and the memory still has to deal with an array(twice when the checking will return true; actually an array in a hash, which is at least slightly more expensive) per call.

    However, the time complexity per frame is now O(n^2) and the memory now has to deal with lots of arrays in a hash per frame unless extremely few methods will ever call battler_state_change? at the same frame(which would still be O(n)).

    Expanded 2nd algorithm - Similar to the original one, but a hash will be used to mark which method calls this method at a frame:

    alias clear_states_example clear_statesdef clear_states  clear_states_example  @state_change.each_value { |val| val = true }endalias add_new_state_example add_new_statedef add_new_state(state_id) add_new_state_example(state_id) @state_change.each_value { |val| val = true }endalias erase_state_example erase_statedef erase_state(state_id) erase_state_example(state_id) @state_change.each_value { |val| val = true }enddef battler_state_change?(def_name) return false unless @state_change[def_name] @state_change[def_name] = false trueendP.S.: To support multiple calls per frame from a single method, use def_name_call_x as keys of @last_states and an instance variable counter to count the number of calls from that method. Of course the performance can become quite ugly lol

    Off topic: This algorithm is the same as a real script I'm writing(although it doesn't just check for state changes). I've tested this algorithm and so far I didn't find any bugs(of course that alone doesn't mean this algorithm's correct). The original 2nd algorithm is a simplified one :)
    Performance - The time complexity is still O(1) per battler_state_change? call, but that of clear_states, add_new_state and erase_state is now O(n). The memory now has to deal with a hash of booleans when either of the formers' called(twice when it will return true), and a boolean in a hash when the latter's called.

    However, the time complexity per frame is still O(n) and the memory still only has to deal with extremely few hashes of booleans(but also possibly lots of booleans in a hash) per frame unless the state array may change lots of times(which would be O(n^2)).

    Performance Comparison - That of the 1st one is mainly determined by the maximum possible state array size and the maximum possible number of methods calling battler_state_change? at a frame, while that of the 2nd one is mainly determined by the maximum possible state changes at a frame and the maximum possible number of methods calling at a frame. So unless the maximum possible state array size is smaller than the maximum possible state changes at a frame(and I think it's extremely unlikely), the expanded 2nd algorithm will likely perform better in general.

    Compatibility Comparison - Those of them should be the same as the essence behind both expansions are basically the same.


    On a side note, if a state's added and removed again before calling battler_state_change? again, algorithm 1 will say the state array didn't change but algorithm 2 will say the state array changed(I don't know if there are some other cases which can also cause this). If this case's non-trivial, then pick the one giving the desired result, as correctness almost always comes before performance, unless all the correct choices have intolerable performance issues(and in this case they don't in general).

    P.S.:

    1. I should've noticed and voiced out the above case that causes behavior difference from both algorithms XD

    2. How did I spent nearly 3 hours writing this reply? My thinking algorithm surely sucks hard like hell in term of time lol
     
    Last edited by a moderator: Mar 21, 2015
    #3
    Evgenij likes this.
  4. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    1)

    You may have to extend  clear_states  as well if you are planning to allow the "Recover all" command.

    2)

    While I don't think updating coordinates for five bars would be that of an performance impact, that's generally a nice way to reduce the impact of unneccessary updates. I don't know if  @type  or  @start_width  are supposed to be constant or if the rect coordinates can be modified outside of the method though.

    3)
    In Ruby, case statements are terminated after the first matching case anyway, so your second algorithm should not change a thing here.

    What you probably could do to improve performance would be caching the result, so you only have to do the note evaluation once for every item.
     
    Last edited by a moderator: Mar 10, 2015
    #4
    DoubleX likes this.
  5. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    1. You're right. I should alias clear_states as well. The 2nd algorithm's changed now.

    2. Actually it'll as it's 5 bars per enemy, and when there are 8 enemies at the same time, it's 40 bars. Also, I should've mentioned that both @type and @start_width will never change once initialized. The only method changing the rect coordinates is the one in both algorithms.

    3. I've checked again and you're right. I think I've mixed up the ruby case with the c++/java ones(My bad :) ). Now the 2nd algorithm is the 1st one and the 1st one has changed to something unrealistically stupid lol

    However, I can't get what you mean by caching the result, as each item has its own notebox and I don't know which item has which notetags before reading the notebox of each item.
     
    Last edited by a moderator: Mar 10, 2015
    #5
  6. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    As KilloZapit suggested the below algorithm for Example 1 earlier and I've briefly tested that it's correct(its correctness is the same as that of the 1st algorithm), I'll include it here:

    3rd algorithm - He/she writes a battler method being called per frame, which checks if the hash code of the current state array is the same as that in the last frame and returns the result:

    def battler_state_change? return false if @states.hash == @last_state_hash @last_state_hash = @states.hash trueend
    Performance - The time complexity of this algorithm's O(1) and the memory has to read and probably write(when the state array's changed) a numeric value.

    Expanded 3rd algorithm - Similar to the original one, but a hash will be used to mark which method calls this method at a frame:

    def battler_state_change?(def_name) return false if @states.hash == @last_state_hash[def_name] @last_state_hash[def_name] = @states.hash trueendP.S.: To support multiple calls per frame from a single method, use def_name_call_x as keys of @last_state_hash and an instance variable counter to count the number of calls from that method. Of course the performance can become quite ugly lol
    Performance - The time complexity is still O(1) per call and the memory has to read and probably write(when the state array's changed) a numeric value in a hash per call. The time complexity per frame is now O(n) and the memory has to read and probably write(when the state array's changed) lots of numeric values in a hash per frame.

    Performance Comparison - It's even better than the 2nd one as it doesn't have extreme cases(lots of state changes per frame) to worry about and has better compatibility(no method is aliased at all).

    How did I fail to come up with the 3rd algorithm - It's done by not knowing hash codes at all. It's due to my lack of knowledge in Ruby. If I knew that, the 3rd algorithm should be obvious to me. My bad lol


    Example 4

    Suppose I've to rewrite/alias/create a new method where it does A and B if script C is present and absent respectively. C can't be enabled or disabled on the fly and must be placed above this script.

    1st algorithm - I separated the parts that will be done in both A and B, and does the remaining of A and B if C is present and absent respectively. Whether C is present is checked per call:

    class Foo_Bar def foo does_something $imported[:script_C] ? bar_A : bar_B does_something_else end # fooend # Foo_Bar
    Performance - A boolean, symbol, numeric or string(depending on the value of $imported[:script_C]) in a hash needs to be read per call.

    2nd algorithm - Similar to the 1st one, but now the presence of C is checked per game execution:

    module This_Script FOO_BAR = %Q( def foo does_something #{$imported[:script_C] ? "bar_A" : "bar_B"} does_something_else end # foo )end # This_Scriptclass Foo_Bar module_eval(This_Script::FOO_BAR)end # Foo_Bar
    Performance - A boolean, symbol, numeric or string(depending on the value of $imported[:script_C]) in a hash needs to be read per game execution.

    Performance Comparison - The 2nd algorithm's better as it only checks once per game execution while the 1st one will almost always check lots of times per game execution. While normally the difference's negligible, when foo is called lots of times per frame, the performance difference might become noticeable.

    How did I come up with the 1st algorithm - It's done by not utilizing the fact that C can't be enabled or disabled on the fly and must be placed above this script. The former means $imported[:script_C] will always return the same value in the same game execution and the latter means $imported[:script_C] will always return the correct value. So it's safe to just check once per game execution, making such checking per method call redundant.


    Example 5(Slightly modified version of (DoubleX)YSA CATB ATB Addon)

    Suppose I've to let users set parts of a method's content. That method is likely called frequently.

    1st algorithm - I let users set a constant storing parts of a method's content. That constant is evaluated in that method when it's called:

    #==============================================================================|#  ** You only need to edit this part as it's about what this script does      |#------------------------------------------------------------------------------|module DoubleX_RMVXA  module YSA_CATB_ATB_Addon    # The atb increment per frame is:    # CATB_GAIN_MODIFIER * MAX_CATB_VALUE / $game_system.catb_fill_time    # MAX_CATB_VALUE is equal to 100000.0    # $game_system.catb_fill_time is DEFAULT_FILL_TIME or FILL_TIME_VARIABLE    # CATB_GAIN_MODIFIER default is "agi.to_f / BattleManager.average_agi"    # agi.to_f is battler's agi    # BattleManager.average_agi is the average agi of all battlers    CATB_GAIN_MODIFIER = "agi.to_f / BattleManager.average_agi"  end # YSA_CATB_ATB_Addonend # DoubleX_RMVXA#==============================================================================|#==============================================================================|#  ** You need not edit this part as it's about how this script works          |#------------------------------------------------------------------------------|if $imported["YEA-BattleEngine"] && $imported["YSA-CATB"] && $imported["DoubleX RMVXA Bug Fixes to YSA-CATB"]#------------------------------------------------------------------------------|class Game_Battler < Game_BattlerBase  #----------------------------------------------------------------------------|  #  Rewrite method: real_gain_catb                                            |  #----------------------------------------------------------------------------|  def real_gain_catb    # Rewritten to use the CATB_GAIN_MODIFIER    eval(DoubleX_RMVXA::YSA_CATB_ATB_Addon::CATB_GAIN_MODIFIER + " * base_gain_catb")    #  end # real_gain_catbend # Game_Battler#------------------------------------------------------------------------------|end # if $imported["YEA-BattleEngine"] && $imported["YSA-CATB"] && $imported["DoubleX RMVXA Bug Fixes to YSA-CATB"]#==============================================================================|
    Performance - A eval is used per call and x eval are used per frame, x being the number of alive battlers. The actual performance per eval depends on CATB_GAIN_MODIFIER.

    2nd algorithm - Similar to the 1st one, but now that constant is evaluated along with the other parts of the method per game execution:

    #==============================================================================|# ** You only need to edit this part as it's about what this script does |#------------------------------------------------------------------------------|module DoubleX_RMVXA module YSA_CATB_ATB_Addon # The atb increment per frame is: # CATB_GAIN_MODIFIER * MAX_CATB_VALUE / $game_system.catb_fill_time # MAX_CATB_VALUE is equal to 100000.0 # $game_system.catb_fill_time is DEFAULT_FILL_TIME or FILL_TIME_VARIABLE # CATB_GAIN_MODIFIER default is "agi.to_f / BattleManager.average_agi" # agi.to_f is battler's agi # BattleManager.average_agi is the average agi of all battlers CATB_GAIN_MODIFIER = "agi.to_f / BattleManager.average_agi"#==============================================================================|#==============================================================================|# ** You need not edit this part as it's about how this script works |#------------------------------------------------------------------------------| # Stores real_gain_catb REAL_GAIN_CATB = %Q( def real_gain_catb #{CATB_GAIN_MODIFIER} * base_gain_catb end # real_gain_catb ) end # YSA_CATB_ATB_Addonend # DoubleX_RMVXAif $imported["YEA-BattleEngine"] && $imported["YSA-CATB"] && $imported["DoubleX RMVXA Bug Fixes to YSA-CATB"]#------------------------------------------------------------------------------|class Game_Battler < Game_BattlerBase #----------------------------------------------------------------------------| # Rewrite method: real_gain_catb | #----------------------------------------------------------------------------| module_eval(DoubleX_RMVXA::YSA_CATB_ATB_Addon::REAL_GAIN_CATB)end # Game_Battler#------------------------------------------------------------------------------|end # if $imported["YEA-BattleEngine"] && $imported["YSA-CATB"] && $imported["DoubleX RMVXA Bug Fixes to YSA-CATB"]#==============================================================================|
    Performance - A module_eval is used per game execution. The actual performance of the module_eval depends on REAL_GAIN_CATB.

    Performance Comparison - The 2nd algorithm clearly outperforms the 1st one as real_gain_catb is probably called per battler per frame, while the module_eval is only called once per game execution.

    How did I come up with the 1st algorithm - It's done by not knowing eval is extremely expensive and not caring performance enough. At that time, I thought eval is extremely convenient without costing anything big. When I knew eval is that expensive(roughly 6080 ns per call in my test while a method call costs roughly 70 ns in my test), I didn't care about the particular use of eval in the atb addon as there's not even a 1 average fps drop in my tests and so far I didn't receive any performance concerns from the users. But the fact that the 2nd algorithm almost has no non-trivial drawbacks causes me to think that maybe I should use that algorithm instead, even if the performance boost is trivial.
     
    Last edited by a moderator: Mar 15, 2015
    #6
  7. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    About the Example 1)
    Array#hash is actually O(n) and as far as I know its result is not cached within the array, so I'd expect that to be a bit slower than an ordinary array comparison. Despite from that it's quite unlikely, but not impossible that two different arrays have the same hash, as there are only about one billion different hashes, but a lot more possible arrays.
     

    When your state window can't display more than - for example - 12 states at once, you could reduce the array comparison to the first 12 elements. That would actually make the array comparison O(1), but probably won't improve the performance unless you are dealing with a lot of states at once.
     
    The expansion should work fine. As an alternative, you could do the states comparisons within the different users of that method instead or implement a counter value like this (based on your second algorithm):

    class Game_Battler   def refresh_count    # Number of updates:    @refresh_count || 0  end   def update_refresh_count    # Increase number of updates after a change:    @refresh_count = refresh_count + 1  end   alias_method:)old_definition_of_example, :example_method)  def example_method    old_definition_of_example    update_refresh_count  endend class Window_StatusExample   def update    # Refresh when the actors status has changed:    refresh  if @last_counter != (@last_counter = actor.refresh_count) endend
     Example 4)
    This should be a quite performant way to do it. It might become confusing though.
    If I didn't go with your first variant I'd probably do something like this (which is still a bit slower than the eval variant):

    class Foo_Bar   def foo    does_something    bar_resolved    does_something_else  end   if $imported[:script_C]    def bar_resolved; bar_A; end  else    def bar_resolved; bar_B; end  endend
     
    Example 5)
     
    At the moment, formulas like this probably won't work correctly because of the high precedence of the * operator:
    1 + agi.to_f / BattleManager.average_agiDespite from that, this thread might be interesting: how to speed up formula eval code.

    If you have one formula that works for all battlers, you could instruct the users to modify a method instead of a string in the first place of course.
     
    Last edited by a moderator: Mar 15, 2015
    #7
    DoubleX likes this.
  8. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    My informal benchmark suggests that you're right:

    block = -> x { [1, 1, 1, 1, 1, 1, 1, 1] == [1, 1, 1, 1, 1, 1, 1, 1] }p("block")10_000_000.times(&block)p("block")# Roughly 19 secondsblock = -> x { [1, 1, 1, 1, 1, 1, 1, 1].hash == [1, 1, 1, 1, 1, 1, 1, 1].hash }p("block")10_000_000.times(&block)p("block")# Roughly 47 seconds
    However, I recall that he/she said that the algorithm might have to rewrite the Array class or make a subclass or a wrapper class. So either I get his/her algorithm wrong, or his/her algorithm is worse than the 1st one :)

    What do you mean by Example 3? The one using a slightly modified version of (DoubleX)Intercept Magic? But it seems to me that your code is doing something completely differently. Maybe your code is actually an alternative to the expanded 1st and/or 2nd algorithm in example 1(checking state change).
     
    #8
  9. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    Sorry, accidently posted before I was finished. Example 3 was meant to be "Expansion of 3rd alorithm" :)
     
    #9
  10. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Actually, Example 1 can be something other than displaying states :)

    For example, in an atb system where states can use notetags altering the atb gain rate, when the state changes, the notetags altering the atb gain rate can also change, and gaining atb can happen at any frame.

    For Example 4, I think your way has better readability and probably maintainability, especially when the methods addressed are complicated. Reading methods constructed by module_eval can sometimes be like putting all the pieces of a puzzle by the readers themselves lol

    The operator precedence problem in Example 5 can be easily dealt with:

    (1 + agi.to_f / BattleManager.average_agi)
    Letting users to edit the method content can also be a choice, but here I think letting them to edit a string instead will be probably less error-prone, especially when I don't want users to touch the base_gain_catb part.

    About that Tsukihime's post, actually I knew how dreaded eval is thanks to that. But now I'll also try to avoid using lambda/proc either, as they're still less performant than method calls(roughly 220-230ns vs roughly 70ns per call in my machine) and saving instances with lambdas/procs need quite some work. If the custom code contents won't change once it's set, I'll try to use module_eval, otherwise I'll try to create methods on the fly(but it won't be feasible in some special cases).
     
    Last edited by a moderator: Mar 15, 2015
    #10
  11. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    For the record:

    I doubt that strictly getting rid of blocks, instance variables and one-line-methods is going to get you anywhere when dealing with performance issues. Of course there are slightly less expensive alternatives, but they usually don't really matter if you don't use them excessively and if you are looking for ultimate performance using Ruby probably isn't the best choice to begin with anyway.

    While it makes sense to get rid of a proc if it is just a waste, but before using those micro optimizations when struggling with performance issues it's probably more effective to reconsider the general structure of your system (of course there are exceptions to this).

    I don't think you meant it that way in the first place, but you don't have to stop using  Array#each  just because it uses a block. :)

    The O-notation is helpful to determine if an algorithm is suited for large scales, but is not really significant for small values. For example, retrieving the value of one parameter of a battler is actually O(n²) (with n being the number of the battlers feature objects, assuming all objects have about the same amount of features), but usually still has a bearable performance that is often overshadowed by the amount of actual features (O(n)).
     
    Last edited by a moderator: Mar 15, 2015
    #11
  12. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    1. Actually I've been thinking of outsourcing some parts of some of my scripts to external dlls or something else, but as RMVXA uses RGSS3 which is Ruby, I doubt how many parts can be actually outsourced.

    In a script I'm writing, simply shaving off most of the small inefficiencies(most 1-2 line methods, redundant method calls and variables, etc) boosted the average fps by nearly 10(from roughly 100 to roughly 110 in a general situation), while upgrading some algorithms to the some performant ones(one of them is changing the expanded 1st algorithm to the expanded 2nd algorithm in Example 1) boosted the average fps by nearly another 9(from roughly 110 to roughly 119 in the same general situation). Now in another situation the average fps is about 111, and I'm still thinking if it can be boosted further.

    Here I'm not looking for ultimate performance but the maximum performance I can have while still conforming all conditions(mainly the script requirements), constraints(including my current scripting proficiency) and the balance of all the other programming aspects(like modularity).

    And yes, I'd say that rethinking the general script structure when something goes wrong is important, unless it's certain that the structure does its job(changing the general script structure usually means tons of work though).

    2. Yes, O-notation can be deceiving when n is always or usually small(and the constants may even be the determining factor here), so when using it I'll think of some reasonable average and maximum value of n, and when possible, I'll test the algorithms to gain some actual data(that's why I mentioned the average fps data).

    For example, consider algorithms with n being the number states a battler has. If the maximum value of n is 4, the difference between O(n) and O(n^2) won't mean much(the constant here may actually play a bigger role); If the maximum value of n is 20(I recall a RMVXA game in which 20 isn't a rare value), the difference between O(n) and O(n^2) may actually mean something, especially if the algorithm's run x constant times per frame.
     
    Last edited by a moderator: Mar 15, 2015
    #12
  13. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    In theory, when a proc call costs you 150ns, then boosting your FPS by one frame requires:
    100->101 : ~660 saved proc calls
    60->61 : ~1800 saved proc calls

    20->21 : ~16000 saved proc calls

    This may be a quite odd comparison, as it uses static FPS gains in a growing system, but what I wanted to emphasize was that a gain of 20% more FPS sounds good and certainly is a noticable difference on the screen, but 20% more speed is not a real answer to serious performance issues (when the game would drop to 20 FPS, it just drops to 24 FPS instead). In addition, as your system grows bigger, you have to continue those optimizations to keep the performance gain anyway, so it's a growing effort for a constant result.

    I don't want to discourage you or anybody else from optimizing their code, but for users who come to this thread looking for answers how to make their code less laggy, statements like "avoid procs, they are less performant than methods" while all that is about is saving single microseconds and hope they somehow sum up to a noticable performance gain will confuse more than help creating actually faster code.

    (Edited, Sorry for getting a bit off topic here :) )
     
    Last edited by a moderator: Mar 19, 2015
    #13
  14. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Maybe you're right, and perhaps next time I may need to add disclaimers like:

    "Performance isn't everything, it isn't even always the most important thing. All the other programming aspects should be taken care of as well. Scripters should have a clear understanding of the importance and priority of each programming aspect in each part of their codes to ensure a working balance. They should also take their own current scripting proficiency into account to ensure none of their codes will ever be out of their control."

    (Actually I just added it at the top of the opening post)

    For the actual performance boost, I usually run the game several times to test and obtain the actual data to compare the performance of the control case and the new case.

    Also, it's machine dependent, so I'll have to take the users' machines' capabilities into account. Users here aren't only those directly using the scripts, but also the players of their games using those scripts.

    Moreover, as my machine(2 x 2GB DDR3 1333 + i7-3820) is obviously at least close to average among RMVXA developers and players, it's reasonable for me to think of significantly less powerful machines relative to mine.

    Besides, as it's likely that my scripts aren't the only resource consuming scripts their users are using, I'll generally place performance into a high priority for the resource consuming parts(and run extremely frequently) of my codes.

    In a script I'm writing, unless the values of the user configurations go crazy or my machine is already having a heavy workload, I can almost guarantee the average fps will seldom be below 70 in my machine, which is certainly more than enough. But I don't know if the same applies to less powerful machines or users using other resource consuming scripts.
     
    Last edited by a moderator: Mar 19, 2015
    #14
  15. Another Fen

    Another Fen Veteran Veteran

    Messages:
    520
    Likes Received:
    236
    First Language:
    German
    Perhaps I'm seeing this issue too strict in the first place. I wanted to avoid the impression using those constructs would come with a risk of making the code unperformant. This even might be true for some cases, but in the other cases avoiding them would probaby just make the code look optimized.

    Usually VXAce applications do not run at constant 60 FPS on my PC, so I'm glad for scripts that avoid unnecessary impact.

    (Edit: I think I have to apologize for hijacking here... Sorry :-( )
     
    Last edited by a moderator: Apr 27, 2015
    #15
  16. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    "Bug Fix" in Example 1:

    The part "The result is used at most once per frame" is changed to "The result is used by at most 1 method for at most once per frame". I didn't realize my requirements aren't accurate enough. My bad :)

    Example 6(Slightly modified version of a possibly upcoming script)

    Suppose I've to let users set the details of a window showing the current turn number, the number of frames passed in that turn and the number of frames of that turn:

    # Sets the atb clock window width and height # Its corresponding method's created under ECATB_Clock_Bar # It must return an array of positive real numbers and should return an # array of natural numbers # Example: To set the atb clock window width and height as the value of # variables with id x and y respectively, set this as # %Q(%Q([$game_variables[x], $game_variables[y]])) :ecatb_clock_window_wh => %Q(%Q([128, 56])), # Sets the atb clock window x, y and z positions # Its corresponding method's created under ECATB_Clock_Window # It must return an array of real numbers and should return an array of # integers # Example: To set the atb clock window x, y and z positions as the value # of variables with id a, b and c respectively, set this as # %Q(%Q([$game_variables[a], $game_variables, $game_variables[c]])) :ecatb_clock_window_xyz => %Q(%Q([0, 0, 0])), # Sets the turn portion bar colors # Its corresponding method's created under ECATB_Clock_Bar # It must return an array of colors # Example: To set the turn portion bar as text color c and r, g, b, a rgba # values, set this as # "[Colour.text_colour(c), Color.new(r, g, b, a)]" :ecatb_clock_bar_colors => %Q(%Q([Colour.text_colour(7), Colour.text_colour(8)])), # Sets the turn portion bar width and height # Its corresponding method's created under ECATB_Clock_Bar # It must return an array of positive real numbers and should return an # array of natural numbers # Example: To set the turn portion bar width and height as the value of # variables with id x and y respectively, set this as # %Q(%Q([$game_variables[x], $game_variables[y]])) :ecatb_clock_bar_wh => %Q(%Q([112, 16])), # Sets the turn portion bar x and y positions # Its corresponding method's created under ECATB_Clock_Bar # It must return an array of real numbers and should return an array of # integers # Example: To set the turn portion bar x and y positions as the value of # variables with id x and y respectively, set this as # %Q(%Q([$game_variables[x], $game_variables[y]])) :ecatb_clock_bar_xy => %Q(%Q([8, 8])), # Sets the way the turn portion and count text are shown # The current atb clcok can be referneced by cur_clock # The maximum atb clock can be referneced by max_clock # The turn count can be referenced by turn_count # Its corresponding method's created under ECATB_Clock_Window # It must return a string # Example: To set the way the turn portion and count text are shown as # "cur_clock/max_clock", set this as # %Q(%Q(cur_clock.to_s + "/" + max_clock.to_s)) :ecatb_clock_text => %Q(%Q(cur_clock.to_s + "/" + max_clock.to_s + " " + turn_count.to_s)), # Sets the turn portion and count text color # Its corresponding method's created under ECATB_Clock_Window # It must return a color # Example: To set the turn portion and count text color's rgba values as # r, g, b and a, set this as %Q(%Q(Color.new(r, g, b, a))) :ecatb_clock_text_color => %Q(%Q(Colour.text_colour(0))), # Sets the turn portion and count text size # Its corresponding method's created under ECATB_Clock_Window # It must return a positive real number and should return a natural number # Example: To set the turn portion and count text size as the value of # variable with id x, set this as %Q(%Q($game_variables[x])) :ecatb_clock_text_size => %Q(%Q(16)), # Sets the turn portion and count text x and y positions # Its corresponding method's created under ECATB_Clock_Window # It must return an array of real numbers and should return an array of # integers # Example: To set the turn portion and count text x and y positions as the # value of variables with id x and y respectively, set this as # %Q(%Q([$game_variables[x], $game_variables[y]])) :ecatb_clock_text_xy => %Q(%Q([0, 8]))
    Each of the above configuration value strings can return different values per frame.

    Also, the below result(the upper left window) is considered desirable(The text is "1033 / 1200 0" and its size is 16):

    [​IMG]
    While the below result(the upper left window) is considered undesirable(The text is "1016 / 1200 0" and its size is 40):

    [​IMG]
    I've to ensure that the above undesirable result can always be avoided.

    P.S.: Actually the same applies to the bar in the window, but for simplicity, let's just focus on the text in the window.

    1st algorithm - I check the window width and height against the text width, height and x and y positions, and set the formers if the text would otherwise not fit into the window, when a frame passes(strictly speaking, it should be "when the atb updates by 1 frame" instead):

      #----------------------------------------------------------------------------|  #  New method: refresh                                                       |  #  - Refreshes the atb clock window bar and text when the atb clock runs     |  #----------------------------------------------------------------------------|  # cur_clock: The number of frames passed in the current turn  # max_clock: The number of frames of the current turn  def refresh(cur_clock, max_clock)    text = ecatb_clock_text(cur_clock, max_clock, $game_troop.turn_count) # Ensures the atb clock text fits into the atb clock window self.width = [width, @last_text_x + text.width + 16].max self.height = [height, @last_text_y + contents.font.size + 16].max # # Clears the atb clock window contents and redraws the atb clock bar and text    contents.clear    @clock_bar.update_bar(cur_clock, max_clock)    draw_text(@last_text_x, @last_text_y, contents.width, contents.height, text, 0) #  end # refresh
    Performance - self.width, self.height, @last_text_x, @last_text_y and contents.font.size are read and self.width, self.height are possibly written, in addition of clearing the window contents and redrawing the bar and text.

    2nd algorithm - I don't check if the text will fit into the window myself, as the undesirable cases can be easily corrected by the users themselves:(Strictly speaking, the 2nd algorithm is actually nonexistent.)

    #----------------------------------------------------------------------------| # New method: refresh | # - Refreshes the atb clock window bar and text when the atb clock runs | #----------------------------------------------------------------------------| # cur_clock: The number of frames passed in the current turn # max_clock: The number of frames of the current turn def refresh(cur_clock, max_clock) text = ecatb_clock_text(cur_clock, max_clock, $game_troop.turn_count) # Clears the atb clock window contents and redraws the atb clock bar and text contents.clear @clock_bar.update_bar(cur_clock, max_clock) draw_text(@last_text_x, @last_text_y, contents.width, contents.height, text, 0) # end # refresh
    Performance - Only clearing the window contents and redrawing the bar and text are done.

    Performance comparison - The 2nd algorithm is clearly more performant as it obviously does less stuffs.

    How did I come up with the 1st algorithm - It's done by correcting the undesirable results myself when they can be easily corrected by the users themselves. In this case, it's easy as they just have to either increase the window width and/or height, or decrease the text size or text string length. After several such experiments, they should find values leading to desirable results.

    If, some users refuse to do even just a few trials and errors, they can still give me their current values and additional requirements. I can then test and returns some working values for them if there's any.

    If, some users even think that's too much work, I'm afraid to say that they just didn't give the minimum effort required to use the script, as I generally don't want to cater to such users.

    P.S.: Actually when you encounter similar cases, you should think of your targeting audiences. If you think that they're willing to correct the undesirable results themselves and doing so is easy, then it's likely better to leave those work to them. If that's not the case, then it may become necessary to handle the undesirable results yourselves.


    P.S.: I hope this example isn't too forced lol
     
    Last edited by a moderator: Mar 21, 2015
    #16
  17. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Example 7

    Suppose in an atb system, the actors' atb bars always need to be shown in the status window to always correctly show their current atb value by showing the current atb bar length vs its maximum length and showing a text within the atb bar displaying the current atb fill percentage(current atb value / maximum atb value * 100%). The width, maximum length and the x and y positions of the atb bars are constants and the atb bar texts are always completely within the atb bars. For instance:

    [​IMG]

    (It also displays the atb bar type and the number of actions available for each actor, which isn't part of this example's requirements.)
    1st algorithm - I refresh the whole status window per atb update(a frame update that changes the atb values). The status window will redraw the atb bars and percentage texts per refresh:

    #------------------------------------------------------------------------------|# * Edit class: Window_BattleStatus |#------------------------------------------------------------------------------|class Window_BattleStatus < Window_Selectable #----------------------------------------------------------------------------| # Alias method: draw_item | #----------------------------------------------------------------------------| alias draw_item_atbs draw_item def draw_item(index) draw_item_atbs(index) # Added to redraw the atb bar and text draw_atb_bar(index) # end # draw_item_atbsend # Window_BattleStatus#------------------------------------------------------------------------------|# * Edit class: Window_BattleStatus |#------------------------------------------------------------------------------|class Scene_Battle < Scene_Base #----------------------------------------------------------------------------| # New method: atb_update | # - Updates the global atb clock by 1 frame | #----------------------------------------------------------------------------| def atb_update # Refreshes the status window and updates everything else other_atb_updates_1 @status_window.refresh other_atb_updates_2 # end # atb_updateend # Scene_Battle < Scene_Base
    Performance - Every content of the whole status window is cleared and then redrawn. In my tests(using Fomar0153's YanFly Compatible Customisable ATB/Stamina Based Battle System Script Version 1.4), the average fps dropped from roughly 119 to roughly 69 in my machine.

    (Of course it doesn't mean Fomar0153 used the 1st algorithm. Actually I doubt if atb system script writer will ever use it.)

    2nd algorithm - I only refresh the atb bars and percentage texts per atb update(a frame update that changes the atb values):

    #------------------------------------------------------------------------------|# * Edit class: Window_BattleStatus |#------------------------------------------------------------------------------|class Window_BattleStatus < Window_Selectable #----------------------------------------------------------------------------| # New method: refresh_atb_bars | # - Redraws all actors' atb bars and texts | #----------------------------------------------------------------------------| def refresh_atb_bars item_max.times{ |index| draw_atb_bar(index) } end # refresh_atb_barsend # Window_BattleStatus#------------------------------------------------------------------------------|# * Edit class: Window_BattleStatus |#------------------------------------------------------------------------------|class Scene_Battle < Scene_Base #----------------------------------------------------------------------------| # New method: atb_update | # - Updates the global atb clock by 1 frame | #----------------------------------------------------------------------------| def atb_update # Refreshes the status window and updates everything else other_atb_updates_1 @status_window.draw_atb_bars other_atb_updates_2 # end # atb_updateend # Scene_Battle < Scene_Base
    Performance - Only the actors' atb bars and texts are redrawn(and the old ones aren't even cleared). In my tests(using Fomar0153's YanFly Compatible Customisable ATB/Stamina Based Battle System Script Version 1.4), the average fps stays at roughly 119.

    Performance Comparison - The 2nd algorithm is clearly much, much more performant as the above data in my machine shows average fps difference between roughly 69 and roughly 119. It's unlikely that the difference will become trivial on most machines among all possible RMVXA users(perhaps except the most powerful ones).

    How did I come up with the 1st algorithm - It's done by not understanding how the status window works thoroughly enough. It's true that everything drawn on the status window becomes parts of its contents, which are, if even possible, rather hard to be partially cleared without making things really nasty, especially when texts are to be cleared. And in general, if the old contents aren't cleared right before drawing the new ones, some quite ugly results will probably come:

    [​IMG]

    (From a project given by an user asking me to do something for him/her)
    Also, if not everything is redrawn after clearing all contents, some other undesirable results will also likely come:

    [​IMG]

    (From a project given by an user asking me to do something for him/her)
    All these lead to a seemingly natural conclusion: Partial redraw is almost always extremely prone to nontrivial bugs.

    The above conclusion's actually not as true as it seems, however. If the 1st undesirable result is observed more closely, it should be noticeable that the new contents are actually drawn directly above the old ones, causing the formers to cover the latter, meaning that if the formers will always completely cover the latters, clearing the latters right before drawing the formers is no longer needed to ensure that the 1st undesirable result will never happen, which in turn implies that when only the things supposed to be redrawn are redrawn, redrawing everything else is no longer needed to ensure that the 2nd undesirable result will never happen as well.

    As the requirements state that "The width, maximum length and the x and y positions of the atb bars are constants and the atb bar texts are always completely within the atb bars", it's certain that the newly drawn atb bars will always completely cover the old atb bars and texts, causing the 2nd algorithm to be correct.

    P.S.: Actually the 2nd algorithm can be even more performant by redrawing the atb bar and text of an actor only if the new width of the former and/or the new text of latter will change if they're redrawn, as those values will almost always stay the same for a significant portion of the entire runtime and checking if those values changed is much, much cheaper. For example, if the old width of the atb bar is x pixels and the old percentage text is y%(also assuming the percentage text is always p%, p being the current atb fill percentage), and both x and y are unchanged, no redrawing needs to take place, as if they were redrawn, the result would be exactly the same as that without redrawing. It generally leads to more performant results as checking if x and y should change is rather cheap.

    P.P.S: I suspect that by separating the atb bars from the status window and using viewports to implement them instead, the result might be even more performant. But as I haven't tested that, I don't know how true it's.
     
    Last edited by a moderator: Mar 24, 2015
    #17
  18. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Analogy to the algorithms in Example 1:

    1. Suppose this forum doesn't have notifications.

    When you want to check if x topics are updated every hour, you'll have to manually go to all those x topics per hour and see if there are new replies. Things will be even much worse if there's no timestamp.

    That's analogous to the 1st algorithm. It checks if the state array at the current frame is the same as that at the last frame, effectively checking if the state array changed per frame(unless more than 1 change occur at the same frame and the state array remain unchanged after those changes, the correctness will be different in this special case; it's like someone replied to a topic but then that reply's completely gone shortly afterwards).

    2. Suppose this forum has notifications, but not implemented for each user. Instead, a global notifications which is shared by all users is implemented(showing when there are new replies for topics). Also, no timestamp is shown on the global notifications.

    When an user checks the global notifications, all the new notifications in that global notifications will be mark as read. Then when another user checks the same global notifications, the previously new notifications are already marked as read and that user will miss them.

    That's analogous to the original 2nd algorithm. It uses a variable as a flag to mark that the state array changed when the state array changing methods are used, and that variable will be reset as false once it's used to query if the state array's changed. It works if there's only 1 method querying if the state array's changed, like the global notifications will work if only 1 user will ever use it, but when more than 1 method query if the state array's changed, the original 2nd algorithm will likely fail, like when more than 1 user will use the global notifications, it'll likely fail to serve its purpose.

    3. In reality, this forum does have user notifications, which is implemented for each user.

    When an user uses his/her notifications, all his/her new notifications will be mark as read, but all the other users' notifications will be unchanged as now the notifications are separated for each user. So when another user uses his/her notifications, all his/her notifications will still be correctly shown, thus ensuring the user notifications will serve its purpose.

    That's analogous to the expanded 2nd algorithm. It uses a hash of variables, each as a flag, to mark that the state array changed when the state array changing methods are used. Also, a key marking which method will use the flag is associated with respective variables. When those variables are used by their respective methods to query if the state array's changed, they'll be reset as false, but the rest of those variables will be unchanged. It works for multiple methods as a method's query won't affect the others' ones, like a user using his/her user notifications won't change other users' notifications.
     
    Last edited by a moderator: Apr 11, 2015
    #18
    Another Fen likes this.
  19. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Example 8

    Suppose there's an actor/class/skill/item/weapon/armor/enemy/state notetag in a script:

    <run per frame: CODEX>Where CODEX is set in the script's user editable region:

    CODEX = "rgss3_codes"That notetag will be used by battlers having actor/class/skill/item/weapon/armor/enemy/state with that notetag, and each such battler can use each such notetag at most once per frame. The notetag values, being RGSS3 codes, need to be run under the current state(the state when using) of the battler using it(i.e., under the battler instance context). They can return different values even if nothing of the battler using it has changed.

    If a actor/class/skill/item/weapon/armor/enemy/state has more than 1 such notetags in its notebox, all those notetags should be used sequentially.

    Also, all values of notetags of a data need to be stored in that data thorough the entire game execution.

    All the notetag reading, value storing and using need to be implemented.

    1st algorithm - I use direct eval to execute the notetag values at runtime:

    class RPG::BaseItem #----------------------------------------------------------------------------| # New public instance variables | #----------------------------------------------------------------------------| attr_accessor :run_per_frame #----------------------------------------------------------------------------| # New method: load_notetags_run_per_frame | # - Loads each <run_per_frame: CODEX> notetag from its notebox | #----------------------------------------------------------------------------| def load_notetags_run_per_frame # Stores all CODEX from matching lines sequentially @run_per_frame = [] @note.split(/[\r\n]+/).each { |line| next unless line =~ /<run per frame:\s*(\w+)\s*>/ @run_per_frame.push(eval("RUN_PER_FRAME::#{$1}")) } # end # load_notetags_run_per_frameend # RPG::BaseItem
    Code:
    class Game_Battler < Game_BattlerBase  #----------------------------------------------------------------------------|  #  New method: exec_run_per_frame                                            |  #  - Executes each run per frame notetag value for the passed data           |  #----------------------------------------------------------------------------|  # data: The data having the notetag to be executed  def exec_run_per_frame(data)    # Executes all CODEX sequentially    data.run_per_frame.each { |rpf| eval(rpf) }    #  end # exec_run_per_frameend # Game_Battler
    Performance - A string's used to store a CODEX, which is executed using direct eval. My informal benchmark shows the time performance of direct eval in my machine:

    class Test def initialize @test = "test" end def control block = -> x { eval("@test") } p("eval") 10_000_000.times(&block) p("eval") end # Roughly 66 secondsend
    2nd algorithm - I use lambda with instance_exec to execute the notetag values at runtime:

    class RPG::BaseItem #----------------------------------------------------------------------------| # New public instance variables | #----------------------------------------------------------------------------| attr_accessor :run_per_frame #----------------------------------------------------------------------------| # New method: load_notetags_run_per_frame | # - Loads each <run_per_frame: CODEX> notetag from its notebox | #----------------------------------------------------------------------------| def load_notetags_run_per_frame # Stores all CODEX from matching lines sequentially @run_per_frame = [] @note.split(/[\r\n]+/).each { |line| next unless line =~ /<run per frame:\s*(\w+)\s*>/ @run_per_frame.push(eval("-> battler { battler.instance_exec { #{eval("RUN_PER_FRAME::#{$1}")} } }")) } # end # load_notetags_run_per_frameend # RPG::BaseItem
    Code:
    class Game_Battler < Game_BattlerBase  #----------------------------------------------------------------------------|  #  New method: exec_run_per_frame                                            |  #  - Executes each run per frame notetag value for the passed data           |  #----------------------------------------------------------------------------|  # data: The data having the notetag to be executed  def exec_run_per_frame(data)    # Executes all CODEX sequentially    data.run_per_frame.each { |rpf| rpf.call(self) }    #  end # exec_run_per_frameend # Game_Battler
    Performance - A lambda's used to store a CODEX, which is executed via calling its associated lambda with instance_exec. My informal benchmark shows the time performance of lambda with instance_exec in my machine:

    class Test def initialize @test = "test" endendtest = Test.newblock1 = -> test { test.instance_exec{ @test } }block2 = -> x { block1.call(test) }p("lambda instance_exec")100_000_000.times(&block2)p("lambda instance_exec")# Roughly 81 seconds
    Performance Comparison - Storing codes in lambdas costs more memory than storing them in strings, but executing codes in lambdas is much, much faster than evaluating codes in strings. So the 1st algorithm costs less memory but have worse time performance, while the 2nd algorithm costs more memory but have better time performance. Normally, time performance's much, much more important than memory usage when talking about using RMVXA in most modern computers. This is especially the case when the notetag values are likely frequently used(like once per frame). As each notetag can be executed at most once per frame, time performance severely outclasses memory usage here, causing the 2nd algorithm to be much, much more performant than the 1st one. In some extreme test cases of mine, the average fps boosted by roughly 7(by roughly 88 to roughly 95) by using the 2nd algorithm instead of the 1st one.

    How did I come up with the 1st algorithm - It's done by not understanding the power of using lambda with instance_exec. While I know both lambda and instance_exec individually, I never even think of combining them. While I know eval is extremely unperformant in terms of time, I thought it was necessary evil as CODEX needs to be executed under the current state of the battler using it. Without instance_exec or similar tricks, using lambda will suffer from the binding issues. While I found another way to get rid of those binding issues, it ends up asking users to do quite some extra works, and I think the increased time performance isn't worth that. But after learning the combination of lambda and instance_exec, I realized the only cost it brings is the increased memory usage for storing the lambdas, and in this case, it's severely outweighed by the increased time performance.


    Example 9(Closely related to Example 8)

    The situation is the same as Example 8, except that all CODEX will only be run at most once per battle.

    1st algorithm - Same as the 2nd algorithm in Example 8

    Performance - Same as that of the 2nd algorithm in Example 8

    2nd algorithm - Same as the 1st algorithm in Example 8

    Performance - Same as that of the 1st algorithm in Example 8

    Performance Comparison - Similar to that of Example 8, but in this case, memory usage will likely be at least slightly more important than time performance, as now all CODEX will only be run at most once per battle. Even if time performance were much, much more important than memory usage here, it still doesn't mean I don't need to worry about the latter at all. When there are lots of notetags(like 1000+), if lambdas are used, lots of lambdas will be stored thorough the entire game execution, probably making the increase of memory usage nontrivial. On the other hand, even if such many notetags need to be used at the start of a battle, if direct eval is used, the time spent might not even rival a single @status_window.refresh call, which is even hardly noticeable in terms of time spent at the start of a battle. This causes the 2nd algorithm to be at least slightly more performant than the 1st one.

    How did I come up with the 1st algorithm - It's done by blindly treating the 2nd algorithm in Example 8 as a sliver bullet without even trying to digest the performance comparison in Example 8 first. It isn't golden rules that can be applied to anywhere at anytime, but is specific to the situation in Example 8. When the situation changes, the performance comparison will likely change too. Now the situation changed from "each such battler can use each such notetag at most once per frame" to "all CODEX will only be run at most once per battle", causing the performance comparison in Example 8 to fail here, as it doesn't claim that time performance's always much, much more important than memory usage when talking about using RMVXA in most modern computers. It just claims that it's normally the case and is especially the case when the notetag values are likely frequently used. In fact, the more frequent the notetag values are used, the more true the performance comparison in Example 8 is, and vice versa.
    Further reading for Example 8 and Example 9:

    Some ways to store RGSS3 code strings in notetags -

    http://forums.rpgmakerweb.com/index.php?/topic/39250-some-ways-to-store-rgss3-code-strings-in-notetags/
     
    Last edited by a moderator: May 6, 2015
    #19
  20. DoubleX

    DoubleX Just a nameless weakling Veteran

    Messages:
    1,462
    Likes Received:
    542
    First Language:
    Chinese
    Primarily Uses:
    N/A
    Memory performance comparisons for Example 8 and Example 9:

    The below is my informal benchmark testing the memory usage of storng lambdas of code strings vs storing nothing vs storing code strings in my machine(i7-3820 + 2 * 2GB DDR3 1333):

    test = []#~ block = -> x { nil } # Control#~ block = -> x { test << -> { nil } } # Lambda#~ block = -> x { test << "nil" } # String1_000_000.times(&block)p("block")block = -> x { nil }100_000_000.times(&block)p("block")# Control: Roughly 39MB Memory# Lambda: Roughly 207MB Memory# String: Roughly 62MB Memory
    It shows that in my machine, storing the code string "nil" directly and using lambda costs roughly (62 - 39)MB / 1,000,000 = 23B and (207 - 39)MB / 1,000,000 =176B respectively. Their difference is roughly 176B - 23B = 153B.

    I've also tried to use several different code string with varying length and complexities in my machine, and the memory usage difference between the former and the latter stays roughly the same.

    This might suggest that regardless of the code string itself, that memory usage difference is roughly 153B in my machine.

    While storing 1,000,000 lambdas are clearly unrealistic, both of the aforementioned memory performances should still be noted.
     
    Last edited by a moderator: May 15, 2015
    #20

Share This Page