
Function is 3X slower than regular JS #1184

Description

@nischayv

The following function is roughly 3x slower for me compared to regular js for large values. Am I doing something wrong or is there something I can do to speed it up? Any kind of help is appreciated!

function map_page_rank(pages: Array<Int32Array>, page_ranks: Float64Array, maps: Array<Float64Array>, noutlinks: Int32Array, n: i32): void {
  const t1 = performanceNow();
  for (let i=0; i < n; ++i) {
    const outbound_rank = unchecked(page_ranks[i])/unchecked(noutlinks[i]);
    for(let j=0; j < n; ++j) {
      if (unchecked(pages[i][j]) === 0) {
        unchecked(maps[i][j] = 0);
      } else {
        unchecked(maps[i][j] = unchecked(pages[i][j])*outbound_rank);
      }
    }
  }
  const t2 = performanceNow();
  consoleLog(((t2 - t1)/1000).toString());
}

Activity

jtenner (Contributor) commented on Mar 25, 2020

Using unchecked() will get you some of the way, but it does appear that JS is good at doing JS things.

You can probably speed up your code a lot by using StaticArray<T> instead of Array<T>, Int32Array, and Float64Array.

If pages must be resized, you will need a regular Array instead.
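
Taken literally, that swap only touches the parameter types; a minimal sketch of what the signature could become (the loop body above can stay exactly the same):

function map_page_rank(
  pages: StaticArray<StaticArray<i32>>,   // was Array<Int32Array>
  page_ranks: StaticArray<f64>,           // was Float64Array
  maps: StaticArray<StaticArray<f64>>,    // was Array<Float64Array>
  noutlinks: StaticArray<i32>,            // was Int32Array
  n: i32
): void {
  // same body as above
}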

jtenner (Contributor) commented on Mar 25, 2020

  for (let i=0; i < n; ++i) {
    let outbound_rank = unchecked(page_ranks[i])/unchecked(noutlinks[i]);
    let map = unchecked(maps[i]); // this will speed you up in the inner for loop
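
Spelled out in full, that suggestion might look like the sketch below (same imported performanceNow/consoleLog helpers as in the original snippet; only the row lookups are hoisted out of the inner loop):

function map_page_rank(pages: Array<Int32Array>, page_ranks: Float64Array, maps: Array<Float64Array>, noutlinks: Int32Array, n: i32): void {
  const t1 = performanceNow();
  for (let i = 0; i < n; ++i) {
    const outbound_rank = unchecked(page_ranks[i]) / unchecked(noutlinks[i]);
    const page = unchecked(pages[i]); // cache the row references once per outer iteration
    const map  = unchecked(maps[i]);
    for (let j = 0; j < n; ++j) {
      const p = unchecked(page[j]);
      if (p === 0) {
        unchecked(map[j] = 0);
      } else {
        unchecked(map[j] = p * outbound_rank);
      }
    }
  }
  const t2 = performanceNow();
  consoleLog(((t2 - t1) / 1000).toString());
}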

dcodeIO (Member) commented on Mar 25, 2020

I would imagine two reasons for the slowdown there: one is the level of indirection of normal and typed arrays, as mentioned by jtenner (using StaticArray can help there), and the other is reference counting overhead inside the loop. For instance, let map = unchecked(maps[i]) will retain a reference to the i'th Float64Array within maps, use it, and release it again when continuing or breaking. In general it's fair to say that AssemblyScript isn't super fast when it comes to idiomatic TypeScript code like in this snippet and will require two things to become better at it: Wasm GC, so we can get rid of reference counting overhead, and an optimization pass to directize array accesses (for example by representing arrays as multi-value structs, with one value being a direct pointer to the backing buffer).
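
To illustrate the reference counting point, that line conceptually expands to something like this under the current runtime (an approximation for illustration only, not the actual generated code):

// once per outer-loop iteration, conceptually:
let map = unchecked(maps[i]); // the compiler inserts a __retain() here, bumping the refcount of the i'th Float64Array
// ... inner loop reads and writes through `map` ...
// when the block is left (continue, break or end of iteration), a matching __release() drops the refcount again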

nischayv (Author) commented on Mar 25, 2020

Thanks, this helped a lot! let map = unchecked(maps[i]) had a much more significant impact on performance than StaticArray, but both improved performance!

nischayv (Author) commented on Mar 25, 2020

I had a follow-up question about a similar issue. I'm doing a transpose of a matrix and need to access the j'th element in the inner loop as matrix[j][i], so I can't keep a reference for the inner loop like in the situation above. Is there anything else that can be done to speed it up? The exact code is below. I'm working on building some benchmarks to see how AssemblyScript-compiled Wasm performs in comparison to JS for scientific programming.

class PolarArray {
  constructor(
    public r: Float64Array,
    public i: Float64Array
  ) { }
}

function transpose(m: StaticArray<PolarArray>): void {
  let tempr: f64, tempi: f64;
  const N = m.length;
  
  for(let i = 0; i < N; ++i){
    let r_array = unchecked(m[i].r);
    let i_array = unchecked(m[i].i);

    for(let j = 0; j < i; ++j){
      tempr = unchecked(r_array[j]);
      tempi = unchecked(i_array[j]);

      unchecked(r_array[j] = m[j].r[i]);
      unchecked(i_array[j] = m[j].i[i]);

      unchecked(m[j].r[i] = tempr);
      unchecked(m[j].i[i] = tempi);
    }
  }
}

dcodeIO (Member) commented on Mar 25, 2020

Looks like the ideal thing to do there is to map the multi-dimensional StaticArray<PolarArray> to a one-dimensional StaticArray<Float64Array> with two subsequent elements per entry.
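
One way to read that suggestion is the sketch below; it assumes a layout where element 2*k holds row k's real parts and element 2*k+1 its imaginary parts, so the PolarArray indirection disappears:

function transpose(m: StaticArray<Float64Array>): void {
  const N = m.length / 2; // two Float64Arrays (real, imaginary) per matrix row
  for (let i = 0; i < N; ++i) {
    let re_i = unchecked(m[2 * i]);
    let im_i = unchecked(m[2 * i + 1]);
    for (let j = 0; j < i; ++j) {
      let re_j = unchecked(m[2 * j]);
      let im_j = unchecked(m[2 * j + 1]);
      let tempr = unchecked(re_i[j]);
      let tempi = unchecked(im_i[j]);
      unchecked(re_i[j] = re_j[i]);
      unchecked(im_i[j] = im_j[i]);
      unchecked(re_j[i] = tempr);
      unchecked(im_j[i] = tempi);
    }
  }
}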

MaxGraey (Member) commented on Mar 26, 2020

Also try caching all the matrix element references:

function transpose(m: StaticArray<PolarArray>): void {
  let tempr: f64, tempi: f64;
  const N = m.length;
  
  for(let i = 0; i < N; ++i) {
    let mi   = unchecked(m[i]);
    let mi_r = mi.r;
    let mi_i = mi.i;

    for(let j = 0; j < i; ++j) {
      let mj   = unchecked(m[j]);
      let mj_r = mj.r;
      let mj_i = mj.i;

      tempr = unchecked(mi_r[j]);
      tempi = unchecked(mi_i[j]);

      unchecked(mi_r[j] = mj_r[i]);
      unchecked(mi_i[j] = mj_i[i]);

      unchecked(mj_r[i] = tempr);
      unchecked(mj_i[i] = tempi);
    }
  }
}

nischayv (Author) commented on Apr 13, 2020

Caching all elements helped a bit, but the best solution is probably turning it into a 1D array. Before I close the issue, @dcodeIO, could you explain this statement a bit more: "an optimization pass to directize array accesses (for example by representing arrays as multi-value structs, with one value being a direct pointer to the backing buffer)"? How are arrays being represented currently then?

Also, regarding the reference counting overhead, just to be clear: is the reference to the ith element released after each iteration of j? Appreciate the help!

dcodeIO (Member) commented on Apr 13, 2020

How are arrays being represented currently then?

Unless using StaticArray, the representation currently involves one level of indirection when accessing an array element. This leads to two loads on each access: Array#buffer first, then the element from that buffer. Lowering the array struct to multiple values (once we have multi-value support) and passing these around instead of just a pointer to the array might help to get rid of the indirection, but whether this also has a caveat in actual Wasm engines needs to be investigated.
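
As a rough mental model of those two loads (pseudocode only; DATA_START_OFFSET is a placeholder name, not an actual compiler constant):

// reading a[i] from a Float64Array today, conceptually:
let dataStart = load<usize>(changetype<usize>(a) + DATA_START_OFFSET); // load 1: pointer to the backing buffer data
let value = load<f64>(dataStart + (<usize>i << 3)); // load 2: the f64 element itself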

is the reference to the ith element released after each iteration of j?

I might not be understanding the question fully, but references are not re-retained when entering an inner code block. So, what's referenced in the outer block remains retained during execution of the inner block. Perhaps looking at the generated text format will give a few hints about what's exactly happening there. Instructions to look out for are calls to __retain and __release, which modify reference counts. There are usually fewer of those after optimizations, because there is a pass eliminating unnecessary ones.

nischayv (Author) commented on Apr 16, 2020

I just misunderstood. I looked at the array code and it makes sense now. Thanks for the help! Closing this issue.
