Slow performance on python wrapper #275

brupelo · 2020-05-26T17:50:08Z

Let's start by considering this snippet:

import time
from pathlib import Path

import tinyobjloader


def load_obj(filename):
    path = Path(filename).resolve()
    start = time.time()
    reader = tinyobjloader.ObjReader()
    if not reader.ParseFromFile(str(path)):
        raise Exception(f"Problem loading '{path}'")
    attrib = reader.GetAttrib()
    start = time.time()
    num_vertices = 0
    num_normals = 0
    num_texcoords = 0

    for shape in reader.GetShapes():
        num_vertices += len(attrib.vertices)
        num_normals += len(attrib.normals)
        num_texcoords += len(attrib.texcoords)

        for f in shape.mesh.indices:
            attrib.vertices[f.vertex_index * 3 + 0]
            attrib.vertices[f.vertex_index * 3 + 1]
            attrib.vertices[f.vertex_index * 3 + 2]
            attrib.normals[f.normal_index * 3 + 0]
            attrib.normals[f.normal_index * 3 + 1]
            attrib.normals[f.normal_index * 3 + 2]
            attrib.texcoords[f.texcoord_index * 2 + 0]
            attrib.texcoords[f.texcoord_index * 2 + 1]

    print(
        f"num_vertices={num_vertices} num_normals={num_normals} num_texcoords={num_texcoords}"
    )
    print(time.time() - start)


if __name__ == "__main__":
    load_obj(r"mcve.obj")

And here's the link to both mcve.obj and mcve.mtl, this object is a very small & trivial scene, that looks like this:

Problem being... if you run that snippet you'll get this output:

num_vertices=36150 num_normals=38400 num_texcoords=27950
13.565775871276855

That's really slow & unusable for such a trivial scene. I wanted to san miguel's scene from https://casual-effects.com/data/ but after seeing these poor results on such a trivial scene I'm not even considering it till I figure out how to proceed here.

So yeah, the main question would be, how could we use the tinyobjloader effectively on python so we can test real-world scenes?

Thanks in advance.

Ps. I've got allocated a huge chunk of memory for interleaved data (vertices+normals+texcoords) so I'm not sure how difficult would be providing a wrapper to make a fast memcpy or similar... Main problem seems to be the whole overhead of wrapping c++ to python

The text was updated successfully, but these errors were encountered:

syoyo · 2020-05-26T18:29:25Z

For some reason, attrib array access(e.g. attrib.vertices[f.vertex_index * 3 + 0]) is super slow in python.

Parsing mcve.obj is enougly fast. 0.004 secs on Ryzen 3900X, and without attrib array access whole processing time can be reduced to 0.01 secs.

Parse time: 0.004872322082519531
num_vertices=36150 num_normals=38400 num_texcoords=27950
0.011868000030517578

Please try numpy_*** interface. This exposes raw memory buffer to python world.

tinyobjloader/python/sample.py

Line 76 in 4cb4af6

print("numpy_indices = {}".format(shape.mesh.numpy_indices()))

NOTE: numpy API is only available in 2.0.0rc5 or later version(pip install tinyobjloader==2.0.0rc5)

paulmelnikow · 2020-05-26T18:44:40Z

Yea, agreed, the numpy_ methods are the way to go. They wrap the C++ arrays with Python objects.

This is the loader code I'm using. It's not super elegant yet, and I imagine we could improve this interface some, but it is very fast.

https://github.com/lace/lacecore/blob/f1c6b549201b03ac3f577b65ef0571b13d68a1cb/lacecore/_obj/loader.py#L25-L84

brupelo · 2020-05-26T22:28:04Z

Hey guys, thanks for your fast response... yeah, I'll give it a shot to numpy as that seems a much faster solution.

@paulmelnikow But... is that code of yours dealing also with normals and texcoords or just with positions? (numpy newbie here)

I'm still trying to figure out how to create my vertices & faces... The numpy interface seems to be a bit trickier than the good old python lists :)

paulmelnikow · 2020-05-26T22:49:30Z

The positions are in tinyobj_vertices (i.e. attrib.numpy_vertices().reshape(-1, 3)). I am working with the faces too. Though not the normals and texcoords.

Yea, the performance gains from NumPy can be really huge, though it definitely has a bit of a learning curve.

brupelo · 2020-05-27T07:45:42Z

Ok guys, I think this issue has already become clear and we can close it, can't we? Only thing I'd would probably add, while numpy performance seems to be great in comparison to vanilla python it's adding an extra dependency that may or may not be convenient for some people. For example, let's say you want to deploy your python app using tinyobjloader using nuitka, by adding numpy you'll be probably adding a lot of overhead to the compilation (few months ago when checked compiling numpy was like ~20min)

Anyway, not sure how difficult would be getting rid of such dependency and providing a direct access to something like...

struct Vertex {
    bool operator==(const Vertex &rhs) const {
        return (position == rhs.position && normal == rhs.normal &&
                texcoord_0 == rhs.texcoord_0 && tangent == rhs.tangent &&
                bitangent == rhs.bitangent);
    }

    vec3 position;
    vec3 normal;
    vec2 texcoord_0;
    vec3 tangent;
    vec3 bitangent;
};

struct Vertices {
    Vertices(int _num_vertices) {
        num_vertices = _num_vertices;
        _data = new float[num_vertices*VERTEX_NUM_FLOATS];
        _vertex = reinterpret_cast<Vertex*>(_data);
    }

    ~Vertices() {
        delete[] _data;
    }

    Vertex* vertex(int i) {
        return &_vertex[i];
    }

    float* _data;
    int num_vertices;
    Vertex* _vertex;
};

Said otherwise, direct access to the filled parsed data without any intermediate layer.. Of course, that's up to you to think about.

To me, this has become quite clear. Although figuring out how to use the numpy accessors isn't very easy for users.

Thank you very much.

paulmelnikow · 2020-05-27T18:44:32Z

Out of curiosity, have you tried assigning attrib.vertices, attrib.normals, etc. to variables in the outer for loop?

I'm wondering if the for loop is what's making your code slow or if it's that you need to use byte arrays rather than native python data structures.

What you're proposing is an expanded layer of convenience accessors in C++, along with Python bindings, which is possible. Another approach, which is probably less to maintain, could be to provide the byte arrays to Python (even if you're not using NumPy).

syoyo · 2021-01-02T12:05:51Z

I am revisiting this issue and found assigning variable in the outer loop solved the issue(execution time become less than 0.01 secs)

    vertices = attrib.vertices # <- Make things fast!

    for shape in reader.GetShapes():
        num_vertices += len(attrib.vertices)
        num_normals += len(attrib.normals)
        num_texcoords += len(attrib.texcoords)

        for f in shape.mesh.indices:
            v = vertices[f.vertex_index * 3 + 0]

It looks pybind always create a python object(?) of attrib_t for each variable lookup.
But this behavior is a bit confusing for users.

Another way is providing direct array indexing access API which also gives enough performance(less than 0.01 secs)

  py::class_<attrib_t>(tobj_module, "attrib_t")
    .def(py::init<>())
    .def_readonly("vertices", &attrib_t::vertices)
    .def("vertices_at", [](attrib_t &instance, ssize_t idx) {
      // TODO: Return triple(x, y, z)
      return instance.vertices[idx];
    })

...

v = attrib.vertices_at(0)

...

I'd like to go with this API(along with numpy_vertices API). @paulmelnikow How do you think?

syoyo · 2021-01-02T12:14:50Z

could be to provide the byte arrays to Python

@paulmelnikow FYI I have tried bytearray and Python buffer protocol(https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#buffer-protocol) but found both are not suited for tinyobj usecase(it simply exposes memory object to Python world so the user must know the buffer layout of each data to access the element in Python world)

syoyo added enhancement question labels May 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow performance on python wrapper #275

Slow performance on python wrapper #275

brupelo commented May 26, 2020

syoyo commented May 26, 2020 •

edited

paulmelnikow commented May 26, 2020

brupelo commented May 26, 2020

paulmelnikow commented May 26, 2020

brupelo commented May 27, 2020

paulmelnikow commented May 27, 2020

syoyo commented Jan 2, 2021

syoyo commented Jan 2, 2021

Slow performance on python wrapper #275

Slow performance on python wrapper #275

Comments

brupelo commented May 26, 2020

syoyo commented May 26, 2020 • edited

paulmelnikow commented May 26, 2020

brupelo commented May 26, 2020

paulmelnikow commented May 26, 2020

brupelo commented May 27, 2020

paulmelnikow commented May 27, 2020

syoyo commented Jan 2, 2021

syoyo commented Jan 2, 2021

syoyo commented May 26, 2020 •

edited