Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance on python wrapper #275

Open
brupelo opened this issue May 26, 2020 · 8 comments
Open

Slow performance on python wrapper #275

brupelo opened this issue May 26, 2020 · 8 comments

Comments

@brupelo
Copy link

brupelo commented May 26, 2020

Let's start by considering this snippet:

import time
from pathlib import Path

import tinyobjloader


def load_obj(filename):
    path = Path(filename).resolve()
    start = time.time()
    reader = tinyobjloader.ObjReader()
    if not reader.ParseFromFile(str(path)):
        raise Exception(f"Problem loading '{path}'")
    attrib = reader.GetAttrib()
    start = time.time()
    num_vertices = 0
    num_normals = 0
    num_texcoords = 0

    for shape in reader.GetShapes():
        num_vertices += len(attrib.vertices)
        num_normals += len(attrib.normals)
        num_texcoords += len(attrib.texcoords)

        for f in shape.mesh.indices:
            attrib.vertices[f.vertex_index * 3 + 0]
            attrib.vertices[f.vertex_index * 3 + 1]
            attrib.vertices[f.vertex_index * 3 + 2]
            attrib.normals[f.normal_index * 3 + 0]
            attrib.normals[f.normal_index * 3 + 1]
            attrib.normals[f.normal_index * 3 + 2]
            attrib.texcoords[f.texcoord_index * 2 + 0]
            attrib.texcoords[f.texcoord_index * 2 + 1]

    print(
        f"num_vertices={num_vertices} num_normals={num_normals} num_texcoords={num_texcoords}"
    )
    print(time.time() - start)


if __name__ == "__main__":
    load_obj(r"mcve.obj")

And here's the link to both mcve.obj and mcve.mtl, this object is a very small & trivial scene, that looks like this:

showcase

Problem being... if you run that snippet you'll get this output:

num_vertices=36150 num_normals=38400 num_texcoords=27950
13.565775871276855

That's really slow & unusable for such a trivial scene. I wanted to san miguel's scene from https://casual-effects.com/data/ but after seeing these poor results on such a trivial scene I'm not even considering it till I figure out how to proceed here.

So yeah, the main question would be, how could we use the tinyobjloader effectively on python so we can test real-world scenes?

Thanks in advance.

Ps. I've got allocated a huge chunk of memory for interleaved data (vertices+normals+texcoords) so I'm not sure how difficult would be providing a wrapper to make a fast memcpy or similar... Main problem seems to be the whole overhead of wrapping c++ to python

@syoyo
Copy link
Collaborator

syoyo commented May 26, 2020

For some reason, attrib array access(e.g. attrib.vertices[f.vertex_index * 3 + 0]) is super slow in python.

Parsing mcve.obj is enougly fast. 0.004 secs on Ryzen 3900X, and without attrib array access whole processing time can be reduced to 0.01 secs.

Parse time: 0.004872322082519531
num_vertices=36150 num_normals=38400 num_texcoords=27950
0.011868000030517578

Please try numpy_*** interface. This exposes raw memory buffer to python world.

print("numpy_indices = {}".format(shape.mesh.numpy_indices()))

NOTE: numpy API is only available in 2.0.0rc5 or later version(pip install tinyobjloader==2.0.0rc5)

@paulmelnikow
Copy link
Contributor

Yea, agreed, the numpy_ methods are the way to go. They wrap the C++ arrays with Python objects.

This is the loader code I'm using. It's not super elegant yet, and I imagine we could improve this interface some, but it is very fast.

https://github.com/lace/lacecore/blob/f1c6b549201b03ac3f577b65ef0571b13d68a1cb/lacecore/_obj/loader.py#L25-L84

@brupelo
Copy link
Author

brupelo commented May 26, 2020

Hey guys, thanks for your fast response... yeah, I'll give it a shot to numpy as that seems a much faster solution.

@paulmelnikow But... is that code of yours dealing also with normals and texcoords or just with positions? (numpy newbie here)

I'm still trying to figure out how to create my vertices & faces... The numpy interface seems to be a bit trickier than the good old python lists :)

@paulmelnikow
Copy link
Contributor

The positions are in tinyobj_vertices (i.e. attrib.numpy_vertices().reshape(-1, 3)). I am working with the faces too. Though not the normals and texcoords.

Yea, the performance gains from NumPy can be really huge, though it definitely has a bit of a learning curve.

@brupelo
Copy link
Author

brupelo commented May 27, 2020

Ok guys, I think this issue has already become clear and we can close it, can't we? Only thing I'd would probably add, while numpy performance seems to be great in comparison to vanilla python it's adding an extra dependency that may or may not be convenient for some people. For example, let's say you want to deploy your python app using tinyobjloader using nuitka, by adding numpy you'll be probably adding a lot of overhead to the compilation (few months ago when checked compiling numpy was like ~20min)

Anyway, not sure how difficult would be getting rid of such dependency and providing a direct access to something like...

struct Vertex {
    bool operator==(const Vertex &rhs) const {
        return (position == rhs.position && normal == rhs.normal &&
                texcoord_0 == rhs.texcoord_0 && tangent == rhs.tangent &&
                bitangent == rhs.bitangent);
    }

    vec3 position;
    vec3 normal;
    vec2 texcoord_0;
    vec3 tangent;
    vec3 bitangent;
};

struct Vertices {
    Vertices(int _num_vertices) {
        num_vertices = _num_vertices;
        _data = new float[num_vertices*VERTEX_NUM_FLOATS];
        _vertex = reinterpret_cast<Vertex*>(_data);
    }

    ~Vertices() {
        delete[] _data;
    }

    Vertex* vertex(int i) {
        return &_vertex[i];
    }

    float* _data;
    int num_vertices;
    Vertex* _vertex;
};

Said otherwise, direct access to the filled parsed data without any intermediate layer.. Of course, that's up to you to think about.

To me, this has become quite clear. Although figuring out how to use the numpy accessors isn't very easy for users.

Thank you very much.

@paulmelnikow
Copy link
Contributor

Out of curiosity, have you tried assigning attrib.vertices, attrib.normals, etc. to variables in the outer for loop?

I'm wondering if the for loop is what's making your code slow or if it's that you need to use byte arrays rather than native python data structures.

What you're proposing is an expanded layer of convenience accessors in C++, along with Python bindings, which is possible. Another approach, which is probably less to maintain, could be to provide the byte arrays to Python (even if you're not using NumPy).

@syoyo
Copy link
Collaborator

syoyo commented Jan 2, 2021

I am revisiting this issue and found assigning variable in the outer loop solved the issue(execution time become less than 0.01 secs)

    vertices = attrib.vertices # <- Make things fast!

    for shape in reader.GetShapes():
        num_vertices += len(attrib.vertices)
        num_normals += len(attrib.normals)
        num_texcoords += len(attrib.texcoords)

        for f in shape.mesh.indices:
            v = vertices[f.vertex_index * 3 + 0]

It looks pybind always create a python object(?) of attrib_t for each variable lookup.
But this behavior is a bit confusing for users.

Another way is providing direct array indexing access API which also gives enough performance(less than 0.01 secs)

  py::class_<attrib_t>(tobj_module, "attrib_t")
    .def(py::init<>())
    .def_readonly("vertices", &attrib_t::vertices)
    .def("vertices_at", [](attrib_t &instance, ssize_t idx) {
      // TODO: Return triple(x, y, z)
      return instance.vertices[idx];
    })
...

v = attrib.vertices_at(0)

...

I'd like to go with this API(along with numpy_vertices API). @paulmelnikow How do you think?

@syoyo
Copy link
Collaborator

syoyo commented Jan 2, 2021

could be to provide the byte arrays to Python

@paulmelnikow FYI I have tried bytearray and Python buffer protocol(https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#buffer-protocol) but found both are not suited for tinyobj usecase(it simply exposes memory object to Python world so the user must know the buffer layout of each data to access the element in Python world)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants