context refactoring #135
Comments
class Allocator {
protected:
Allocator();
template <typename TName>
S_TENSOR getPtr(TName _name);
};
still going:
Some thoughts:
S_TENSOR is heavier than Tensor*. To reduce memory usage, should we consider deprecating it?
@Knight-X can you explain your thought process for
I always imagined the MemAllocators as separate entities. Also, things get really convenient if we define our own TensorId type instead of just strings:
class DefaultUtensorAllocator;
class TensorId {
/** Some hashable tensor lookup metadata **/
public:
...
void*& where() { return _loc; } // return by reference so subclasses can assign the location
// Rest of interface
operator uint32_t() { return hash(); } // conversion to the 32-bit hash key
...
protected:
virtual uint32_t hash() = 0;
private:
void * _loc;
};
// may not need this but makes life convenient
template <typename T>
class uBlock {
public:
uBlock(TensorShape shape) {
mini_buffer.reserve(shape.linear_space);
}
//This class just exposes a read write interface to cache blocks of a Tensor
private:
std::vector<T> mini_buffer;
};
// Add additional checks so this never gets created on the stack
template <typename Allocator=DefaultUtensorAllocator>
class Tensor : public uTensor {
...
TensorId* tid;
public:
/**
Use Tensor construction to register with the context class; this information is relatively small
and there are only a handful of tensors in a graph
*/
Tensor(TensorId* tid) : tid(tid) {
Context::register_tensor(tid, this); //This can be reverse-looked-up if necessary (returns a TensorId);
}
// Override new and delete so we can control where all Tensors are allocated
void* operator new(size_t size) {
void* p = Allocator::allocate(size);
return p;
}
void operator delete(void* ptr) {
Allocator::deallocate(ptr);
}
protected:
void associate_data(void* data){ this->data = data; }
};
// Easy peasy
class RomTensorId : public TensorId {
public:
RomTensorId(void* data) { where() = data; }
uint32_t hash() { return (uint32_t) where(); }
};
class RomTensor : public Tensor<> {
RomTensor() {} // TensorId must be specified for RomTensors
public:
RomTensor(RomTensorId id, TensorShape shape) : Tensor(&id, shape) {
//Rom is easy since the TensorId _loc points to an address directly
associate_data(id.where());
}
...
};
class RamTensorId : public TensorId {
int id;
public:
RamTensorId(int id) : id(id) { where() = &this->id; } // point at the stored member, not the parameter
uint32_t hash() { return (uint32_t) id; }
};
//Essentially Rom tensor except has RAM allocation bits
class RamTensor : public Tensor<> {
private:
public:
RamTensor(RamTensorId id, TensorShape shape) : Tensor(&id, shape) {
associate_data(uTensorRamAllocator::allocate(id.hash(), shape.linear_space)); // uTensorRamAllocator returns a pointer to the data field associated with the tensor info
}
~RamTensor() {
uTensorRamAllocator::deallocate(tid->hash()); // tid is the TensorId* held by the Tensor base
}
};
/**
Really this class doesn't even need to be this smart
*/
class DefaultUtensorAllocator {
public:
static void* allocate(size_t size) { //be dumb for now, but really should allocate based on requested size
void* p = mem_cache.insert();
return p;
}
static void deallocate( void * key){
mem_cache.remove(key);
}
private:
/** modified hash returns key on insert */
static FixedEntryHeap<sizeof(Tensor), NUM_HEAP_ENTRIES> mem_cache; //Lookup Tensor*; static so the static allocate/deallocate can reach it
};
The Allocator maintains a set of memory buffers: tensor data is allocated first, and space is left for storing the tensor objects themselves as an optimization. During the codegen process, memory offsets are calculated and assigned to the allocator to make buffer usage as efficient as possible. Once a tensor's lifetime ends, the allocator frees its data buffer for other tensors. For now we only cache the tensor object itself, but it could also be allocated from the memory buffer for further optimization.
Memory arrangement: a customized new/delete function. Essentially, pre-allocate a block of memory during context initialization. This memory is used for caching ops and for tensor memory management, in an attempt to reduce the overhead of the per-object new/delete in the current approach. Moreover, we can arrange memory precisely because the memory access pattern is known.
Rewrite the context class:
- Instead of allocating memory for an op every time, record the name of the op and allocate its memory once, or lazily.
- Use op_table to register the op classes.
- Remove rtable.
- In the push function, check whether the op is already allocated, and allocate it if not.