Memory error with large network - centralities #52

Closed
alberto-bracci opened this issue Nov 11, 2019 · 15 comments

@alberto-bracci

Hi,

I am just starting with Teneto, installed with pip on Anaconda (Windows).
I am trying to load the temporal network from here.

I put a line "i,j,t" at the beginning of the file, loaded it with pandas as a dataframe, and used teneto.TemporalNetwork(from_df=dataframe), but I receive a memory error.
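Roughly what I am running (a tidied sketch; the filename is just a placeholder, and the edge-list variant is the call that produced the traceback below):

```python
import pandas as pd
import teneto

# the edge list file with the added "i,j,t" header line
D = pd.read_csv('edgelist.csv')  # placeholder filename

# both of these attempts end in the MemoryError shown below
tnet1 = teneto.TemporalNetwork(from_df=D)
tnet2 = teneto.TemporalNetwork(from_edgelist=[list(d) for d in D.values])
```

The full traceback: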

MemoryError Traceback (most recent call last)
in
----> 1 tnet2 = tnet.TemporalNetwork(from_edgelist=[list(d) for d in D.values])

C:\ProgramData\Anaconda3\lib\site-packages\teneto\classes\network.py in init(self, N, T, nettype, from_df, from_array, from_dict, from_edgelist, timetype, diagonal, timeunit, desc, starttime, nodelabels, timelabels, hdf5, hdf5path, forcesparse)
131 self.network_from_df(from_df)
132 if from_edgelist is not None:
--> 133 self.network_from_edgelist(from_edgelist)
134 elif from_array is not None:
135 self.network_from_array(from_array, forcesparse=forcesparse)

C:\ProgramData\Anaconda3\lib\site-packages\teneto\classes\network.py in network_from_edgelist(self, edgelist)
257 colnames = ['i', 'j', 't']
258 self.network = pd.DataFrame(edgelist, columns=colnames)
--> 259 self._update_network()
260
261 def network_from_dict(self, contact):

C:\ProgramData\Anaconda3\lib\site-packages\teneto\classes\network.py in _update_network(self)
220 """
221 self._calc_netshape()
--> 222 self._set_nettype()
223 if self.nettype:
224 if self.nettype[1] == 'u':

C:\ProgramData\Anaconda3\lib\site-packages\teneto\classes\network.py in _set_nettype(self)
172 self.nettype = 'xu'
173 G1 = teneto.utils.df_to_array(
--> 174 self.network, self.netshape, self.nettype)
175 self.nettype = 'xd'
176 G2 = teneto.utils.df_to_array(

C:\ProgramData\Anaconda3\lib\site-packages\teneto\utils\utils.py in df_to_array(df, netshape, nettype)
749 if len(df) > 0:
750 idx = np.array(list(map(list, df.values)))
--> 751 G = np.zeros([netshape[0], netshape[0], netshape[1]])
752 if idx.shape[1] == 3:
753 if nettype[-1] == 'u':

MemoryError:

Am I doing something wrong, or can this representation not handle large networks?

@wiheto
Owner

wiheto commented Nov 11, 2019 via email

@alberto-bracci
Author

It seems that argument is not present, at least in my version. I tried 'forcesparse=True' and also 'hdf5=True' without success. What's more, the error is the same, which I wouldn't expect in the latter case, as it should use a different format.
The network has around 900 nodes and 33720 time-stamped links.

@wiheto
Owner

wiheto commented Nov 11, 2019

Sorry, I meant forcesparse=True (I was sitting on a train and didn't double-check the argument). The HDF5 compatibility was never completed/optimized, as it was also slowing down processing on smaller networks. It is on my todo list to fix all of this in December, when I have time to contribute here instead of to other projects, so bear with me. There may be one or two errors, but we can probably get them sorted quite easily when they arise.

But this problem seems to be the function trying to figure out what type of network your input is: to determine this, it converts the dataframe into a dense numpy array (not optimal). If you add the argument nettype='bu' (or 'bd', 'wu', 'wd'), depending on whether your network is binary/weighted and undirected/directed, that function shouldn't be called. A minimal sketch is below.
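Something like this should skip the type detection entirely (an untested sketch; the filename is just a placeholder):

```python
import pandas as pd
import teneto

# edge list with columns i, j, t
df = pd.read_csv('edgelist.csv')  # placeholder filename

# specifying nettype up front means Teneto never has to densify the
# whole network just to work out whether it is directed/weighted
tnet = teneto.TemporalNetwork(from_df=df, nettype='bd')  # or 'bu'/'wu'/'wd'
```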

That is slightly bigger than most of the networks I usually use (ca 500 nodes and 1000 time points). But the HDF5 representation should work.

@alberto-bracci
Author

This indeed worked! Setting the type also makes forcesparse or HDF5 seemingly unnecessary.
Quick unrelated question (to avoid opening another issue): is it possible to have references for the centrality measures implemented in the library, like the formula or article they refer to?

Thanks for your quick help!
Alberto

@wiheto
Owner

wiheto commented Nov 11, 2019

Which centrality measure in particular are you after?

I generally follow Masuda and Lambiotte's book "A guide to temporal networks" for the maths behind many of the measures. Adding citations to all the docstrings is also on the todo list. Some measures already have quite detailed information in the documentation (e.g. here), but I've not had time to write one for every measure yet.

So if there are any you want me to find, I can add them to the docstrings and provide the references for you here too.

@alberto-bracci
Author

I was mainly interested in the centralities for now, so closeness, betweenness and degree are the ones missing. I am asking because I found different definitions in different papers, and at the moment I am not able to get a copy of the book to look them up myself.
Really appreciate your help here!

@wiheto
Owner

wiheto commented Nov 11, 2019

Alright. I have some writing time assigned later today, so I'll add them then; within 24 hours I'll have the documentation for all three of those. For closeness and betweenness especially, I'll also add to the documentation of shortest temporal paths (as that is where I've seen the most differences in equations).

@wiheto
Owner

wiheto commented Nov 12, 2019

You may want to update from the developer branch (https://github.com/wiheto/teneto/tree/develop), as some argument names are changing in the upcoming 0.5.0, so the documentation isn't fully in line with the functions in 0.4.6.

The more in-depth documentation is here:

https://teneto.readthedocs.io/en/develop/networkmeasures/temporal_closeness_centrality.html#module-teneto.networkmeasures.temporal_closeness_centrality

https://teneto.readthedocs.io/en/develop/networkmeasures/temporal_degree_centrality.html#module-teneto.networkmeasures.temporal_degree_centrality

As with a lot of Teneto's documentation, I write far too quickly to get doc coverage and sometimes lose clarity. Just leave an issue whenever anything is unclear.

Two changes still to make:

The shortest temporal paths function is HDF5-ready, but the calculation of closeness centrality is not. It is an easy fix, but I want to test it tomorrow to make sure it works. Since you will need the shortest temporal paths for both betweenness and closeness centrality, you may as well precompute them first and save the result.

I didn't get round to the betweenness centrality docs; I'll also try to do that tomorrow.

@wiheto
Owner

wiheto commented Nov 13, 2019

https://teneto.readthedocs.io/en/develop/api/teneto.networkmeasures.temporal_betweenness_centrality.html#teneto.networkmeasures.temporal_betweenness_centrality

I've also updated the normalization for 0.5.0 to follow the reference mentioned above; previously it did not divide by sigma_jk. I need to write a test to make sure this is working as expected (today or tomorrow).
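For reference, the general form I have in mind (my notation, a sketch rather than the exact implementation; see the linked docs for the definitive definition):

$$
B(i) = \sum_{t} \sum_{\substack{j \neq k \\ j, k \neq i}} \frac{\sigma^{t}_{jk}(i)}{\sigma^{t}_{jk}}
$$

where $\sigma^{t}_{jk}$ is the number of shortest temporal paths from $j$ to $k$ starting at time $t$, and $\sigma^{t}_{jk}(i)$ is the number of those paths that pass through node $i$ (hence the division by sigma_jk mentioned above).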

Otherwise, can I close this issue now? Seems like the problems are sorted.

@alberto-bracci
Author

Yes, everything should be fine. Just a question: how quick do you expect the shortest path function to be? I tried it with a network of around 90 nodes and 300 links, and after 6 hours it wasn't finished yet (Core i7 laptop).

Also, it is better to first compute the shortest paths and then use them as an argument for closeness and betweenness, right?

@alberto-bracci
Author

Also, there might be another issue with the shortest path function:
With a 'bd' network the behavior is as described above, but the same network loaded as 'bu' returns the following error:

File "", line 1, in
shortest_paths = tnt.networkmeasures.shortest_temporal_path(t)

File "C:\ProgramData\Anaconda3\lib\site-packages\teneto\networkmeasures\shortest_temporal_path.py", line 201, in shortest_temporal_path
network = tnet.get_network_when(ij=list(ij), t=t)

File "C:\ProgramData\Anaconda3\lib\site-packages\teneto\classes\network.py", line 483, in get_network_when
return teneto.utils.get_network_when(self, **kwargs)

File "C:\ProgramData\Anaconda3\lib\site-packages\teneto\utils\utils.py", line 993, in get_network_when
network['j'].isin(ij))), (network['t'].isin(t)))]

TypeError: and_ expected 2 arguments, got 1

alberto-bracci changed the title from "Memory error with large network (?)" to "Memory error with large network - centralities" on Nov 13, 2019
@wiheto
Owner

wiheto commented Nov 14, 2019

> Yes, everything should be fine. Just a question: how quick do you expect the shortest path function to be? I tried it with a network of around 90 nodes and 300 links, and after 6 hours it wasn't finished yet (Core i7 laptop).

When making the objects HDF5-compatible, I compromised on speed. This is the major backbone speed issue that has to be solved, and it is planned for the end of December (the start of #36 is relevant here).

> Also, it is better to first compute the shortest paths and then use them as an argument for closeness and betweenness, right?

Yes, because otherwise you have to calculate the paths twice, and that is the most computationally intensive part.
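Roughly like this (an untested sketch; I'm assuming the precomputed paths are passed via a paths= keyword, so double-check the signatures in the linked docs):

```python
import teneto

# compute the expensive shortest temporal paths once...
paths = teneto.networkmeasures.shortest_temporal_path(tnet)

# ...and reuse them for both centralities.
# NOTE: the keyword name paths= is an assumption here; check the
# function signatures in the documentation linked above.
closeness = teneto.networkmeasures.temporal_closeness_centrality(paths=paths)
betweenness = teneto.networkmeasures.temporal_betweenness_centrality(paths=paths)

# if the result comes back as a pandas DataFrame, paths.to_pickle('paths.pkl')
# is an easy way to save it for later runs (filename is a placeholder)
```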

Regarding the error: interesting. I'm going to open a new issue about that, as it concerns undirected HDF5 network referencing.

@wiheto
Owner

wiheto commented Nov 14, 2019

Also, regarding the speed of shortest_temporal_path: to reduce the possible path space, you could change the value of steps_per_t.

The default value of steps_per_t in shortest_temporal_path is 'all'. This means that, at each time-point, a path can travel across multiple nodes. This is not a reasonable assumption for many temporal networks. If you set this parameter to an integer (e.g. 1, meaning only one edge can be traversed per time-point per path), it will speed up the calculation.
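For instance (untested sketch):

```python
import teneto

# allow at most one edge traversal per time-point per path,
# which shrinks the path space considerably
paths = teneto.networkmeasures.shortest_temporal_path(tnet, steps_per_t=1)
```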

@wiheto
Owner

wiheto commented Nov 14, 2019

Another possible way to speed it up at the moment is to set the i argument and run it in parallel (so for 90 nodes you can run 90 jobs at once, but that requires access to a cluster).
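A rough sketch of what I mean (untested; joblib is just one way to fan the jobs out, and I'm assuming i= takes a single node index):

```python
from joblib import Parallel, delayed
import teneto

n_nodes = tnet.netshape[0]  # netshape is (nodes, time-points)

# one job per source node; i= restricts the calculation to paths
# starting from that node. If the tnet object does not pickle
# (e.g. when HDF5-backed), construct it inside each job instead.
per_node_paths = Parallel(n_jobs=4)(
    delayed(teneto.networkmeasures.shortest_temporal_path)(tnet, i=i)
    for i in range(n_nodes)
)
```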

@wiheto
Owner

wiheto commented Nov 15, 2019

Aside from the computational time, I think all the issues here have been solved, so I'm closing this issue.

wiheto closed this as completed on Nov 15, 2019.