Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.1 Introduction - Simple Example, procedural example #43

Open
tweigel-vaisala opened this issue Jan 12, 2017 · 8 comments
Open

2.1 Introduction - Simple Example, procedural example #43

tweigel-vaisala opened this issue Jan 12, 2017 · 8 comments

Comments

@tweigel-vaisala
Copy link

The random_walk function is not a direct equivalent to the RandomWalker class method. A strict equivalent would be this:

def random_walk(n):
    position = 0
    for i in range(n):
        yield position
        position += 2*random.randint(0,1)-1

It is still not much faster, but it's a more fair comparison.

@rougier
Copy link
Owner

rougier commented Jan 12, 2017

Yes, you're right but it does as you noticed it does not make a real difference.

@datajanko
Copy link

datajanko commented Jan 13, 2017

additionally,
in
def random_walk_faster(n=1000): from itertools import accumulate steps = random.sample((-1,+1)*n, k=n) return [0]+list(accumulate(steps))
`
the random numbers are not drawn from the same distribution as in the other variants. And, in the timing you also measure the import costs of accumulate. However, there is no huge difference in computation time

@rougier
Copy link
Owner

rougier commented Jan 13, 2017

Yes, it has be noticed in #37 but the proposed solution require Python 3.6. However, from the documentation, the proposed sampling method should do a shuffle.

@datajanko
Copy link

datajanko commented Jan 16, 2017

Sorry, I didn't read the closed threads!

A list comprehension like

steps = [2* random.randint(0,1)-1 for _ in range(n)]

won't require python 3.6. You may even want to use a generator expression, which might be more efficient.

@rougier
Copy link
Owner

rougier commented Jan 16, 2017

Oh nice, I like it. We need to benchmark it and if it faster or approximately the same, we can include it. Could you make a PR ?

@datajanko
Copy link

datajanko commented Jan 16, 2017

Benchmarks

def random_walk(n):
    position = 0
    walk = [position]
    for i in range(n):
        position += 2*random.randint(0, 1)-1
        walk.append(position)
    return walk
def random_walk_faster(n=1000):
    steps = random.sample((-1,+1)*n, k=n)
    return [0]+list(accumulate(steps))
def random_walk_faster_2(n=1000):
    steps = [2*random.randint(0,1)-1 for _ in range(n)]
    return [0]+list(accumulate(steps))
def random_walk_faster_3(n=1000):
    steps = (2*random.randint(0,1)-1 for _ in range(n))
    return [0]+list(accumulate(steps))

%%timeit
random_walk(100000)
10 loops, best of 3: 141 ms per loop

%%timeit 
random_walk_faster(100000)
10 loops, best of 3: 78.1 ms per loop

%%timeit
random_walk_faster_2(100000)
10 loops, best of 3: 135 ms per loop

%%timeit
random_walk_faster_3(100000)
10 loops, best of 3: 139 ms per loop

So it looks like, that using random.sample from a false distribution yields a huge speedup.
The naive approach can compete with the list comprehension version.

So the problem here is essentially, that numpy improves two things.
Sampling and aggregating.
Maybe an example, that used the numpy sampling, but accumulate in aggregating might be a better solution.
Unfortunately, the random.choices function only has been introduced in python 3.6.
If it is faster than the list comprehension, using choices probably would be the best solution in terms of didactics

@rougier
Copy link
Owner

rougier commented Jan 16, 2017

Sorry, in the end I see your PR but the problem now is that the proposed solution does not correspond to the text so it is not good. What exactly is wrong with the random.sample method ? I think documentation says in some circumstances (k=len(x)) it is equivalent to a shuffle, no ?

@datajanko
Copy link

Hey, that's what I feared.
So why might the use of random.sample be a bad choice?
Firstly, it is absolutely not clear why using sample will provide the same algorithm.

Secondly, you are reducing the variance/standard deviation by a huge amount. Though you approximate the same mean for large repetions, the distribution you'll use will be too narrow. This solution penalizes solutions that are far out too much.

I think, this is easiest seen for the case n=2.
Using your sampling method: you find after the first draw a probability of 2/3 to move in the other direction. With the usual sampling the probability is still 1/2.
Hence, your method, will have a probability of 2/3 to end in the center and probability 1/6 to end at two places left or right off the center.
The other approach gives probabilities of 1/4, 1/2 and 1/4.

This difference in variance carries over to higher number of repetitions (and it is significant!)

So, one way would be to use np.randint() in all of the functions, maybe the best way would just be to comment on the issues with the variance and keep your solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants