Shout-out from Stata Corp. #387

Open
amichuda opened this issue Apr 21, 2021 · 7 comments

Comments

@amichuda

Stata 17 has the pystata package, which lets users run Stata from Python. Guess who they acknowledged?!

https://www.stata.com/python/pystata/ack.html

I think the package is closed source, so they didn't really follow the spirit of your package, but it's still pretty cool!

Once again, really great job on this package. From what I've seen in my research and at other institutions (at least in economic development work), stata_kernel has made a splash!

@mcaceresb
Collaborator

Well, if it actually used any of the code here they'd have to publish it, right? I'm assuming they must have re-done it from scratch.

@amichuda
Author

Yes, I think that's why they used the word "inspired." Otherwise, I think you'd have a lawsuit against them if they had used your code in closed-source software, right? (Not a lawyer, so I have no idea.)

But I'm not even sure how the software is being licensed.

@kylebarron
Owner

This is interesting. Some thoughts:

  • Yeah, probably not using any of our code. I don't think they'd be dumb enough to include a GPL3 dependency in their code...

  • It's hard to know much about their code because only their stata_setup module is public.

  • Because it's not pip-installable, you need to tell Python users to change their PYTHONPATH every time so they can import the package. I'm sure there'll be a ton of support requests about Python not finding the pystata import.

  • It doesn't define a Jupyter kernel. Instead, it uses plain Python and just defines a few IPython magics, so all the code runs inside the regular Python kernel.

  • Since it doesn't define its own kernel, I'm curious whether they're able to maintain data state on the Stata side. From stata_setup and their example, it looks like they maintain a running Stata session. Is it in a subprocess? How does Stata keep data in sync between the Python and Stata sides? It seems like it would be a pain in the butt for users to keep track of "is my Python data the same as my Stata data?"

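    ```python
    import os
    import sys
    # `path`: Stata installation directory; `edition`: Stata edition, e.g. "mp"
    sys.path.append(os.path.join(path, 'utilities'))
    from pystata import config
    config.init(edition)
    ```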

    I assume data isn't persisted on the Python side and sent to Stata every time a Stata command is called... That would be a horribly slow experience for large data.

  • It's curious that they're integrating so much with Python... I imagine (hope) there will be some users whom this exposes to Python's huge data science ecosystem, and who then say, "Why am I paying so much for Stata when I can see I can do everything I need here in Python?" But clearly StataCorp thinks this integration will be positive for them 🤷‍♂️
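The `sys.path.append(...)` snippet quoted above is what lets a non-pip-installable package be imported: instead of installing pystata, the setup code appends Stata's `utilities` directory to the interpreter's import path at runtime. Below is a minimal pure-Python sketch of that pattern, not StataCorp's code; the helper name `add_stata_utilities_to_path` is hypothetical, and the only assumption about the layout is the `<stata_dir>/utilities` path taken from the snippet.

```python
import importlib
import os
import sys


def add_stata_utilities_to_path(stata_dir: str) -> str:
    """Append <stata_dir>/utilities to sys.path so packages inside it become importable.

    Hypothetical helper mirroring the sys.path.append(...) pattern quoted above.
    """
    utilities = os.path.join(stata_dir, "utilities")
    if not os.path.isdir(utilities):
        raise FileNotFoundError(f"no utilities directory under {stata_dir!r}")
    # Avoid duplicate entries if this runs more than once (e.g. re-running a notebook cell).
    if utilities not in sys.path:
        sys.path.append(utilities)
    # Clear cached import-finder state so the new path entry is picked up.
    importlib.invalidate_caches()
    return utilities
```

Because the change lives only in the running interpreter, every new Python session has to repeat it (or the user has to set PYTHONPATH), which is the source of the "can't find pystata" support requests anticipated in the bullet above.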

@mcaceresb
Collaborator

mcaceresb commented Apr 21, 2021

@kylebarron I would assume it's using the existing Python interface they introduced in Stata 16? My assumption is that the Python data and the Stata data are separate; I don't see how else it could work, at least from skimming the docs. In short:

  • There is a persistent Stata session.
  • Data created in Stata stays there until imported into Python.
  • Data created in Python stays there until imported into Stata.

They might use frames to cache some data, but I can't imagine they copy every dataset created in Python into Stata by default, or vice versa (i.e., without the user telling the kernel to do it).
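The three assumptions above can be made concrete with a toy model: one long-lived session object whose data survives between calls, a separate Python-side namespace, and explicit copy operations as the only bridge between them. Everything here (`ToyStataSession`, `run`, `push`, `pull`, `has`) is hypothetical and only illustrates the assumed semantics; it is not pystata's actual API.

```python
import copy
from typing import Any, Dict, List


class ToyStataSession:
    """Hypothetical stand-in for a persistent Stata session with its own data store."""

    def __init__(self) -> None:
        # Data on the "Stata side"; it persists across calls, as it would across notebook cells.
        self._data: Dict[str, List[Any]] = {}

    def run(self, var: str, values: List[Any]) -> None:
        """Create a variable inside the session (stands in for running a Stata command)."""
        self._data[var] = list(values)

    def push(self, python_data: Dict[str, List[Any]]) -> None:
        """Explicitly import Python data into the session as an independent deep copy."""
        self._data.update(copy.deepcopy(python_data))

    def pull(self, var: str) -> List[Any]:
        """Explicitly export one variable back to Python, again as a copy."""
        return list(self._data[var])

    def has(self, var: str) -> bool:
        return var in self._data


session = ToyStataSession()
session.run("y", [10, 20])   # created on the "Stata side"; persists between calls
py_data = {"x": [1, 2]}      # created on the Python side; the session never sees it...
session.push(py_data)        # ...until it is explicitly imported
py_data["x"].append(3)       # later Python-side edits do not propagate
print(session.pull("x"))     # [1, 2]
```

Copying on explicit transfer keeps the two sides from silently aliasing each other, at the cost that the user has to remember which side holds the current version.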

@mcaceresb
Collaborator

  • It's curious that they're integrating so much with Python... I imagine (hope) there will be some users for whom this introduces them more to Python's huge data science ecosystem, and then say "why am I paying so much for Stata, when I see I can do everything I need here in Python". But clearly StataCorp thinks this integration will be positive for them 🤷‍♂️

For their core demographic, social scientists with high switching costs, I don't know that this will make a big difference either way. I assume StataCorp is betting that this will encourage enough newcomers to stick around. At least the ones who don't might, as you say, get exposed to Python instead of unhappily languishing in Stata.

@roblem

roblem commented Apr 23, 2021

I have this on order and will report back on how things go. Two comments unrelated to the inner workings of the StataCorp Python module:

  1. The new pricing model, which requires annual subscriptions, is ridiculously expensive.
  2. In my own research I never use Stata (I use TensorFlow and JAX), but for the models I teach in an upper-level econometrics class, statsmodels isn't there yet, and R is too clunky because every model we cover has a different API, which isn't going to work for my students. Since all of my colleagues use and teach with Stata, it would be unfair to students for me to force another workflow on them. That said, with the new prices I am having a difficult time seeing how my university can pay for all of these subscriptions.

@roblem

roblem commented Apr 30, 2021

I have been testing this out this morning (on Linux), having just upgraded to Stata 17. Observations:

  1. No syntax highlighting (although fenced stata code blocks in Markdown cells are highlighted).
  2. No completions.
  3. Stata must be running as a background process, since variables and the dataset persist across code blocks, although ps -ef | grep stata doesn't show anything.
  4. Copying Python data into Stata using %%stata -d some_dataframe_from_python creates a static copy of the Python object/data that is not updated if the underlying Python data changes.

The only advantage of the StataCorp approach is the mixing of Stata and Python in a single notebook, which I don't believe is possible with stata_kernel.
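Observation 4 means each transfer with -d duplicates the data, so memory and copy time scale with dataset size; this is also why copying data to Stata on every command, as worried about earlier in the thread, would be slow for large data. The back-of-the-envelope sketch below assumes every variable is an 8-byte numeric (float64 / Stata double) and ignores metadata such as names, labels, and index overhead; the helper names `copy_size_bytes` and `human_gb` are hypothetical.

```python
def copy_size_bytes(n_obs: int, n_vars: int, bytes_per_value: int = 8) -> int:
    """Bytes moved by one full copy of an n_obs x n_vars numeric dataset.

    Assumes every value has the same fixed width (8 bytes for float64 / Stata double)
    and ignores metadata such as variable names, labels, and index overhead.
    """
    if n_obs < 0 or n_vars < 0 or bytes_per_value <= 0:
        raise ValueError("sizes must be non-negative and the value width positive")
    return n_obs * n_vars * bytes_per_value


def human_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


# 10 million observations x 50 variables, all doubles
size = copy_size_bytes(10_000_000, 50)
print(f"{human_gb(size):.1f} GB per copy")  # 4.0 GB per copy
```

A one-off explicit copy of that size is tolerable; repeating it before every Stata command would not be.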

kylebarron mentioned this issue Oct 6, 2021