Supreme Court Oral Arguments Corpus: Update Years #168

Open
kakeith opened this issue Jul 8, 2022 · 5 comments

Comments

@kakeith

kakeith commented Jul 8, 2022

For a recent project I'm working on, we're using ConvoKit's implementation of the Supreme Court Oral Argument Corpus. However, we'd really like to include data from after 2019.

How difficult would it be to run scripts to update the dataset for cases after 2019?

Thanks,
Katie

@cristiandnm
Contributor

Hi Katie,

Happy to hear you are finding this data useful in your project! @tisjune developed this corpus, so she might be able to chime in and help with updating it. That said, I don't really know how hard it would be (e.g., whether it involves any manual fixes) or whether she has the time at the moment.

Cristian

@tisjune
Collaborator

tisjune commented Jul 11, 2022

Hi Katie -- Unfortunately I don't have a script that can pull or update the dataset (or rather, I forgot the password to the machine that stores the collection of files that more or less document what I did), and there is some manual tinkering involved. In short, if you want to get started:

  • The data from Oyez is quite well-formatted, especially for more recent years, so a lot of what I say might only apply to older cases but is nonetheless worth keeping in mind.
  • A lot of metadata from Oyez can be found in the HTML source. I used this metadata and supplemented it heavily with info from SCDB. (I don't think Oyez has a neat database of metadata beyond whatever generated the HTML source.)
  • Sometimes it's not actually clear which side the speaker is on, and Oyez doesn't consistently provide vote info from justices. I know the ConvoKit documentation says, rather annoyingly, "documentation forthcoming" on the procedure for inferring speaker side... but basically: (1) rely on the order in which advocates make their case; (2) merge with/check against info from SCDB.
  • some cases are heard over multiple "conversations" -- there is usually one main "conversation" and some precursors/followups (where I guess justices verbally decide to postpone the hearing or something?)
  • IIRC there are some inconsistencies in case ID-ing between SCDB and Oyez. There was a database somewhere containing justice opinions that I used to match in the few cases the case IDs did not totally correspond. SCDB contains richer information about case outcome than Oyez, so I think that even for more recent years I'd rely on it to provide that information.
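
A minimal sketch of the side-inference heuristic from the third bullet, not the procedure actually used to build the corpus. It assumes that the first advocate to speak argues for the petitioner and that every advocate who first speaks later argues for the respondent, which breaks down for amici and multi-advocate sides. The speaker IDs, the `infer_sides` function, and the `known_sides` mapping (a stand-in for whatever side information can be recovered from Oyez metadata or SCDB) are all hypothetical names for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

PETITIONER = "petitioner"
RESPONDENT = "respondent"


def first_appearance_order(speakers: List[str], justices: Set[str]) -> List[str]:
    """Return advocates (non-justices) in the order they first speak."""
    seen: List[str] = []
    for speaker in speakers:
        if speaker not in justices and speaker not in seen:
            seen.append(speaker)
    return seen


def infer_sides(
    speakers: List[str],
    justices: Set[str],
    known_sides: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, str], List[str]]:
    """Guess each advocate's side from speaking order, then check the guess.

    Step 1 (assumption): the first advocate to speak is on the petitioner
    side; advocates who first speak later are on the respondent side.
    Step 2: where external side info exists in `known_sides`, it overrides
    the guess, and the advocate is recorded as a conflict for manual review.
    """
    known_sides = known_sides or {}
    inferred: Dict[str, str] = {}
    conflicts: List[str] = []
    for position, advocate in enumerate(first_appearance_order(speakers, justices)):
        guess = PETITIONER if position == 0 else RESPONDENT
        known = known_sides.get(advocate)
        if known is not None and known != guess:
            conflicts.append(advocate)
            guess = known
        inferred[advocate] = guess
    return inferred, conflicts


# Hypothetical speaker sequence for one conversation (one ID per utterance).
justices = {"j_one", "j_two"}
speakers = ["j_one", "adv_a", "j_two", "adv_a", "adv_b", "j_one", "adv_b", "adv_a"]
sides, conflicts = infer_sides(speakers, justices)
print(sides, conflicts)
```

The list of conflicts is the useful part: any advocate whose order-based guess disagrees with the external source is a candidate for the manual tinkering mentioned above.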

@kakeith
Author

kakeith commented Jul 12, 2022

@tisjune @cristiandnm thanks for replying so quickly!

I'll pass this info on to my collaborators and see if there's interest in trying to update the corpus. If so, would you be interested in us contributing scripts to ConvoKit to make sure this corpus can continue to be updated in the future?

Thanks and best,
Katie

@cristiandnm
Contributor

Thanks Katie,

Yes, we would definitely be interested in updating the dataset and having scripts ready for future updates. Let us know if we can help along the way.

@biaoyanf

Hi, @kakeith,
I'm also interested in using this data with more recent years. How far have you got? If you have the updated data, will it be publicly available? Thanks!
