Constrained Relational Topic Models

Implementation of Constrained Relational Topic Models (C-RTM), proposed in the paper "Constrained Relational Topic Models" [https://doi.org/10.1016/j.ins.2019.09.039] accepted in Information Sciences, 2020. CRTM is a family of topic models that extend the well-know Relational Topic Models (Chang, 2009). It models the structure of a document network and incorporates other types of relational information obtained by prior domain knowledge. This implementation extends the code from the package of (Weiwei Yang's).

Execution of the program in Command Line

java -cp YWWTools.jar:deps.jar yang.weiwei.Tools --tool lda --model lda --constrained true --vocab <vocab-file> --corpus <corpus-file> --trained-model <model-file>

Required arguments
- --constrained true: it must be set to true to allow the incorporation of prior knowledge constraints.
- <vocab-file>: Vocabulary file. Each line contains a unique word.
- <corpus-file>: Corpus file in which documents are represented by word indexes and frequencies. Each line contains a document in the following format
```
 <doc-len> <word-type-1>:<frequency-1> <word-type-2>:<frequency-2> ... <word-type-n>:<frequency-n>
```
  <doc-len> is the total number of tokens in this document. <word-type-i> denotes the i-th word in <vocab-file>, starting from 0. Words with zero frequency can be omitted.
- <model-file>: Trained model file in JSON format. Read and written by program.
- --train-c-file <constraint-file>: File containing the document constraints. Each line contains a constraint in the following format
```
<constraint-type> <document-1> <document-2>
```
  <document-1> is row-id of document-1. <document-2> is row-id of document-2. <constraint-type> must be set to M (if it is a must-constraint) or C (if it is a cannot-constraint).
Optional arguments
- --model <model-name>: The topic model you want to use (default: LDA). Tested <model-name> (case unsensitive) are
  - LDA: Constrained LDA
  - RTM: Constrained Relational topic model.
- other models as extensions of LDA implemented by Weiwei Yang can be used and are already provided in the code.
- --newfun <boolean>: Type of potential function of the constrained model. Default: false. If true, it is normalized. Otherwise it corresponds to the potential function described in SC-LDA.
- --lambda <lambda>: Strength parameter for the potential function described in SC-LDA. It is valid only if --newfun false.
- --no-verbose: Stop printing log to console.
- --alpha <alpha-value>: Parameter of Dirichlet prior of document distribution over topics (default: 1.0). Must be a positive real number.
- --beta <beta-value>: Parameter of Dirichlet prior of topic distribution over words (default: 0.1). Must be a positive real number.
- --topics <num-topics>: Number of topics (default: 10). Must be a positive integer.
- --iters <num-iters>: Number of iterations (default: 100). Must be a positive integer.
- --update: Update alpha while sampling (default: false). It does not work well.
- --update-int <update-interval>: Interval of updating alpha (default: 10). Must be a positive integer.
- --theta <theta-file>: File for document distribution over topics. Each line contains a document's topic distribution. Topic weights are separated by space.
- --output-topic <topic-file>: File for showing topics.
- --topic-count <topic-count-file>: File for document-topic counts.
- --top-word <num-top-word>: Number of words to give when showing topics (default: 10). Must be a positive integer.
- --burn-in <burnin>: Number of burn-in iterations. Default: 0.

Datasets

Three benchmark relational datasets are included in their related folders. They are already preprocessed and ready to be used as input for the model. Notice that the file labels.txt can be used to create the must- and cannot-constraints. Two random documents can be extracted and if their labels are the same, a must-constraint may be added to the <constraint-file>, otherwise a cannot-constraint may be added.

References

SC-LDA: Sparse Constrained LDA

Yang, Y., Downey, D., Boyd-Graber, J.: Efficient Methods for Incorporating Knowledge into Topic Models. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 308-317 (2015)

RTM: Relational Topic Models

Jonathan Chang, David M. Blei: Relational Topic Models for Document Networks. In: Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS) 2009: 81-88

Back to Top

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
citeSeer		citeSeer
cora		cora
doc		doc
lib/dict		lib/dict
sample-data		sample-data
src/yang/weiwei		src/yang/weiwei
webKB		webKB
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

citeSeer

citeSeer

cora

cora

doc

doc

lib/dict

lib/dict

sample-data

sample-data

src/yang/weiwei

src/yang/weiwei

webKB

webKB

.gitignore

.gitignore

LICENSE.md

LICENSE.md

README.md

README.md

pom.xml

pom.xml

Repository files navigation

Constrained Relational Topic Models

Execution of the program in Command Line

Datasets

References

SC-LDA: Sparse Constrained LDA

RTM: Relational Topic Models

About

Releases 2

Packages

Contributors 3

Languages

License

MIND-Lab/Constrained-RTM

Folders and files

Latest commit

History

Repository files navigation

Constrained Relational Topic Models

Execution of the program in Command Line

Datasets

SC-LDA: Sparse Constrained LDA

RTM: Relational Topic Models

About

Resources

License

Stars

Watchers

Forks

Languages