Updates from Overleaf
yamanksingla committed Jan 5, 2024
1 parent b323c64 commit 22db1c6
Showing 4 changed files with 51 additions and 18 deletions.
4 changes: 4 additions & 0 deletions chapter-3.tex
@@ -0,0 +1,4 @@
\chapter{Generating Content Leading to Optimal Behavior}
\label{chapter:Generating Content Leading to Optimal Behavior}


Binary file added images/Major_levels_of_linguistic_structure.png
Binary file added images/levels of analysis.pdf
Binary file not shown.
65 changes: 47 additions & 18 deletions main.tex
@@ -160,9 +160,16 @@ \chapter*{Acknowledgments}

\\
\\\\\\\\\\\\\\\\
\begin{sanskrit}यत: प्रवृत्तिर्भूतानां येन सर्वमिदं ततम् |\end{sanskrit}\\
\begin{sanskrit}स्वकर्मणा तमभ्यर्च्य सिद्धिं विन्दति मानव: ||\end{sanskrit}BG \begin{sanskrit}18:46||\end{sanskrit}\\
\end{tabular}
\begin{sanskrit}मयि सर्वाणि कर्माणि संन्यस्याध्यात्मचेतसा |\end{sanskrit}\\
\begin{sanskrit}निराशीर्निर्ममो भूत्वा युध्यस्व विगतज्वर: ||\end{sanskrit}BG \begin{sanskrit}3:30||\end{sanskrit}\\
\\\\\\\\\\\\\\\\
\begin{sanskrit}
नैव किञ्चित्करोमीति युक्तो मन्येत तत्त्ववित् |\end{sanskrit}\\
\begin{sanskrit} पश्यञ्शृण्वन्स्पृशञ्जिघ्रन्नश्नन्गच्छन्स्वपञ्श्वसन् ||\end{sanskrit}\\
\begin{sanskrit}प्रलपन्विसृजन्गृह्ण्न्नुन्मिषन्निमिषन्नपि |\end{sanskrit}\\
\begin{sanskrit}इन्द्रियाणीन्द्रियार्थेषु वर्तन्त इति धारयन् ||\end{sanskrit}BG \begin{sanskrit}5:8-9||\end{sanskrit}\\

\end{tabular}
\end{table}

\clearpage
@@ -198,11 +205,6 @@ \chapter{Introduction: The Two Cultures of Social Science}
\end{comment}


\begin{figure*}[h]
\centering
\includegraphics[width=1.0\textwidth]{images/factors of communication.pdf}
\caption{Communication process can be defined by seven factors: Communicator, Message, Time of message, Channel, Receiver, Time of receipt, and Effect. Any message is created to serve an end goal. For marketers, the end goal is to bring in the desired receiver effect (behavior) (like clicks, purchases, likes, and customer retention). The figure presents the key elements in the communication pipeline - the marketer, message, channel, receivers, and finally, the receiver effect. \label{fig:factors-of-communication}}
\end{figure*}


Communication includes all of the procedures by which one mind may affect another \cite{shannon-weaver-1949}. This includes all forms of expression, such as words, gestures, speech, pictures, and musical sounds.
@@ -217,20 +219,47 @@ \chapter{Introduction: The Two Cultures of Social Science}
The effect (or behavior) that a piece of content generates can also tell us about the content, the communicator, the receiver, or the time. Therefore, efforts have been made to extract information about the content itself from the behavior it generates. For instance, keystroke dynamics \cite{plank2016keystroke} and eye movements \cite{klerke2016improving,khurana-etal-2023-synthesizing} have been used to improve natural language processing. Similarly, the fields of human alignment and reinforcement learning from human feedback (RLHF) use human behavioral signals such as likes, upvotes, downloads, and annotations of a response's helpfulness to improve content generation, both text \cite{kreutzer2018can,stiennon2020learning,ziegler2019fine,nakano2021webgpt,si2023long} and images \cite{lee2023aligning,pressman2023simulacra,wu2023better,khurana2023behavior}.



\begin{figure*}[!th]
\centering
\includegraphics[width=1.0\textwidth]{images/factors of communication.pdf}
\caption{The communication process can be defined by seven factors: communicator, message, time of message, channel, receiver, time of receipt, and effect. Any message is created to serve an end goal. For marketers, the end goal is to bring about the desired receiver effect (behavior), such as clicks, purchases, likes, and customer retention. The figure presents the key elements in the communication pipeline: the marketer, message, channel, receivers, and finally, the receiver effect. \label{fig:factors-of-communication}}
\end{figure*}


% XXX: to be improved
In the more traditional social science and computational social science cultures, research is carried out to discover and model causal effects.
For instance, propaganda and mass communication studies \cite{mcquail1987mass,krippendorff2018content,lasswell1948structure,lasswell1971propaganda} try to understand the culture, time, authors, and recipients non-invasively through the messages exchanged, while persuasion studies \cite{petty1981effects,chaiken1980heuristic} identify the persuasion strategy present in content and correlate it with (un)successful persuasion attempts.




A common theme running through both research cultures in the behavioral sciences is the intent to control behavior. Explanation and prediction are intermediate steps toward controlling, and hence optimizing, behavior. Optimizing behavior means fulfilling the communicator's objectives by controlling the other six parts of the communication process (Fig.~\ref{fig:factors-of-communication}). Because the problem space is large, the solution needs a general understanding of human behavior rather than a domain-specific one.


The characteristic that marks the digital age is the prevalence of human behavioral data in huge repositories. This data is big (allowing to model heterogeneity), always-on (allowing to look in the past as well as live measurements), observational (as opposed to reactive), but also incomplete (does not capture all that is happening everywhere everytime in a single repository) and algorithmically confounded (generated as a byproduct of an engineering process with a goal) \cite{salganik2019bit}. While the predictive culture has tried to make use of some of this data in the form of social media datasets like Twitter \cite{tumasjan2010predicting,asur2010predicting} and Instagram \cite{kim2020multimodal}, Google trends \cite{choi2012predicting,carriere2013nowcasting}, Wikipedia \cite{generous2014global,de2021general,mestyan2013early}, shopping websites \cite{krumme2013predictability,de2015unique} and other data sources \cite{brockmann2006scaling,song2010limits,miritello2013limited}, these efforts are limited, in the sense of being dependent on one or a few chosen platforms, able to answer a limited set of questions, and restricted by access to private data. We want a model that can understand (predict and explain) human behavior as opposed to modeling a particular effect (retweet prediction) on a particular platform (\textit{e.g.} Twitter) for a certain type of users.
The characteristic that marks the digital age is the prevalence of human behavioral data in huge repositories. This data is \textit{big} (allowing us to model heterogeneity), \textit{always-on} (allowing retrospective as well as live measurement), observational (as opposed to reactive), but also \textit{incomplete} (no single repository captures everything happening everywhere at every time) and \textit{algorithmically confounded} (generated as a byproduct of an engineering process with its own goals) \cite{salganik2019bit}. While the predictive culture has tried to make use of some of this data in the form of social media datasets such as Twitter \cite{tumasjan2010predicting,asur2010predicting} and Instagram \cite{kim2020multimodal}, Google Trends \cite{choi2012predicting,carriere2013nowcasting}, Wikipedia \cite{generous2014global,de2021general,mestyan2013early}, shopping websites \cite{krumme2013predictability,de2015unique}, and other data sources \cite{brockmann2006scaling,song2010limits,miritello2013limited}, these efforts are limited: they depend on one or a few chosen platforms, answer a limited set of questions, and are restricted by access to private data. We want a model that can understand (predict and explain) \textit{human behavior in general} rather than modeling a particular effect (\textit{e.g.}, retweet prediction) on a particular platform (\textit{e.g.}, Twitter) for a certain type of user.
This problem parallels one solved in the natural language processing (NLP) community, where supervised models were limited by the amount of supervision available and could answer only the one question they were trained for. The problem was addressed by developing Large Language Models (LLMs), general-purpose models capable of \textit{understanding language}, which can therefore solve natural language tasks such as sentiment analysis, question answering, email generation, and language translation zero-shot (\textit{i.e.}, without any explicit training for that task) \cite{devlin2018bert,brown2020language,radford2018improving,raffel2020exploring,radford2019language}.


Similarly, how do we develop a model capable of understanding behavior \textit{in general}? To answer this question, we draw motivation from LLMs, where the idea is to train a model on a data-rich task. The task chosen to train LLMs is next-word prediction, and the dataset is text collected from the entire internet. We leverage the human behavior repositories available on the internet for this general-purpose human behavior model. The format of this data is the general communication model shown in Fig.~\ref{fig:factors-of-communication}, consisting of communicator, message, time of message, channel, receiver, time of receipt, and effect. Due to the incomplete nature of behavioral repositories, not all factors are available for every sample; however, a subset always is, and we show that the scale of the data helps make a general behavior understanding model \cite{khandelwal2023large}. We show that the model is capable of predicting behavior, explaining it, and also generating messages to bring about certain behaviors \cite{khurana2023behavior,si2023long,khandelwal2023large}.
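Concretely, each training record can be thought of as a partially observed seven-factor tuple. The sketch below is our own illustrative Python (not from the thesis; all names are hypothetical) showing one such record in which only a subset of the factors is logged:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative only: one record of the seven-factor communication model
# (communicator, message, time of message, channel, receiver, time of
# receipt, effect). Field names are our own; behavioral repositories
# typically populate only a subset of these factors.
@dataclass
class CommunicationSample:
    communicator: Optional[str] = None
    message: Optional[str] = None
    message_time: Optional[str] = None
    channel: Optional[str] = None
    receiver: Optional[str] = None
    receipt_time: Optional[str] = None
    effect: Optional[dict] = None

    def observed_factors(self):
        """Return the names of the factors actually present in this record."""
        return [k for k, v in self.__dict__.items() if v is not None]

# A hypothetical video-platform record: receiver identities and timestamps
# are missing, but communicator, message, channel, and effect are logged.
sample = CommunicationSample(
    communicator="brand_channel",
    message="30-second product ad",
    channel="YouTube",
    effect={"views": 120_000, "likes": 3_400},
)
print(sample.observed_factors())
# ['communicator', 'message', 'channel', 'effect']
```

A model trained over many such partially observed tuples must learn to condition on whichever factor subset is present, which is what makes the scale and variety of internet behavior repositories useful.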



\begin{figure*}[h]
\centering
\includegraphics[width=1.0\textwidth]{images/levels of analysis.pdf}
\caption{Levels of content analysis. The figure lists tasks and their sample outputs, arranged in a hierarchy \cite{shannon-weaver-1949}.}
\end{figure*}





\textit{Why are general LLMs unable to solve behavioral problems?}



Why do we think



@@ -948,7 +977,7 @@ \subsubsection{Video Verbalization}

\begin{table*}[!h]\centering
% \scriptsize
\begin{adjustbox}{max width = 0.9\textwidth}
\begin{adjustbox}{width =\textwidth}
\begin{tabular}{llccccccc}\toprule[1.5pt]
\textbf{Method} &\textbf{Frame Extraction} &\textbf{METEOR} & \textbf{CIDEr} &\textbf{ROUGE-L} &\textbf{BLEU-1}&\textbf{BLEU-2}&\textbf{BLEU-3}&\textbf{BLEU-4}\\\toprule[0.5pt]
GPT-3.5 & Uniform Sampling & 24.8 & 102.4 & 24.3 & 63.8 & 56.4 & 47.2 & 38.6 \\
@@ -965,7 +994,7 @@ \subsubsection{Video Verbalization}

\begin{figure*}[!t]
\centering
\includegraphics[width=0.8\textwidth]{images/example-stories.pdf}
\includegraphics[width=\textwidth]{images/example-stories.pdf}
\caption{An example of a story generated by the proposed pipeline along with the predicted outputs of the video-understanding tasks on the generated story. The generated story captures information across scenes, characters, event sequences, dialogues, emotions, and the environment. This helps the downstream models to get adequate information about the video to reason about it correctly. The original video can be watched at \url{https://youtu.be/_amwPjAcoC8}.}
\label{fig:example-story}
\end{figure*}
@@ -1481,8 +1510,8 @@ \chapter{Content and Behavior Models}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Generating Content Leading to Optimal Behavior}
\label{chatper:Generating Content Leading to Optimal Behavior}

\include{chapter-3}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -1493,13 +1522,13 @@ \chapter*{Publications}

\item Kumar, Y., Jha, R., Gupta, A., Aggarwal, M., Garg, A., Malyan, T., Bhardwaj, A., Ratn Shah, R., Krishnamurthy, B., \& Chen, C. (2023). Persuasion Strategies in Advertisements. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 57-66. \url{https://doi.org/10.1609/aaai.v37i1.25076}

\item Bhattacharya, A., Singla, Y. K., Krishnamurthy, B., Shah, R. R., \& Chen, C. (2023). A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9822–9839, Singapore. Association for Computational Linguistics.
\item Bhattacharya, A., Singla, Y. K., Krishnamurthy, B., Shah, R. R., \& Chen, C. (2023). A Video Is Worth 4096 Tokens: Verbalize Story Videos To Understand Them In Zero Shot. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9822–9839, Singapore. Association for Computational Linguistics. (Nominated for the best paper award!)

\item Khandelwal, A., Agrawal, A., Bhattacharyya, A., Singla, Y.K., Singh, S., Bhattacharya, U., Dasgupta, I., Petrangeli, S., Shah, R.R., Chen, C. and Krishnamurthy, B., 2023. Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior. arXiv preprint arXiv:2309.00359. (Under review at ICLR-24).
\item Khandelwal, A., Agrawal, A., Bhattacharyya, A., Singla, Y.K., Singh, S., Bhattacharya, U., Dasgupta, I., Petrangeli, S., Shah, R.R., Chen, C. and Krishnamurthy, B., 2023. Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior. arXiv preprint arXiv:2309.00359. (Under review).

\item S I, H., Singh, S., K Singla, Y., Krishnamurthy, B., Chen, C., Baths V., \& Ratn Shah, R. (2023). Sharingan: How Much Will Your Customers Remember Your Brands After Seeing Your Ads?. arxiv preprint (Under review at NAACL-24).
\item S I, H., Singh, S., K Singla, Y., Krishnamurthy, B., Chen, C., Baths, V., \& Ratn Shah, R. (2023). Sharingan: How Much Will Your Customers Remember Your Brands After Seeing Your Ads?. arXiv preprint (Under review).

\item Khurana, V., Singla, Y.K., Subramanian, J., Shah, R.R., Chen, C., Xu, Z. and Krishnamurthy, B., 2023. Behavior Optimized Image Generation. arXiv preprint arXiv:2311.10995. (Under review at CVPR-24)
\item Khurana, V., Singla, Y.K., Subramanian, J., Shah, R.R., Chen, C., Xu, Z. and Krishnamurthy, B., 2023. Behavior Optimized Image Generation. arXiv preprint arXiv:2311.10995. (Under review)

\end{enumerate}
