IRobotizeInternet/RuRay

If you have any questions on how to set up or contribute to the project, please let me know at [email protected]. I will add details on how to set up and contribute to this README.md later.


What is RuRay

  • Control Facebook actions (navigation, clicking, posting, and everything else) through voice commands.
  • Designed to support multiple Facebook languages.
  • Add custom macros to automate certain tasks through voice commands.
  • All the services are designed as small, manageable microservices (Toolbox, business logic, REST API, automated text conversation using RASA, voice-to-text using Google).

How it works.

  • Listen to the user's request through the microphone and convert it to text using Google APIs.
  • Feed the converted text into Rasa, which infers the meaning and selects the appropriate API.
  • Send the user's request to the API server.
  • At the server, execute the request using Selenium, JavaScript, bash, C#, etc.
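The last mile of the steps above is dispatching a recognized intent to an API route. Here is a minimal sketch of that dispatch step; the intent names and endpoint paths are illustrative assumptions, not the actual RuRay routes.

```python
# Toy intent-to-endpoint table standing in for what Rasa learns.
# These routes are made up for illustration.
INTENT_ROUTES = {
    "navigate_home": "/api/facebook/home",
    "dark_mode": "/api/facebook/display/dark",
}

def route_request(intent: str) -> str:
    """Map a recognized intent to the API endpoint that executes it."""
    try:
        return INTENT_ROUTES[intent]
    except KeyError:
        raise ValueError(f"No API mapped for intent: {intent}")

print(route_request("dark_mode"))  # /api/facebook/display/dark
```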

RuRay

Core projects in solution:

  1. RuRayToolbox
  2. RuRay.BLL (Core)
  3. RuRay.API
  4. WindowGrid
  5. RASAAPI
  6. RobotizeTTSAndSTT

1. RuRayToolbox

The Toolbox contains all the common components found on any website, such as textboxes, labels, radio buttons, dropdowns, etc. Each component class comes with common functionality; the dropdown, for example, provides scrolling up/down, clicking, getting or setting list items, and more. All components inherit from a base class called BaseDOMObject.cs.

The base class BaseDOMObject.cs implements and/or overrides the OpenQA.Selenium.IWebElement interface. It also contains the core functionality for manipulating DOM objects through JavaScript.
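To make the pattern concrete, here is a Python sketch of the base-class idea: a wrapper that drives a mapped element through injected JavaScript, with components like the dropdown inheriting from it. The class and method names are assumptions modeled on the description above, not the real C# code.

```python
class BaseDOMObject:
    """Sketch of a shared base for DOM components."""

    def __init__(self, driver, css_selector):
        self.driver = driver        # any object with execute_script()
        self.selector = css_selector

    def _js(self, action: str) -> str:
        # Build the JavaScript executed against the mapped element.
        return f"document.querySelector('{self.selector}').{action};"

    def click(self):
        self.driver.execute_script(self._js("click()"))

    def set_value(self, value: str):
        self.driver.execute_script(self._js(f"value = '{value}'"))


class Dropdown(BaseDOMObject):
    """A Toolbox-style component built on the shared base class."""

    def scroll_into_view(self):
        self.driver.execute_script(self._js("scrollIntoView()"))
```

With a Selenium WebDriver in place of `driver`, `execute_script` would run the generated JavaScript in the browser.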

2. RuRay.BLL (Core)

The business logic layer contains two parts:

i. Facebook UI mapping (pages, dialogs, popups, etc.).
ii. API services backend.

The entire Facebook website (excluding some pages) has been mapped using the Meet Me In the Middle methodology: all the pages are broken down into manageable sections and then coded as multiple classes in an object-oriented way. Mapping is done through UI text, because Facebook obfuscates all the usual DOM identifiers, so UI text is the reliable way to identify each element.

Using UI text is a bit tricky when the language changes on Facebook. To overcome this, all the UI text used by the mapping is stored as resource strings. This lets the code run in a different language just by adding the equivalent resource strings, without needing to add or update code. The home page is a good starting point.
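The lookup described above can be sketched as a resource table plus a text-based locator. The resource keys, strings, and XPath shape below are made up for illustration; the real project keeps them in .NET resource files.

```python
# Per-language resource strings: the key stays stable while the
# visible text changes with the Facebook UI language.
RESOURCES = {
    "en": {"Home": "Home", "Marketplace": "Marketplace"},
    "es": {"Home": "Inicio", "Marketplace": "Marketplace"},
}

def xpath_for(key: str, lang: str = "en") -> str:
    """Build an XPath that finds an element by its visible UI text."""
    text = RESOURCES[lang][key]
    return f"//span[text()='{text}']"

print(xpath_for("Home", "es"))  # //span[text()='Inicio']
```

Adding a new language is then just a new row in the resource table; the mapping code itself never changes.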

Notable things in UI mappings:

  • Fluent APIs for complex pages, e.g. the Marketplace page (GridMarketPlace). This one is a bit complex, but it enforces filtering and searching on the Marketplace page through an object-oriented approach.
  • Bash scripts to work around Selenium limitations, e.g. creating a Chrome shortcut on the desktop with automation-support arguments, getting the cursor's location and sending keys, etc.
  • Since most of the mapping is separated from the actions that use it, changes in the Facebook UI require minimal effort to fix broken mappings.
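The fluent style mentioned for GridMarketPlace looks roughly like the chain below. The class and method names are assumptions based on the description; only the chaining pattern itself is the point.

```python
class MarketplaceSearch:
    """Sketch of a fluent filter builder in the GridMarketPlace style."""

    def __init__(self):
        self._filters = {}

    def category(self, name):
        self._filters["category"] = name
        return self              # returning self is what enables chaining

    def max_price(self, amount):
        self._filters["max_price"] = amount
        return self

    def build_query(self):
        return "&".join(f"{k}={v}" for k, v in sorted(self._filters.items()))


q = MarketplaceSearch().category("vehicles").max_price(5000).build_query()
print(q)  # category=vehicles&max_price=5000
```

Each filter method returns the builder, so a search reads as one sentence and incomplete filter combinations are harder to express by accident.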

The API services backend listens to requests from RuRay.API and uses the Facebook UI mapping to execute the actions, such as adding a new post, changing a Display & Accessibility setting, or anything else a user can do through the Facebook UI.

3. RuRay.API

This contains consumable public APIs that run code on the server for each request. These services are used by RASA when the user requests a certain action in natural language. Swagger is used to describe all the RESTful APIs for accessing the mapping and executing actions.

Let's call a few APIs from one browser and watch the live changes in another. There aren't many APIs in the services yet; however, adding a new API is very easy. Let's test these APIs:

* Navigate to the home page.
* Scroll up a few times.
* Change the display to dark mode.
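The three calls above can be driven from any HTTP client. Here is a minimal Python sketch; the base URL and route names are assumptions, so check the Swagger page of a running RuRay.API instance for the real ones.

```python
import urllib.request

# Assumed local address of the RuRay.API server.
BASE = "http://localhost:5000"

def build_request(route: str) -> urllib.request.Request:
    """Prepare a POST request for one of the action endpoints."""
    return urllib.request.Request(BASE + route, method="POST")

# Hypothetical routes for the three demo actions.
actions = [
    "/facebook/navigate/home",
    "/facebook/scroll/up",
    "/facebook/display/dark-mode",
]

for route in actions:
    req = build_request(route)
    # urllib.request.urlopen(req)  # uncomment against a running server
    print(req.full_url)
```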

Animation

4. WindowGrid

This is a Windows desktop application that helps when the user needs to click a certain location on the screen: they can ask to show or hide a grid. The grid appears with indexes, and the user can command a click on a given index. Note that everything is controlled through APIs. This is demonstrated below:

  • Call the grid API to identify the location.
  • Right-click on the given index, identified through the grid earlier.
  • Remove the grid.

Animation1

5. RASAAPI

Now you know how we mapped the entire Facebook UI and how we can call APIs to execute actions. Since we have APIs to do just about anything on Facebook, we only need to teach the machine-learning program Rasa which API to call when a user requests something. I would encourage you to go to www.rasa.com and learn how Rasa works: it is not just string-to-string matching; instead, it understands the meaning of the text and calls the appropriate API. Here are some examples of how we provide training data to Rasa.

image
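Once a model is trained, a running Rasa server exposes an HTTP API, and `POST /model/parse` returns the intent it recognized for a sentence. A sketch of that round trip; the host/port and the example intent name are assumptions.

```python
import json
import urllib.request

def parse_payload(text: str) -> bytes:
    """Build the JSON body Rasa's /model/parse endpoint expects."""
    return json.dumps({"text": text}).encode("utf-8")

def parse_intent(text: str, host="http://localhost:5005"):
    req = urllib.request.Request(
        host + "/model/parse",
        data=parse_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # needs a running Rasa server
        return json.load(resp)["intent"]["name"]

# parse_intent("switch Facebook to dark mode")  # e.g. "dark_mode"
```

The returned intent name is what gets mapped to a RuRay.API call.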

6. RobotizeTTSAndSTT (Incomplete)

Finally, the last part that completes the cycle: listening to the user's requests through the microphone and converting them into text. This project uses Google APIs to convert speech to text. There is not much in this project yet.
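For reference, the request this step would send to Google's Speech-to-Text v1 REST endpoint (`POST https://speech.googleapis.com/v1/speech:recognize`) looks roughly like this. The field names follow the public v1 API; authentication and microphone capture are omitted, and the encoding/sample-rate values are just typical choices.

```python
import base64
import json

def stt_payload(raw_audio: bytes, language="en-US") -> str:
    """Build the JSON body for a Google Speech-to-Text v1 recognize call."""
    return json.dumps({
        "config": {
            "encoding": "LINEAR16",       # 16-bit PCM, a common mic format
            "sampleRateHertz": 16000,
            "languageCode": language,
        },
        # The v1 API takes inline audio as base64 text.
        "audio": {"content": base64.b64encode(raw_audio).decode("ascii")},
    })
```

The transcript in the response is what gets handed to Rasa in the pipeline described at the top of this README.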