
Artifician

Artifician is an event-driven library developed to simplify and speed up the preparation of datasets for Artificial Intelligence models.


Getting Started

Pre-requisites

Installation

Binary installers for the latest released version are available on the Python Package Index (PyPI) and on Conda.

# PyPI
pip install artifician
# conda
conda install -c plato_solutions artifician
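After installing, a quick sanity check is to import the package from a Python session:

import artifician  # should import cleanly after either install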

Documentation

Please visit the Artifician Docs

Usage

from artifician.dataset import *
from artifician.feature_definition import *
from artifician.processors.normalizer import *

  
def extract_domain_name(sample):
    """Extract the domain name from the given URL."""
    domain_name = sample.split("//")[-1].split('/')[0]

    return domain_name
 
input_data = ['https://www.google.com/', 'https://www.youtube.com/']  
  
dataset = Dataset()  # initialize the dataset object
url_domain = FeatureDefinition(extract_domain_name, dataset)  # define a feature from the extractor function and subscribe it to the dataset
normalizer = Normalizer(PropertiesNormalizer(), url_domain, delimiter={'delimiter': ["."]})  # initialize the normalizer (processor) and subscribe it to url_domain
  
  
""" Now we are all set to go, all we have to do is call add_samples method on the dataset object and pass the input data
after calling the add_samples, url_domain will start its execution and extract the data using extract_domain_name function, as soon url_domain
feature is processed normalizer will start it execution and furthur is will process the data extracted by url_domain. The processed data is then
passed back to the dataset. Following diagram will make it more clear for you. """ 

prepared_data = dataset.add_samples(input_data)  
print(prepared_data)  
  

Output

                          0                                   1
0   https://www.google.com/  [(www, 0), (google, 1), (com, 2)]
1  https://www.youtube.com/  [(www, 0), (youtube, 1), (com, 2)]
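For intuition, the pipeline above computes the same result as this plain-Python sketch (a hand-rolled illustration of the data flow, not artifician's internals; normalize_domain is a hypothetical helper): extract the domain from each URL, split it on the "." delimiter, and pair each token with its position.

def extract_domain_name(sample):
    """Extract the domain name from the given URL."""
    return sample.split("//")[-1].split('/')[0]

def normalize_domain(domain, delimiter="."):
    """Split the domain on the delimiter and index each token."""
    return [(token, position) for position, token in enumerate(domain.split(delimiter))]

input_data = ['https://www.google.com/', 'https://www.youtube.com/']
prepared_data = [(url, normalize_domain(extract_domain_name(url))) for url in input_data]
# [('https://www.google.com/', [('www', 0), ('google', 1), ('com', 2)]), ...]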