Draft of the Storage Scheme for the Email Spider #43

alexsunxl · 2023-08-31T07:41:03Z

Configuration File Path

The configuration file for the email scraping program is located at rootfs/email/config.toml.

The configuration file includes the following fields:

EMAIL_IMAP_SERVER: This field is for the IMAP server of your email service. For example, "imap.gmail.com".
EMAIL_ADDRESS: This field is for the email address that you want to scrape. Please replace with your own email address.
EMAIL_PASSWORD: This field is for the password of your email account. Please replace with your own password.
EMAIL_IMAP_PORT: This field is for the port number of your IMAP server. For Gmail, this is typically 993.
LOCAL_DIR: This field is for the local directory where you want to store the scraped emails. For example, 'rootfs/data'.

Please note that you should keep your email address and password confidential and ensure they are securely stored.


File Organization and Storage Scheme for Email Scraping Program

File Storage Path

The scraped email files will be stored in the directory rootfs/data/[email protected]/. This location can be changed through the LOCAL_DIR field.

Creation of Email Folders

For each email, an MD5 hash is generated from its name and time. This hash is then used as the name of a dedicated folder that stores the corresponding email content.

Email Content Storage

Within each email's folder, we create two files to store the main information of the email:

email.txt: This file stores the body content of the email.
meta.json: This file stores the header information of the email.

This folder can also hold attachments, images, and other files related to the email.
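A sketch of writing and reading the two fixed-name files is shown here. It assumes the headers have already been reduced to a flat dict of strings (real messages can repeat headers such as Received, which a plain dict cannot hold). `save_email` and `load_email` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

BODY_FILE = "email.txt"  # body content of the email
META_FILE = "meta.json"  # header information of the email


def save_email(folder: Path, body: str, headers: dict[str, str]) -> Path:
    """Write email.txt and meta.json into folder, creating it if needed."""
    folder.mkdir(parents=True, exist_ok=True)
    (folder / BODY_FILE).write_text(body, encoding="utf-8")
    with open(folder / META_FILE, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII subjects and names readable.
        json.dump(headers, f, ensure_ascii=False, indent=2)
    return folder


def load_email(folder: Path) -> tuple[str, dict[str, str]]:
    """Read back the body and headers from the fixed file names."""
    body = (folder / BODY_FILE).read_text(encoding="utf-8")
    with open(folder / META_FILE, encoding="utf-8") as f:
        headers = json.load(f)
    return body, headers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = save_email(Path(tmp) / "5de3e52f3a6b90cabe6cbdd4ae3a5c5b",
                            "Hello!", {"Subject": "Hi"})
        print(load_email(folder))
```

Because the file names are fixed, any later processing step only needs the folder path to find both files.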

The above is the file organization and storage scheme for our email scraping program. We welcome your feedback and suggestions so that we can continuously optimize and improve this scheme.


alexsunxl · 2023-08-31T07:43:30Z

It might look like this:

├── data
│   └── [email protected]
│       └── 5de3e52f3a6b90cabe6cbdd4ae3a5c5b
│           ├── email.txt
│           └── meta.json


lurenpluto · 2023-09-11T09:35:27Z

Each email should be stored in its own directory, and the names of the files inside it should be fixed so that the same builder can process every email. In addition, images, videos, audio, and other media contained in the email should be stored in separate subdirectories to make parsing easier.

A complete directory structure might look like the one shown below:

├── email.txt
├── meta.json
├── image
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── video
│   ├── video1.mp4
│   ├── video2.mv
│   └── ...
└── audio
    ├── audio1.m4a
    ├── audio2.flac
    └── ...
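One way to fill the image/, video/, and audio/ subdirectories is to route each MIME part by its top-level content type using Python's standard `email` package. This is a sketch under assumptions: `save_media_parts` is a hypothetical helper, parts without a usable filename get a generated name, and two parts with the same filename would overwrite each other.

```python
import tempfile
from email.message import EmailMessage, Message
from pathlib import Path

# Top-level MIME types that get their own subdirectory, as in the tree above.
MEDIA_TYPES = ("image", "video", "audio")


def save_media_parts(msg: Message, folder: Path) -> list[Path]:
    """Write every image/video/audio part of msg to folder/<type>/<filename>."""
    saved: list[Path] = []
    for index, part in enumerate(msg.walk()):
        maintype = part.get_content_maintype()
        if maintype not in MEDIA_TYPES:
            continue  # text bodies, multipart containers and other types are skipped
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if not payload:
            continue  # nothing to write
        # Keep only the final path component so a crafted filename cannot escape folder.
        name = Path(part.get_filename() or "").name
        if name in ("", ".."):
            name = f"{maintype}{index}"  # hypothetical fallback naming
        target = folder / maintype / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        saved.append(target)
    return saved


if __name__ == "__main__":
    demo = EmailMessage()
    demo.set_content("See the attached photo.")
    demo.add_attachment(b"\xff\xd8\xff", maintype="image", subtype="jpeg",
                        filename="image1.jpg")
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_media_parts(demo, Path(tmp)):
            print(path.relative_to(tmp))
```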

alexsunxl · 2023-09-11T10:43:07Z

It might be better to distinguish between images in email attachments and images in the body by placing them in different folders.
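If inline and attached media were split, the Content-Disposition header (RFC 2183) is the natural signal. The sketch below uses a hypothetical rule and hypothetical folder names. Parts marked inline, or carrying a Content-ID without any disposition, go to inline/. Everything else goes to attachments/.

```python
from email.message import EmailMessage, Message


def placement_folder(part: Message) -> str:
    """Hypothetical rule: 'inline' for body-embedded parts, 'attachments' otherwise."""
    disposition = part.get_content_disposition()  # 'inline', 'attachment' or None
    if disposition == "inline":
        return "inline"
    # A Content-ID lets an HTML body reference the part through a cid: URL.
    if disposition is None and part["Content-ID"] is not None:
        return "inline"
    return "attachments"


if __name__ == "__main__":
    demo = EmailMessage()
    demo.set_content("See the logo below.")
    demo.add_attachment(b"png-bytes", maintype="image", subtype="png",
                        filename="logo.png", disposition="inline", cid="<logo@example>")
    demo.add_attachment(b"pdf-bytes", maintype="application", subtype="pdf",
                        filename="report.pdf")
    for part in demo.walk():
        if part.get_filename():
            print(part.get_filename(), "->", placement_folder(part))
```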

What do you think?
@waterflier @lurenpluto

waterflier · 2023-09-11T21:49:49Z

To align with mental models, I suggest that we adopt a structure where each directory corresponds to a single email. As for attachments, I believe there is no need to store them in separate directories. Typically, the number of attachments for a single email isn't excessive, so a separate directory may not be necessary.

From the perspective of Named Data Networking (NDN), we can store all videos and images by their respective hashes. We can then reference these existing files in the email directory using soft links. This approach should provide an efficient and intuitive way to manage our data.
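A minimal sketch of this idea follows. It assumes SHA-256 as the content hash (the comment does not name one) and a hypothetical shared store directory. `store_blob` and `link_into_email` are illustrative names. Identical media are written once, and each email directory gets a symbolic link to the stored copy. Creating symlinks on Windows may require extra privileges.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(data: bytes, store_dir: Path) -> Path:
    """Store data under its SHA-256 hex digest and return the blob path."""
    blob = store_dir / hashlib.sha256(data).hexdigest()
    if not blob.exists():  # identical content is written only once
        store_dir.mkdir(parents=True, exist_ok=True)
        blob.write_bytes(data)
    return blob


def link_into_email(blob: Path, email_dir: Path, filename: str) -> Path:
    """Create email_dir/filename as a symlink to the stored blob."""
    email_dir.mkdir(parents=True, exist_ok=True)
    link = email_dir / filename
    if not link.is_symlink():
        # An absolute target keeps the link valid wherever email_dir is read from.
        link.symlink_to(blob.resolve())
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        blob = store_blob(b"jpeg bytes", root / "store")
        # Two emails sharing the same image point at one stored copy.
        link_into_email(blob, root / "email_a", "photo.jpg")
        link_into_email(blob, root / "email_b", "photo.jpg")
        print(sorted(p.name for p in (root / "store").iterdir()))
```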


alexsunxl added a commit to alexsunxl/OpenDAN-Personal-AI-OS that referenced this issue Aug 31, 2023

Add a service: email spider fiatrete#43

82dbcfa

alexsunxl added a commit to alexsunxl/OpenDAN-Personal-AI-OS that referenced this issue Aug 31, 2023

Add a service: email spider fiatrete#43

a6f03ff

waterflier added a commit that referenced this issue Sep 1, 2023

Merge pull request #44 from alexsunxl/MVP

27f163d

Add a service: email spider #43

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 19, 2023

Add a service: email spider fiatrete#43

a12df24

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 19, 2023

Merge pull request fiatrete#44 from alexsunxl/MVP

56fe444

Add a service: email spider fiatrete#43

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 19, 2023

Add a service: email spider fiatrete#43

f9db996

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 19, 2023

Merge pull request fiatrete#44 from alexsunxl/MVP

d928156

Add a service: email spider fiatrete#43

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 20, 2023

Add a service: email spider fiatrete#43

29c01cc

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 20, 2023

Merge pull request fiatrete#44 from alexsunxl/MVP

8b91d8f

Add a service: email spider fiatrete#43

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 21, 2023

Add a service: email spider fiatrete#43

3cff241

photosssa pushed a commit to photosssa/OpenDAN-Personal-AI-OS that referenced this issue Sep 21, 2023

Merge pull request fiatrete#44 from alexsunxl/MVP

f2b4e53

Add a service: email spider fiatrete#43
