Cannot analyze messages because "no such file or directory @ rb_sysopen" #22

Open
victorialo opened this issue May 3, 2018 · 15 comments

Comments

@victorialo

victorialo commented May 3, 2018

Upon trying to analyze messages using the command
$ DEBUG=true ruby analyze_facebook_data.rb /Users/victoria/Downloads/facebook-victorialo123, I get this output:

Analyzing Messages |Time: 00:00:00 | ======================================================================= | Time: 00:00:00
/Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages. `initialize': No such file or directory @ rb_sysopen - /Users/victoria/Downloads/facebook-victorialo123/index.htm (Errno::ENOENT)
	from /Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages. `open'

messages.rb:31 points to the line
@me ||= Nokogiri::HTML(File.open("#{@catalog}/index.htm")).title.split(' - Profile')[0].to_sym
Might want to look into this.

Thank you!

@thnukid
Contributor

thnukid commented May 3, 2018

@victorialo
There was a discussion in #7 about the different kinds of archives that get exported. You might have one of those.

It would be great if you could provide the folder and file structure that you have, so that future development can support these archives as well.

If you can run this from your terminal and post the result in this issue, that would be a great help:

ls -RGng /Users/victoria/Downloads/facebook-victorialo123 > archive_list.txt

@victorialo
Author

victorialo commented May 3, 2018

image

Here's a screenshot of the folder structure of the downloaded data.

I looked at the creation of archive_list.txt and it contains some personal information, such as friends' names. Should I send this separately instead?

If there's a different type of archive exported, that's entirely possible - my account is from 2009 and I've used essentially every feature of Facebook, from page and group admins to life events and hundreds of thousands (if not millions) of messages.


I noticed that the snippet I shared was looking for an index.htm, but I had an index.html. I updated the line as follows:
@me ||= Nokogiri::HTML(File.open("#{@catalog}/index.html")).title.split(' - Profile')[0].to_sym
and got something else this time:

$ DEBUG=true ruby analyze_facebook_data.rb /Users/victoria/Downloads/facebook-victorialo123
Analyzing Messages |Time: 00:00:00 | ======================================================================= | Time: 00:00:00
/Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:96:in `count_by_sender': undefined method `each' for nil:NilClass (NoMethodError)
	from /Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:128:in `block in message_statistics_sheet'

@thnukid
Contributor

thnukid commented May 3, 2018

@victorialo
The friend names are sensitive information, so you want to keep that private.

These files are currently parsed:
<facebook-archive>/html/friends.htm
<facebook-archive>/html/contact_info.htm
<facebook-archive>/messages/*.html
<facebook-archive>/index.htm

The main interest is in the file extensions and the folder structure itself. From that we could, for instance, create a class that describes the different types of archives (folder structures), based on community feedback, so that the lib can handle each of them.
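
A minimal sketch of what such a class-like lookup could look like, using only the paths mentioned in this thread. The constant name, the method name, the layout names (`legacy`, `split_folders`), and the choice of marker folders are assumptions for illustration; none of this is part of facebook_data_analyzer.

```ruby
# Hypothetical layout table. The :marker entry is the folder whose presence is
# assumed to identify a layout; the file paths come from this thread.
ARCHIVE_LAYOUTS = {
  # Older export: index.htm at the root, everything else under html/.
  legacy: { marker: 'html', index: 'index.htm',
            friends: 'html/friends.htm', contacts: 'html/contact_info.htm' },
  # Export reported in this issue: index.html at the root, per-topic folders.
  # No contacts file has been identified for it yet, hence nil.
  split_folders: { marker: 'friends', index: 'index.html',
                   friends: 'friends/friends_added.html', contacts: nil }
}.freeze

# Returns the name of the first layout whose marker folder exists,
# or nil when the archive matches none of the known layouts.
def detect_layout(catalog)
  ARCHIVE_LAYOUTS.each do |name, layout|
    return name if Dir.exist?(File.join(catalog, layout[:marker]))
  end
  nil
end
```

Each analyzeable could then look up its file in the detected layout instead of hard-coding `html/friends.htm`, and new layouts reported by the community would only need a new table entry.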

@victorialo
Author

Yeah, I'm looking, and I have no such folder called <facebook-archive>/html and I have index.html instead of index.htm.

The /messages/*.html seems to exist just fine, though.

@thnukid
Contributor

thnukid commented May 4, 2018

Maybe, because more information was gathered for your account, the html directory is missing and its contents are split into separate folders.

<facebook-archive>/html/contact_info.htm might map to your profile_information folder
<facebook-archive>/html/friends.htm might map to your friends folder

What is the folder/file structure inside the profile_information and friends folders?

@victorialo
Author

image

image

@northcott-j
Contributor

northcott-j commented May 4, 2018

@victorialo have you run without DEBUG=true? The script needs to run at least once without the DEBUG flag to parse the messages into .json files. After that, that flag can be used to avoid reading every message again.

I'd remove the Contacts.new from the analyzeables array in analyze_facebook_data.rb because it's unclear what file that would map to in your folder structure.

I'd try updating the friends script to target friends_added.html. If that doesn't work, I'd remove it from the array too. I'm not familiar with your folder structure, but it should at least work for messages.

@victorialo
Author

victorialo commented May 8, 2018

I removed Contacts.new and I've updated initialize in analyzeables/friends.rb to be the following. Am I doing it right?

def initialize(catalog:)
  @catalog = catalog
  # @directory = "#{catalog}/html/"
  # @file_pattern = 'friends.htm'
  @directory = "#{catalog}/friends/"
  @file_pattern = 'friends_added.html'
  @friends = []

  super()
end

I still get

$ ruby analyze_facebook_data.rb /Users/victoria/Downloads/facebook-victorialo123
Parsing Messages |Time: 00:00:00 | ========================================================================= | Time: 00:00:00
Analyzing Messages |Time: 00:00:00 | ======================================================================= | Time: 00:00:00
/Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:96:in `count_by_sender': undefined method `each' for nil:NilClass (NoMethodError)
	from /Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:128:in `block in message_statistics_sheet'

Removing the Friends feature also doesn't help:

analyzeables = [Messages.new(catalog: catalog, parallel: true)] #Contacts.new(catalog: catalog), Friends.new(catalog: catalog)]
$ ruby analyze_facebook_data.rb /Users/victoria/Downloads/facebook-victorialo123
Parsing Messages |Time: 00:00:00 | ========================================================================= | Time: 00:00:00
Analyzing Messages |Time: 00:00:00 | ======================================================================= | Time: 00:00:00
/Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:96:in `count_by_sender': undefined method `each' for nil:NilClass (NoMethodError)
	from /Users/victoria/Documents/facebook_data_analyzer/classes/analyzeables/messages.rb:128:in `block in message_statistics_sheet'

If you look at the error message, it's something wrong with messages.

@northcott-j
Contributor

@victorialo - you're right, it does look like the issue is with messages. Are there any .json files in your messages folder? If so, can you delete them and try again?

If that doesn't work, my guess is that the HTML structure of your conversations is different from that of other archives, which would take some refactoring to fix. It'd be nice if the different formats could be generalized so a user could specify the archive type as a command line arg.
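
As a rough sketch of the command line idea, the archive type could be parsed with Ruby's standard `OptionParser`. The `--archive-type` flag, the type names, and the `parse_args` helper are hypothetical; the script in this issue currently takes only the archive path as its argument.

```ruby
require 'optparse'

# Hypothetical archive type names, for illustration only.
KNOWN_ARCHIVE_TYPES = %w[legacy split_folders].freeze

# Parses the arguments into a hash with :archive_type and :catalog.
# An unknown type raises OptionParser::InvalidArgument.
def parse_args(argv)
  options = { archive_type: 'legacy' }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: ruby analyze_facebook_data.rb [options] <facebook-archive>'
    # Passing the array restricts the accepted values to the known types.
    opts.on('--archive-type TYPE', KNOWN_ARCHIVE_TYPES,
            "Archive layout (#{KNOWN_ARCHIVE_TYPES.join(', ')})") do |type|
      options[:archive_type] = type
    end
  end
  # parse returns the remaining positional arguments without mutating argv.
  options[:catalog] = parser.parse(argv).first
  options
end
```

With this in place, a call such as `parse_args(ARGV)` would let each analyzeable pick its `@directory` and `@file_pattern` from the chosen type rather than from hard-coded values.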

@victorialo
Author

Yeah, there was a _.messages.html.json in that folder. I deleted it and tried again but got the same output, and the file was just recreated.

Yeah, I was thinking that too - that would be nice.

Maybe I could send you one of the files to work from for the refactor?

@northcott-j
Contributor

I do not have a messages.html file in my messages folder. My messages folder looks like this. Does yours look similar?
screen shot 2018-05-08 at 10 50 10 am

I'm hoping that this is purely a directory structure issue, and not that the individual files are also different. But if your html files are also different, having one would be super helpful for a refactor.

@victorialo
Author

image
Mine is quite a bit different from yours....

Some of the folders have a photos subfolder, some have a files subfolder, and some have neither.

@northcott-j
Contributor

northcott-j commented May 8, 2018

Could you try updating your @file_pattern in messages to **/message.html and then change this line to File.open("_#{file.split('/')[0]}.json", 'w') do |json|.

The new file_pattern will look for all message.html files in subdirectories of messages

Changing L52 and adding the split will use the unique folder name when saving the json file. Without the split on the /, the json file will be saved in a new subdirectory when it should be saved in the root directory.
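
The two changes can be tried in isolation with a sketch like the one below. It assumes the file paths are relative to the messages directory, which is what makes `split('/')[0]` return the conversation folder name; the helper names are illustrative and not the project's code.

```ruby
# Illustrative helpers, not the project's code.

# Lists conversation files relative to messages_dir, so the first path
# segment is always the conversation's folder name.
def conversation_files(messages_dir)
  Dir.chdir(messages_dir) { Dir.glob('**/message.html').sort }
end

# Builds the JSON cache name from the folder name, e.g.
# "some_folder/message.html" becomes "_some_folder.json".
def json_name_for(relative_path)
  "_#{relative_path.split('/')[0]}.json"
end
```

Note that `**/` also matches zero directories, so a `message.html` sitting directly in the messages folder would be picked up too and would get the name `_message.html.json`.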

If these changes work (assuming it's only a directory structure issue and the files are the same), the README should be updated with archive types and the correct @file_patterns. The split will have to be added to messages.rb, but that change should work for all archive types we've currently seen. The code should also be updated to look for index.htm or index.html.
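
For the index.htm or index.html lookup, one possible approach is to try both names and use the first that exists. This is a sketch under that assumption; `INDEX_CANDIDATES` and `index_path` are hypothetical names.

```ruby
# File names to try, in order. Both appear in archives reported in this issue.
INDEX_CANDIDATES = %w[index.htm index.html].freeze

# Returns the path of the first index file found in the archive root and
# raises Errno::ENOENT when neither exists.
def index_path(catalog)
  found = INDEX_CANDIDATES
          .map { |name| File.join(catalog, name) }
          .find { |path| File.file?(path) }
  found || raise(Errno::ENOENT, "#{INDEX_CANDIDATES.join(' or ')} in #{catalog}")
end
```

The line in messages.rb that opens `"#{@catalog}/index.htm"` could then open `index_path(@catalog)` instead.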

@thnukid
Contributor

thnukid commented May 11, 2018

@victorialo I have created PR #33. Maybe you can check out that branch and see whether it removes all sensitive information from your export.

Then @northcott-j and the others can analyse your archive in more depth without you having to post screenshots.

@thnukid
Contributor

thnukid commented Jun 1, 2018

@victorialo can you run
bin/anonymized_facebook_export --catalog <your_facebook_export_directory> --export <anonymized_export_to_directory>
and share the results with us (maybe by making a pull request)?

3 participants