Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance custom element handling #205

Open
mmcdole opened this issue Mar 25, 2023 · 5 comments
Open

Enhance custom element handling #205

mmcdole opened this issue Mar 25, 2023 · 5 comments

Comments

@mmcdole
Copy link
Owner

mmcdole commented Mar 25, 2023

When gofeed encounters an element that it doesn't know expect, it will call parseText on it and attempt to store the value in a item.Custom map. parseText calls decode element and this will fail if the the element has nested structure. The handling of this seems pretty insufficient.

If we wanted to revamp custom element handling, where Gofeed encounters a tag in a feed that isn't a part of the spec, does anyone have feedback between two possible ways of representing it?

Option A - Nested Maps

func parseUnknownElement(p *xpp.XMLPullParser) (map[string]interface{}, error) {
	element := make(map[string]interface{})
	elementName := p.Name

	// Store attributes
	for _, attr := range p.Attrs {
		attrName := attr.Name.Local
		attrValue := attr.Value
		element["_attr_"+attrName] = attrValue
	}

	for {
		tok, err := shared.NextTag(p)
		if err != nil {
			return nil, err
		}

		if tok == xpp.EndTag && p.Name == elementName {
			break
		}

		if tok == xpp.StartTag {
			name := p.Name
			innerElement, err := parseUnknownElement(p)
			if err != nil {
				return nil, err
			}
			element[name] = innerElement
		} else {
			text, err := shared.ParseText(p)
			if err != nil {
				return nil, err
			}
			element["_text"] = text
		}
	}

	return element, nil
}

Where, if it encountered this torrent tag, the value would look something like:

"torrent": map[string]interface{}{
    "_attr_xmlns": "http://xmlns.ezrss.it/0.1/",
    "fileName": map[string]interface{}{
        "_text": "Cat.torrent",
    },
    "infoHash": map[string]interface{}{
        "_text": "A%2B%10%D5l%F2%BC%B3%F3%5E%D1%14qM%15%F2%26%C4%F0Z",
    },
    "contentLength": map[string]interface{}{
        "_text": "62643697701",
    },
    "contentLengthHR": map[string]interface{}{
        "_text": "58.34 GiB",
    },
}

Option B - Decode Element to a CustomElement struct

type CustomElement struct {
	XMLName xml.Name
	Data    string `xml:",innerxml"`
	Attrs   []xml.Attr `xml:",any,attr"`
}

func parseUnknownElement(p *xpp.XMLPullParser) (CustomElement, error) {
	element := CustomElement{}

	err := p.DecodeElement(&element)
	if err != nil {
		return element, err
	}

	return element, nil
}

Where the value would be roughly:

"torrent": CustomElement{
    XMLName: xml.Name{
        Space: "http://xmlns.ezrss.it/0.1/",
        Local: "torrent",
    },
    Data: `
        <fileName><![CDATA[Cat.torrent]]></fileName>
        <infoHash><![CDATA[A%2B%10%D5l%F2%BC%B3%F3%5E%D1%14qM%15%F2%26%C4%F0Z]]></infoHash>
        <contentLength>62643697701</contentLength>
        <contentLengthHR>58.34 GiB</contentLengthHR>
    `,
    Attrs: []xml.Attr{
        {
            Name: xml.Name{
                Space: "",
                Local: "xmlns",
            },
            Value: "http://xmlns.ezrss.it/0.1/",
        },
    },
}

@cristoper, @imirzadeh or @rodmcelrath do any of you have opinions on what seems like a better approach?

@mmcdole mmcdole changed the title Enhance Custom Element Handling Enhance custom element handling Mar 25, 2023
@mmcdole
Copy link
Owner Author

mmcdole commented Mar 25, 2023

Should I also consider handling custom elements within other elements? Like, within an <author> element for example?

Should each child struct have a customelement or map?

@mmcdole
Copy link
Owner Author

mmcdole commented Mar 25, 2023

We also have our extension capability that parses arbitrary extensions.

Perhaps we should get rid of the CustomElement idea and just tweak the extensions capability so it will parse anything, even non-namespaced extensions

https://github.com/mmcdole/gofeed/blob/master/rss/parser.go#L144

@rodmcelrath
Copy link
Contributor

rodmcelrath commented Mar 26, 2023 via email

@rodmcelrath
Copy link
Contributor

rodmcelrath commented Mar 26, 2023 via email

@rodmcelrath
Copy link
Contributor

rodmcelrath commented Mar 26, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants