Tables with nested columns (almost right) #155

Open
danielduckworth opened this issue Apr 23, 2024 · 0 comments

First of all, this is the best service for indexing documents with tables that I've tried so far. The use of markdown is highly appreciated. In my own testing, I've found markdown to be much more effective than plain text when it comes to tables. So kudos!

I just wanted to give some feedback about what the LlamaParse pipeline got wrong. The PDF document I uploaded (an appendix from a public research report) is attached for reference.

ICILS 2018 Appendix E Student percentages.pdf

Below are the parsing instruction and the resulting markdown. Interestingly, the second table was closer to the original than the first: Table E.1 came back with only five of its ten sub-columns.

For comparison, I have also included a similar prompt used with ChatGPT4 and the result (which accurately represented the structure of both tables). If LlamaParse uses a system prompt, tuning the instruction to flatten nested columns might help.
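To make the suggestion concrete, here is a minimal Python sketch of what "flattening" nested columns could mean: each group name is prefixed onto its sub-column names so the table needs only one header row. The function names `flatten_headers` and `to_markdown` are hypothetical helpers for illustration, not part of LlamaParse, and the `Group: sub-column` naming scheme is just one possible convention.

```python
def flatten_headers(groups):
    """Turn {group: [sub-column, ...]} into one flat list of column names."""
    return [f"{group}: {sub}" for group, subs in groups.items() for sub in subs]


def to_markdown(first_column, groups, rows):
    """Render rows as a markdown table with a single, flattened header row."""
    header = [first_column] + flatten_headers(groups)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # A row with the wrong number of cells signals lost or merged columns.
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row}")
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Two of the Table E.2 groups, with the matching cells of the Chile row.
groups = {
    "Immigrant background (%)": ["No immigrant background", "Immigrant background"],
    "Test language (%)": [
        "Students speaking another language at home",
        "Student speaking the survey language at home",
    ],
}
rows = [["Chile", "95 (1.2)", "5 (1.2)", "3 (0.3)", "98 (0.3)"]]
table = to_markdown("Country", groups, rows)
print(table)
```

The length check is a deliberate choice: it would reject a row like the ones in the first table below, where half of the values are missing, instead of silently emitting a misaligned table.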

Parsing instruction:

The provided document is Appendix E from the ICILS 2018 international report. It contains tables. The columns after "Country" are groups with two sub-columns each. Table E.1 has the following groups and sub-columns: Gender (%): Males | Females, Parental occupation (%): SEI below 50 | SEI of 50 or above, Parental education (%): No parents with university degree | At least one parent with university degree, Books at home (%): 25 books or less | More than 25 books, Computer resources at home (%): Fewer than two computers | Two computers or more. Table E.2 has the following groups and sub-columns: Immigrant background (%): No immigrant background | Immigrant background, Test language (%): Students speaking another language at at home | Student speaking the survey language at home, Computer resources at home (%): Fewer than two computers | Two computers or more, Study of ICT-related subject (%): Not studying ICT-related subject | Studying ICT-related subject

Result:

ICILS 2018 International Report - Appendix E

Table E.1: Percentages of students in categories for dichotomous variables used in Chapters 3, 4, 5, and 6

| Country | Gender (%) | Parental occupation (%) | Parental education (%) | Books at home (%) | Computer resources at home (%) |
| --- | --- | --- | --- | --- | --- |
| | Males | Females | SEI below 50 | SEI of 50 or above | No parents with university degree |
| Chile | 51 (1.7) | 49 (1.7) | 68 (1.7) | 32 (1.7) | 75 (1.4) |
| Denmark †¹ | 51 (0.9) | 50 (0.9) | 37 (1.3) | 63 (1.3) | 59 (1.6) |
| Finland | 51 (0.8) | 49 (0.8) | 52 (1.4) | 48 (1.4) | 45 (1.3) |
| France | 50 (0.6) | 50 (0.6) | 50 (1.1) | 50 (1.1) | 69 (1.2) |
| Germany | 52 (0.9) | 49 (0.9) | 54 (1.4) | 46 (1.4) | 70 (1.4) |
| Italy² | 52 (0.6) | 48 (0.6) | 59 (1.5) | 41 (1.5) | 69 (1.3) |
| Kazakhstan¹ | 52 (0.7) | 48 (0.7) | 51 (1.4) | 49 (1.4) | 65 (1.5) |
| Korea, Republic of | 53 (0.9) | 47 (0.9) | 43 (1.5) | 57 (1.5) | 27 (1.4) |
| Luxembourg | 53 (0.7) | 47 (0.7) | 58 (0.6) | 43 (0.6) | 56 (0.5) |
| Portugal ††¹ | 51 (1.0) | 49 (1.0) | 56 (1.5) | 44 (1.5) | 67 (1.6) |
| Uruguay | 49 (1.0) | 51 (1.0) | 71 (1.5) | 30 (1.5) | 78 (1.6) |
| ICILS 2018 average | 51 (0.3) | 49 (0.3) | 54 (0.4) | 46 (0.4) | 62 (0.4) |
| Not meeting sample participation requirements | United States | 51 (0.4) | 49 (0.4) | 39 (1.2) | 62 (1.2) |
| Benchmarking participants meeting sample participation requirements | Moscow (Russian Federation) | 51 (0.7) | 49 (0.7) | 21 (1.1) | 79 (1.1) |
| North Rhine-Westphalia (Germany) | 52 (1.1) | 48 (1.1) | 46 (1.5) | 54 (1.5) | |

Notes: Standard errors appear in parentheses. Because results are rounded to the nearest whole number, some totals may appear inconsistent. SEI = socioeconomic index.

† Met guidelines for sampling participation rates only after replacement schools were included.

†† Nearly met guidelines for sampling participation rates after replacement schools were included.

1 National defined population covers 90% to 95% of the national target population.

2 Country surveyed target grade in the first half of the school year.

ICILS 2018 International Report - Appendix E

Table E.2: Percentages of students in categories for dichotomous variables used in Chapters 3, 4, 5, and 6

| Country | Immigrant background (%) | | Test language (%) | | Computer resources at home (%) | | Study of ICT-related subject (%) | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | No immigrant background | Immigrant background | Students speaking another language at home | Student speaking the survey language at home | Fewer than two computers | Two computers or more | Not studying ICT-related subject | Studying ICT-related subject |
| Chile | 95 (1.2) | 5 (1.2) | 3 (0.3) | 98 (0.3) | 37 (1.4) | 63 (1.4) | 14 (2.1) | 86 (2.1) |
| Denmark † ¹ | 90 (0.8) | 10 (0.8) | 7 (0.6) | 93 (0.6) | 8 (0.6) | 92 (0.6) | 85 (1.0) | 15 (1.0) |
| Finland | 97 (0.4) | 3 (0.4) | 7 (0.7) | 93 (0.7) | 26 (1.1) | 74 (1.1) | 70 (1.5) | 30 (1.5) |
| France | 85 (1.1) | 15 (1.1) | 13 (0.9) | 87 (0.9) | 29 (1.0) | 72 (1.0) | 18 (1.0) | 82 (1.0) |
| Germany | 77 (1.1) | 23 (1.1) | 20 (0.9) | 80 (0.9) | 27 (1.1) | 73 (1.1) | 63 (2.4) | 37 (2.4) |
| Italy² | 90 (0.6) | 11 (0.6) | 21 (1.0) | 79 (1.0) | 42 (1.1) | 58 (1.1) | 10 (0.9) | 90 (0.9) |
| Kazakhstan¹ | 92 (0.8) | 9 (0.8) | 8 (1.0) | 93 (1.0) | 71 (1.1) | 29 (1.1) | 9 (0.8) | 91 (0.8) |
| Korea, Republic of | 100 (0.1) | 0 (0.1) | 1 (0.2) | 99 (0.2) | 39 (1.3) | 61 (1.3) | 73 (2.9) | 27 (2.9) |
| Luxembourg | 50 (0.8) | 50 (0.8) | 81 (0.6) | 20 (0.6) | 20 (0.5) | 80 (0.5) | 51 (0.5) | 49 (0.5) |
| Portugal †† ¹ | 93 (0.6) | 7 (0.6) | 4 (0.4) | 96 (0.4) | 33 (1.0) | 67 (1.0) | 12 (1.7) | 89 (1.7) |
| Uruguay | 98 (0.3) | 2 (0.3) | 3 (0.4) | 97 (0.4) | 45 (1.2) | 55 (1.2) | 2 (0.8) | 98 (0.8) |
| ICILS 2018 average | 88 (0.2) | 12 (0.2) | 15 (0.2) | 85 (0.2) | 34 (0.3) | 66 (0.3) | 37 (0.5) | 63 (0.5) |
| United States | 94 (0.5) | 6 (0.5) | 17 (1.2) | 83 (1.2) | 30 (0.8) | 70 (0.8) | 51 (1.3) | 49 (1.3) |
| Moscow (Russian Federation) | 90 (0.9) | 10 (0.9) | 4 (0.5) | 96 (0.5) | 29 (1.4) | 71 (1.4) | 2 (0.3) | 98 (0.3) |
| North Rhine-Westphalia (Germany) | 73 (1.8) | 27 (1.8) | 23 (1.8) | 77 (1.8) | 28 (1.3) | 72 (1.3) | 67 (2.2) | 33 (2.2) |

Notes: Standard errors appear in parentheses. Because results are rounded to the nearest whole number, some totals may appear inconsistent.

† Met guidelines for sampling participation rates only after replacement schools were included.

†† Nearly met guidelines for sampling participation rates after replacement schools were included.
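
A rough way to quantify the difference between the two tables is to count the `percentage (standard error)` cells in each row and compare that with the number of sub-columns named in the parsing instruction (10 for Table E.1, 8 for Table E.2). The sketch below is an illustration only: `count_cells` is a hypothetical helper, and the regular expression assumes every cell looks like `51 (1.7)`.

```python
import re

# One data cell: a whole-number percentage followed by a standard error in parentheses.
CELL = re.compile(r"\d+ \(\d+\.\d+\)")


def count_cells(line):
    """Count the 'percentage (standard error)' cells in one row of text."""
    return len(CELL.findall(line))


# Chile rows as they appear in the LlamaParse result for each table.
e1_chile = "Chile 51 (1.7) 49 (1.7) 68 (1.7) 32 (1.7) 75 (1.4)"
e2_chile = "Chile 95 (1.2) 5 (1.2) 3 (0.3) 98 (0.3) 37 (1.4) 63 (1.4) 14 (2.1) 86 (2.1)"

print(count_cells(e1_chile))  # 5, but the instruction names 10 sub-columns
print(count_cells(e2_chile))  # 8, matching the instruction
```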


ChatGPT prompt:

The provided document is Appendix E from the ICILS 2018 international report. It contains tables. The columns after "Country" are groups with two sub-columns each. Table E.1 has the following groups and sub-columns: Gender (%): Males | Females, Parental occupation (%): SEI below 50 | SEI of 50 or above, Parental education (%): No parents with university degree | At least one parent with university degree, Books at home (%): 25 books or less | More than 25 books, Computer resources at home (%): Fewer than two computers | Two computers or more. Table E.2 has the following groups and sub-columns: Immigrant background (%): No immigrant background | Immigrant background, Test language (%): Students speaking another language at at home | Student speaking the survey language at home, Computer resources at home (%): Fewer than two computers | Two computers or more, Study of ICT-related subject (%): Not studying ICT-related subject | Studying ICT-related subject

Convert the tables to markdown tables.

Result:

Here are the converted markdown tables based on the information from Appendix E of the ICILS 2018 international report.

Table E.1: Percentages of students in categories for dichotomous variables used in Chapters 3, 4, 5, and 6

| Country | Gender (%) Males | Gender (%) Females | Parental Occupation (%) SEI Below 50 | Parental Occupation (%) SEI 50 or Above | Parental Education (%) No University Degree | Parental Education (%) University Degree | Books at Home (%) 25 or Less | Books at Home (%) More Than 25 | Computer Resources at Home (%) Fewer Than Two | Computer Resources at Home (%) Two or More |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chile | 51 (1.7) | 49 (1.7) | 68 (1.7) | 32 (1.7) | 75 (1.4) | 26 (1.4) | 63 (1.8) | 37 (1.8) | 37 (1.4) | 63 (1.4) |
| Denmark | 51 (0.9) | 50 (0.9) | 37 (1.3) | 63 (1.3) | 59 (1.6) | 41 (1.6) | 28 (1.1) | 72 (1.1) | 8 (0.6) | 92 (0.6) |
| Finland | 51 (0.8) | 49 (0.8) | 52 (1.4) | 48 (1.4) | 45 (1.3) | 55 (1.3) | 27 (1.3) | 73 (1.3) | 26 (1.1) | 74 (1.1) |
| France | 50 (0.6) | 50 (0.6) | 50 (1.1) | 50 (1.1) | 69 (1.2) | 31 (1.2) | 41 (1.2) | 59 (1.2) | 29 (1.0) | 72 (1.0) |
| Germany | 52 (0.9) | 49 (0.9) | 54 (1.4) | 46 (1.4) | 70 (1.4) | 30 (1.4) | 24 (1.2) | 76 (1.2) | 27 (1.1) | 73 (1.1) |
| Italy | 52 (0.6) | 48 (0.6) | 59 (1.5) | 41 (1.5) | 69 (1.3) | 31 (1.3) | 31 (1.3) | 69 (1.3) | 42 (1.1) | 58 (1.1) |
| Kazakhstan | 52 (0.7) | 48 (0.7) | 51 (1.4) | 49 (1.4) | 65 (1.5) | 35 (1.5) | 55 (1.6) | 45 (1.6) | 71 (1.1) | 29 (1.1) |
| Korea, Republic of | 53 (0.9) | 47 (0.9) | 43 (1.5) | 57 (1.5) | 27 (1.4) | 73 (1.4) | 13 (0.8) | 88 (0.8) | 39 (1.3) | 61 (1.3) |
| Luxembourg | 53 (0.7) | 47 (0.7) | 58 (0.6) | 42 (0.6) | 56 (0.5) | 44 (0.5) | 28 (0.5) | 72 (0.5) | 20 (0.5) | 80 (0.5) |
| Portugal | 51 (1.0) | 49 (1.0) | 56 (1.5) | 44 (1.5) | 67 (1.6) | 33 (1.6) | 38 (1.2) | 62 (1.2) | 33 (1.0) | 67 (1.0) |
| Uruguay | 49 (1.0) | 51 (1.0) | 71 (1.5) | 30 (1.5) | 78 (1.6) | 23 (1.6) | 69 (1.5) | 31 (1.5) | 45 (1.2) | 55 (1.2) |
| United States | 51 (0.4) | 49 (0.4) | 39 (1.2) | 62 (1.2) | 52 (1.3) | 48 (1.3) | 37 (0.8) | 63 (0.8) | 30 (0.8) | 70 (0.8) |

Table E.2: Percentages of students in categories for dichotomous variables used in Chapters 3, 4, 5, and 6

| Country | Immigrant Background (%) No | Immigrant Background (%) Yes | Test Language (%) Other | Test Language (%) Survey | Computer Resources at Home (%) Fewer Than Two | Computer Resources at Home (%) Two or More | Study of ICT-Related Subject (%) No | Study of ICT-Related Subject (%) Yes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chile | 95 (1.2) | 5 (1.2) | 3 (0.3) | 98 (0.3) | 37 (1.4) | 63 (1.4) | 14 (2.1) | 86 (2.1) |
| Denmark | 90 (0.8) | 10 (0.8) | 7 (0.6) | 93 (0.6) | 8 (0.6) | 92 (0.6) | 85 (1.0) | 15 (1.0) |
| Finland | 97 (0.4) | 3 (0.4) | 7 (0.7) | 93 (0.7) | 26 (1.1) | 74 (1.1) | 70 (1.5) | 30 (1.5) |
| France | 85 (1.1) | 15 (1.1) | 13 (0.9) | 87 (0.9) | 29 (1.0) | 72 (1.0) | 18 (1.0) | 82 (1.0) |
| Germany | 77 (1.1) | 23 (1.1) | 20 (0.9) | 80 (0.9) | 27 (1.1) | 73 (1.1) | 63 (2.4) | 37 (2.4) |
| Italy | 90 (0.6) | 11 (0.6) | 21 (1.0) | 79 (1.0) | 42 (1.1) | 58 (1.1) | 10 (0.9) | 90 (0.9) |
| Kazakhstan | 92 (0.8) | 9 (0.8) | 8 (1.0) | 93 (1.0) | 71 (1.1) | 29 (1.1) | 9 (0.8) | 91 (0.8) |
| Korea, Republic of | 100 (0.1) | 0 (0.1) | 1 (0.2) | 99 (0.2) | 39 (1.3) | 61 (1.3) | 73 (2.9) | 27 (2.9) |
| Luxembourg | 50 (0.8) | 50 (0.8) | 81 (0.6) | 20 (0.6) | 20 (0.5) | 80 (0.5) | 51 (0.5) | 49 (0.5) |
| Portugal | 93 (0.6) | 7 (0.6) | 4 (0.4) | 96 (0.4) | 33 (1.0) | 67 (1.0) | 12 (1.7) | 89 (1.7) |
| United States | 94 (0.5) | 6 (0.5) | 17 (1.2) | 83 (1.2) | 30 (0.8) | 70 (0.8) | 51 (1.3) | 49 (1.3) |

These tables are formatted to display the percentage of students within each sub-category along with the standard error in parentheses, following the structure provided in the ICILS 2018 Appendix E.
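
Because every cell follows the same `percentage (standard error)` pattern and each group is a pair of complementary categories, a converted table can be sanity-checked mechanically. The following is a minimal sketch under those assumptions: `parse_cells` and `pairs_sum_to_100` are hypothetical helpers, and the tolerance of 1 percentage point is an assumed allowance for the rounding mentioned in the table notes.

```python
import re

# Matches cells such as "51 (1.7)": percentage, then standard error in parentheses.
CELL = re.compile(r"(\d+) \((\d+\.\d+)\)")


def parse_cells(line):
    """Return [(percentage, standard_error), ...] for one row of text."""
    return [(int(pct), float(se)) for pct, se in CELL.findall(line)]


def pairs_sum_to_100(cells, tolerance=1):
    """Check that each consecutive pair of percentages adds up to about 100."""
    pairs = zip(cells[0::2], cells[1::2])
    return all(abs(left[0] + right[0] - 100) <= tolerance for left, right in pairs)


# Denmark row from Table E.2.
denmark = parse_cells(
    "Denmark 90 (0.8) 10 (0.8) 7 (0.6) 93 (0.6) 8 (0.6) 92 (0.6) 85 (1.0) 15 (1.0)"
)
print(denmark[0])                 # (90, 0.8)
print(pairs_sum_to_100(denmark))  # True
```

A row whose columns have shifted by one position would fail this check, since neighbouring values from different groups (for example 10 and 7) do not add up to 100.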

hexapode self-assigned this Apr 23, 2024