+++ to secure your transactions use the Bitcoin Mixer Service +++

 

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue Convert to python3 #2

Merged
merged 30 commits into from
Jun 3, 2018
Merged

Continue Convert to python3 #2

merged 30 commits into from
Jun 3, 2018

Conversation

MasterOdin
Copy link
Member

@MasterOdin MasterOdin commented May 27, 2018

This PR continues the work started in #1 (and also merging the 8.6.10 commits from asciidoc master) to port asciidoc to python3. I fixed the encoding errors I was getting when running the asciidoc test suite in the provided Dockerfile. The tests all pass, except for lang-ja, which asciidoc3 fixes compared.

See Travis-CI for running the test suite against Python 3.

Note: I've also pulled in the commits that were made in asciidoc/asciidoc (for the 8.6.10 release) into this PR so that it more fully "EOL" the old repo.

moon-chilled and others added 24 commits June 24, 2017 20:37
… asciidoc.py --doctest. testasciidoc.py still doesn't work, though
Bump required Python version to 2.6 instead of 2.4
…ild_remove_unneeded_ppa_

Fix build by removing reference to obsolete and no longer needed PPA
@MasterOdin
Copy link
Member Author

MasterOdin commented May 28, 2018

Using the following script to compare output from asciidoc vs asciidoc3:

import difflib
import os

for entry in os.listdir('tests/data'):
    if not entry.endswith('html'):
        continue
    with open('tests/data/' + entry, 'r') as new, open('../asciidoc/tests/data/' + entry, 'r') as old:
        lines = []
        for line in difflib.unified_diff(old.readlines(), new.readlines(), n=0):
            lines.append(line)
        if len(lines) > 0:
            print(entry)
            with open('/tmp/' + entry, 'w') as open_file:
                for line in lines:
                    print(line, file=open_file)

Which gives:

lang-ja-book-test-html5.html
lang-ja-book-test-html4.html
lang-ja-book-test-xhtml11.html
lang-ja-article-test-xhtml11.html
lang-ja-article-test-html4.html
lang-ja-article-test-html5.html

due to the above regex for indexes.

- Update Python 2.6 and 2.7 references to be Python 3.4 and later
- Remove explicit reference to Jython as it only supports up to Python 2.7
- Use new style module loading for Python 3.5+
- Update shebang lines to be python3
@MasterOdin
Copy link
Member Author

MasterOdin commented May 28, 2018

So the reason the JA tests are failing is that the indexterm2 regex is (?<!\()[\\]?\(\((?P<attrlist>[^\s\(][^\(].*?)\)\)(?!\)) with the key bit being in the middle: ((?P[^\s\\(][^\\(].*?))) which implies that an index term must be greater than two characters (emphasized). It was working originally because the one JA character for monkey, 猿, was being internally represented as \xe7\x8c\xbf by Python2 and so the regex was matching it. However, now on Python3, it's properly internally represented as the one character and the regex fails. Fixing it requires removing the necessary second character to give just ((?P[^\s\\(].*?))). Putting it together gives (?<!\()[\\]?\(\((?P<attrlist>[^\s\(].*?)\)\)(?!\)).

I couldn't find any explicit mention in the asciidocs (either for asciidoc or asciidoctor, asciidoctor also seems to allow single character indexterms) that makes mention of minimum character length for an index term so I can't tell why this is, so I've committed the above change, which by all accounts brings asciidoc3 to match asciidoc output on all testcases.

I've modified my above test script to be stricter now to check and ensure output is byte indentical:

import os

for entry in os.listdir('tests/data'):
    if not entry.endswith('html'):
        continue
    with open('tests/data/' + entry, 'rb') as new, open('../asciidoc/tests/data/' + entry, 'rb') as old:
        if new.read() != old.read():
            print(entry)

which I'm happy to report it is and Travis-CI reports no failures for any test cases.

@elextr: I think this should now be ready for merging?

@elextr
Copy link
Contributor

elextr commented May 28, 2018

@MasterOdin thanks for continuing this work, hopefully will be able to progress it in the next few days.

Possibly the repo will be renamed as part of the first initial preparation for Asciidoc Python 2 EOL. IIUC github redirects links to the original repo to the new name, so your fork should still work.

I am pleasantly surprised you only found one regex that needed changing due to the difference between Python 2 ASCII and Python3 Unicode. I was expecting many more.

Once its committed users will be encouraged to test it for real.

@MasterOdin
Copy link
Member Author

MasterOdin commented May 29, 2018

Yeah, it was good to see that the only regexes that really needed modification wasn't even really for something that was failing for a "true" unicode issue. If anything, the regexes are now guaranteed to be more stable and easier to write (as evidenced below) due to not having to worry about characters being expanded to codepoints.

Turns out I made a mistake in my comparison (and forgot to compare docbooks).

It's left me with these failures:

lang-ja-book-test-docbook.xml
newtables-docbook.xml

The failures for the lang-ja-book-test-docbook.xml is:

---

+++

@@ -16 +16 @@

-<chapter id="_前書">

+<preface id="_前書">

@@ -19,2 +19,2 @@

-</chapter>

-<chapter id="_奥付">

+</preface>

+<colophon id="_奥付">

@@ -23 +23 @@

-</chapter>

+</colophon>

However, this looks like it's a bug in asciidoc2 (which is the - lines) as it wasn't properly matching:
https://github.com/asciidoc/asciidoc3/blob/0bea48486a7265ae0c4cc792ab041df95ca142f2/lang-ja.conf#L48-L50
against the Title attribute, due to the fact that the ? in the regexes was only being applied to the final character of the code point instead of the entire character in Python2.

Still investigating newtables-docbook.xml which has a bunch of columns with a colwidth of 42* instead of 43*. Not sure what's up there.

IIUC github redirects links to the original repo to the new name, so your fork should still work.

Yeah, Github redirects the original name to the new name so long as you don't make a repository at the original name. Not sure what you're referring to with 'my fork' or renaming though.

@MasterOdin
Copy link
Member Author

And fixed newtables-docbook.xml. Did not know that round() had changed from Python2 to Python3, so added a shim to get that functionality back for use for calculating column widths for docbook. With that, generated testcases are byte identical between asciidoc2 and asciidoc3 with the noted exception of lang-ja-book-test-docbook.xml, but that's a bug with asciidoc2 and its interaction of code points inside the JA regex string.

@adam820 adam820 mentioned this pull request May 29, 2018
@elextr elextr changed the title Convert to python3 Continue Convert to python3 Jun 3, 2018
@elextr elextr merged commit 67de2ad into asciidoc-py:master Jun 3, 2018
@elextr
Copy link
Contributor

elextr commented Jun 3, 2018

Note, PR title changed to "Continue ..." because I was getting confused about two PRs with the same title, nearly applied them in the wrong order :)

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jul 8, 2020
Version 9.0.1 (2020-06-26)

Bug fixes

-   Fix a2x crashing on decoding generated HTML pages

Building

-   Fix generated tar.gz not having files under top-level asciidoc folder

Testing

-   Test against Python 3.9

Version 9.0.0 (2020-06-02)

Additions and changes

-   Port asciidoc to run on Python 3.5+ (see
    https://github.com/asciidoc/asciidoc for the EOL Python 2 implementation)
-   Drop internal implementation of OrderedDict and use the standard library
    collections.OrderedDict instead
-   Implement Dockerfile for running asciidoc
-   Add Catalan translation
-   Add docbook5 backend
-   Fix misspellings in various files and documents
-   Use UTC for testing instead of Pacific/Auckland (which observes daylight
    saving time)
-   Use "with" context statement for opening and closing files instead of older
    try/finally pattern
-   Search sibling paths before system wide paths in asciidocapi
-   Add manpage for testasciidoc.py
-   Use argparse instead of optparse for argument parsing
-   Add simplified Chinese translation (thanks @muirmok)
-   vim-asciidoc: speed up the refresh process for big files (thanks
    @aerostitch)
-   Allow specifying floatstyle attribute for figures, tables, equations,
    examples in docbook (thanks @psaris)
-   Use trans python module (if available) to better handle character
    decomposition to ascii for ascii-ids (thanks @rkel)
-   Use lru_cache to memoize repeated calls to macro look-up, giving potential
    ~15% speed-up on parsing

Bug fixes

-   Fix index terms requiring two characters instead of just one (see
    asciidoc-py/asciidoc-py#2 (comment))
-   Properly capture and use colophon, dedication, and preface for docbooks in
    Japanese (see
    asciidoc-py/asciidoc-py#2 (comment))
-   make install did not include the unwraplatex.py filter
-   Fix a2x option collection from input file with non-ascii encoding
-   Fix options attribute not being properly parsed in Delimited Blocks
    attribute list

Building

-   Migrate from hierarchical A-A-P build system to top-level Makefile
-   Add make help target that prints out usage message for make
-   Fix double slash issue in Makefile when installing asciidoc or its docs

Testing

-   Commit generated test files to the repository for continuous integration
-   Test against Python 3.5+ on Travis-CI
-   Remove symlink tests/asciidocapi.py in favor of just appending to sys.path
-   Add requires directive to testasciidoc.conf to indicate necessary external
    dependencies (e.g. source-highlight)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants