Continue Convert to python3 #2

MasterOdin · 2018-05-27T22:02:09Z

This PR continues the work started in #1 (and also merging the 8.6.10 commits from asciidoc master) to port asciidoc to python3. I fixed the encoding errors I was getting when running the asciidoc test suite in the provided Dockerfile. The tests all pass, except for lang-ja, which asciidoc3 fixes compared.

See Travis-CI for running the test suite against Python 3.

Note: I've also pulled in the commits that were made in asciidoc/asciidoc (for the 8.6.10 release) into this PR so that it more fully "EOL" the old repo.

… asciidoc.py --doctest. testasciidoc.py still doesn't work, though

…he files to python3

For issue #103

Bump required Python version to 2.6 instead of 2.4

Prepare 8.6.10 release

…A (only 12.04 needed it)

…ild_remove_unneeded_ppa_ Fix build by removing reference to obsolete and no longer needed PPA

MasterOdin · 2018-05-28T03:11:16Z

Using the following script to compare output from asciidoc vs asciidoc3:

import difflib
import os

for entry in os.listdir('tests/data'):
    if not entry.endswith('html'):
        continue
    with open('tests/data/' + entry, 'r') as new, open('../asciidoc/tests/data/' + entry, 'r') as old:
        lines = []
        for line in difflib.unified_diff(old.readlines(), new.readlines(), n=0):
            lines.append(line)
        if len(lines) > 0:
            print(entry)
            with open('/tmp/' + entry, 'w') as open_file:
                for line in lines:
                    print(line, file=open_file)

Which gives:

lang-ja-book-test-html5.html
lang-ja-book-test-html4.html
lang-ja-book-test-xhtml11.html
lang-ja-article-test-xhtml11.html
lang-ja-article-test-html4.html
lang-ja-article-test-html5.html

due to the above regex for indexes.

- Update Python 2.6 and 2.7 references to be Python 3.4 and later - Remove explicit reference to Jython as it only supports up to Python 2.7 - Use new style module loading for Python 3.5+ - Update shebang lines to be python3

MasterOdin · 2018-05-28T21:27:49Z

So the reason the JA tests are failing is that the indexterm2 regex is (?<!\()[\\]?\(\((?P<attrlist>[^\s\(][^\(].*?)\)\)(?!\)) with the key bit being in the middle: ((?P[^\s\\(][^\\(].*?))) which implies that an index term must be greater than two characters (emphasized). It was working originally because the one JA character for monkey, 猿, was being internally represented as \xe7\x8c\xbf by Python2 and so the regex was matching it. However, now on Python3, it's properly internally represented as the one character and the regex fails. Fixing it requires removing the necessary second character to give just ((?P[^\s\\(].*?))). Putting it together gives (?<!\()[\\]?\(\((?P<attrlist>[^\s\(].*?)\)\)(?!\)).

I couldn't find any explicit mention in the asciidocs (either for asciidoc or asciidoctor, asciidoctor also seems to allow single character indexterms) that makes mention of minimum character length for an index term so I can't tell why this is, so I've committed the above change, which by all accounts brings asciidoc3 to match asciidoc output on all testcases.

I've modified my above test script to be stricter now to check and ensure output is byte indentical:

import os

for entry in os.listdir('tests/data'):
    if not entry.endswith('html'):
        continue
    with open('tests/data/' + entry, 'rb') as new, open('../asciidoc/tests/data/' + entry, 'rb') as old:
        if new.read() != old.read():
            print(entry)

which I'm happy to report it is and Travis-CI reports no failures for any test cases.

@elextr: I think this should now be ready for merging?

elextr · 2018-05-28T23:41:19Z

@MasterOdin thanks for continuing this work, hopefully will be able to progress it in the next few days.

Possibly the repo will be renamed as part of the first initial preparation for Asciidoc Python 2 EOL. IIUC github redirects links to the original repo to the new name, so your fork should still work.

I am pleasantly surprised you only found one regex that needed changing due to the difference between Python 2 ASCII and Python3 Unicode. I was expecting many more.

Once its committed users will be encouraged to test it for real.

MasterOdin · 2018-05-29T00:42:38Z

Yeah, it was good to see that the only regexes that really needed modification wasn't even really for something that was failing for a "true" unicode issue. If anything, the regexes are now guaranteed to be more stable and easier to write (as evidenced below) due to not having to worry about characters being expanded to codepoints.

Turns out I made a mistake in my comparison (and forgot to compare docbooks).

It's left me with these failures:

lang-ja-book-test-docbook.xml
newtables-docbook.xml

The failures for the lang-ja-book-test-docbook.xml is:

---

+++

@@ -16 +16 @@

-<chapter id="_前書">

+<preface id="_前書">

@@ -19,2 +19,2 @@

-</chapter>

-<chapter id="_奥付">

+</preface>

+<colophon id="_奥付">

@@ -23 +23 @@

-</chapter>

+</colophon>

However, this looks like it's a bug in asciidoc2 (which is the - lines) as it wasn't properly matching:
https://github.com/asciidoc/asciidoc3/blob/0bea48486a7265ae0c4cc792ab041df95ca142f2/lang-ja.conf#L48-L50
against the Title attribute, due to the fact that the ? in the regexes was only being applied to the final character of the code point instead of the entire character in Python2.

Still investigating newtables-docbook.xml which has a bunch of columns with a colwidth of 42* instead of 43*. Not sure what's up there.

IIUC github redirects links to the original repo to the new name, so your fork should still work.

Yeah, Github redirects the original name to the new name so long as you don't make a repository at the original name. Not sure what you're referring to with 'my fork' or renaming though.

MasterOdin · 2018-05-29T05:02:49Z

And fixed newtables-docbook.xml. Did not know that round() had changed from Python2 to Python3, so added a shim to get that functionality back for use for calculating column widths for docbook. With that, generated testcases are byte identical between asciidoc2 and asciidoc3 with the noted exception of lang-ja-book-test-docbook.xml, but that's a bug with asciidoc2 and its interaction of code points inside the JA regex string.

elextr · 2018-06-03T23:26:38Z

Note, PR title changed to "Continue ..." because I was getting confused about two PRs with the same title, nearly applied them in the wrong order :)

@aerostitch

Version 9.0.1 (2020-06-26) Bug fixes - Fix a2x crashing on decoding generated HTML pages Building - Fix generated tar.gz not having files under top-level asciidoc folder Testing - Test against Python 3.9 Version 9.0.0 (2020-06-02) Additions and changes - Port asciidoc to run on Python 3.5+ (see https://github.com/asciidoc/asciidoc for the EOL Python 2 implementation) - Drop internal implementation of OrderedDict and use the standard library collections.OrderedDict instead - Implement Dockerfile for running asciidoc - Add Catalan translation - Add docbook5 backend - Fix misspellings in various files and documents - Use UTC for testing instead of Pacific/Auckland (which observes daylight saving time) - Use "with" context statement for opening and closing files instead of older try/finally pattern - Search sibling paths before system wide paths in asciidocapi - Add manpage for testasciidoc.py - Use argparse instead of optparse for argument parsing - Add simplified Chinese translation (thanks @muirmok) - vim-asciidoc: speed up the refresh process for big files (thanks @aerostitch) - Allow specifying floatstyle attribute for figures, tables, equations, examples in docbook (thanks @psaris) - Use trans python module (if available) to better handle character decomposition to ascii for ascii-ids (thanks @rkel) - Use lru_cache to memoize repeated calls to macro look-up, giving potential ~15% speed-up on parsing Bug fixes - Fix index terms requiring two characters instead of just one (see asciidoc-py/asciidoc-py#2 (comment)) - Properly capture and use colophon, dedication, and preface for docbooks in Japanese (see asciidoc-py/asciidoc-py#2 (comment)) - make install did not include the unwraplatex.py filter - Fix a2x option collection from input file with non-ascii encoding - Fix options attribute not being properly parsed in Delimited Blocks attribute list Building - Migrate from hierarchical A-A-P build system to top-level Makefile - Add make help target that prints out usage message for make - Fix double slash issue in Makefile when installing asciidoc or its docs Testing - Commit generated test files to the repository for continuous integration - Test against Python 3.5+ on Travis-CI - Remove symlink tests/asciidocapi.py in favor of just appending to sys.path - Add requires directive to testasciidoc.conf to indicate necessary external dependencies (e.g. source-highlight)

moon-chilled and others added 24 commits June 24, 2017 20:37

Convert over to python3. Mostly manually, but the 2to3 utility helped

3c0659f

Change shebang to python3

4cf1f12

Remove (stub) encoding/decoding funcs, now that we're using str

b736187

Make travis use python3

aa30d05

Made asciidoc.py and testasciidocapi.py use python 3, and fix python3…

0c8e2fd

… asciidoc.py --doctest. testasciidoc.py still doesn't work, though

Fix ONE error. Now to (sigh) convert ALL the embedded python in all t…

59d053a

…he files to python3

Fix 'python3 asciidocapi.py'

aa4c4c2

Fix nearly all of the errors!

b1b4ed3

Make NEARLY all the tests pass. It think it's useable now!

5ace55d

Lex.__next__->Lex.next_element

d2a89bd

Bump required Python version to 2.6 instead of 2.4

a8a2739

add information about Asciidoctor and DocGist to the home page

6cbc866

set correct date and version of latest release in build website script

d4a31e6

Prepare 8.6.10 release

d310ffb

For issue #103

Merge pull request #37 from florentx/require26

91a909d

Bump required Python version to 2.6 instead of 2.4

Merge pull request #115 from bmwiedemann/release

cc9b6c8

Prepare 8.6.10 release

Typo fix latex-filter doc

f410a0e

Specify python 2 explicitly in makefile

a9c7074

Fix build by removing reference to obsolete and no longer unneeded PP…

5f87fd7

…A (only 12.04 needed it)

Merge pull request #124 from fidergo-stephane-gourichon/fix_travis_bu…

701c90f

…ild_remove_unneeded_ppa_ Fix build by removing reference to obsolete and no longer needed PPA

Update a2x.1.txt

cd295d5

Fix python3 encode errors

d5a99f5

decode the base_id back to str

169680e

Fix not including UTF-8 BOM if in source

b60acba

MasterOdin added 2 commits May 28, 2018 15:28

Update documentation and scripts to be more python3

d465a27

- Update Python 2.6 and 2.7 references to be Python 3.4 and later - Remove explicit reference to Jython as it only supports up to Python 2.7 - Use new style module loading for Python 3.5+ - Update shebang lines to be python3

Change indexterm2 to require 1+ instead of 2+ chars

1b1fa44

Add py2round method to get python2 rounding

6c838a6

MasterOdin added 2 commits May 29, 2018 02:18

Drop OrderedDict class for Python builtin

f59e8b7

migrate filters to python3

b87c6c7

adam820 mentioned this pull request May 29, 2018

PEP 8 fixes and formatting #3

Closed

revert to using getopt in testasciidoc.py

d2d11d8

elextr changed the title ~~Convert to python3~~ Continue Convert to python3 Jun 3, 2018

elextr merged commit 67de2ad into asciidoc-py:master Jun 3, 2018

MasterOdin mentioned this pull request May 25, 2019

Add generated testcases to repository #68

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Continue Convert to python3 #2

Continue Convert to python3 #2

MasterOdin commented May 27, 2018 •

edited

MasterOdin commented May 28, 2018 •

edited

MasterOdin commented May 28, 2018 •

edited

elextr commented May 28, 2018

MasterOdin commented May 29, 2018 •

edited

MasterOdin commented May 29, 2018

elextr commented Jun 3, 2018

Continue Convert to python3 #2

Continue Convert to python3 #2

Conversation

MasterOdin commented May 27, 2018 • edited

MasterOdin commented May 28, 2018 • edited

MasterOdin commented May 28, 2018 • edited

elextr commented May 28, 2018

MasterOdin commented May 29, 2018 • edited

MasterOdin commented May 29, 2018

elextr commented Jun 3, 2018

MasterOdin commented May 27, 2018 •

edited

MasterOdin commented May 28, 2018 •

edited

MasterOdin commented May 28, 2018 •

edited

MasterOdin commented May 29, 2018 •

edited