[draft] import_page: support file scheme and use bs4 to workaround missing 'body' element #456
jirib wants to merge 6 commits into getnikola:master from
v8/import_page/import_page.py
```python
document = doc_template.format(
    title=title,
    slug=slug,
    content=node.prettify()
```
Should we leave the HTML as it is? If the selected element is `article`, should I extract only the content of the `article`? (The site template most likely already provides an `article` element.)
I tried to add functionality for this; see ab15bed.
I think we should remove the wrapper element (e.g. <article>) by default.
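Dropping the wrapper by default can be sketched as follows. The PR does this with bs4 (`node.decode_contents()`); this sketch swaps in the standard library's `html.parser` only so that it runs without dependencies, so treat it as an illustration of the idea rather than the PR's code. `InnerHTMLExtractor`, `inner_html` and the default `tag="article"` are hypothetical.

```python
from html.parser import HTMLParser


class InnerHTMLExtractor(HTMLParser):
    """Collect the markup inside the first <tag> element, dropping the wrapper itself."""

    def __init__(self, tag):
        # convert_charrefs=False delivers entities such as &amp; as separate
        # events, so they can be re-emitted unchanged below.
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.depth = 0     # nesting level of `tag` while inside the wrapper
        self.done = False  # set once the wrapper's closing tag has been seen
        self.parts = []

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())
            if tag == self.tag:
                self.depth += 1  # nested element with the same name
        elif not self.done and tag == self.tag:
            self.depth = 1  # entering the wrapper; its own start tag is dropped

    def handle_startendtag(self, tag, attrs):
        if self._inside():
            self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if not self._inside():
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # wrapper closed; its end tag is dropped
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self._inside():
            self.parts.append(f"<!--{data}-->")


def inner_html(html, tag="article"):
    """Return the inner markup of the first <tag> in `html`, without the wrapper."""
    parser = InnerHTMLExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `inner_html('<article class="post"><h1>Hi</h1></article>')` returns `'<h1>Hi</h1>'`, so the site template's own `article` element is not doubled.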
Branch updated from 8c15716 to ab15bed.
```python
while args:
    arg = args.pop(0)
    if arg == "-s" and args:
        selector = args.pop(0)
    elif arg == "-e" and args:
        extractor = args.pop(0)
    else:
        urls.append(arg)  # Assume it's a page URL
```
You don't need to parse args yourself; use the built-in option support in doit instead. See just about any command plugin for an example.
IIUC, all plugins just use `args` in `_execute()` as the inputs for their processing. But I'd like to introduce an optional flag, that is:

`<plugin> [-s extractor_file] arg...`

That is why I was parsing the arguments myself. Or do I misunderstand your comment?
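A minimal sketch of the reviewer's suggestion, assuming Nikola's usual `Command` plugin layout: options are declared in `cmd_options`, doit parses them and passes the values to `_execute(options, args)` keyed by each option's `name`, and positional arguments (the page URLs) arrive in `args`. An optional flag fits this model because every option carries a `default`. The class name, option names, help texts and the `plan_import` helper are hypothetical, and the `ImportError` fallback exists only so the sketch runs without Nikola installed.

```python
try:
    from nikola.plugin_categories import Command
except ImportError:  # lets this sketch run without Nikola installed
    Command = object


def plan_import(selector, extractor, urls):
    """Hypothetical helper: pair each page URL with the options that apply to it."""
    return [(url, selector, extractor) for url in urls]


class CommandImportPage(Command):
    """Sketch: declare -s/-e through doit instead of parsing sys.argv by hand."""

    name = "import_page"
    doc_usage = "[-s SELECTOR] [-e EXTRACTOR] page_url [page_url ...]"
    doc_purpose = "import arbitrary web pages"
    # doit turns each dict into a command-line option; the parsed value is
    # passed to _execute() in `options` under the dict's "name" key.
    cmd_options = [
        {
            "name": "selector",
            "short": "s",
            "long": "selector",
            "type": str,
            "default": None,
            "help": "element selector to import (hypothetical option)",
        },
        {
            "name": "extractor",
            "short": "e",
            "long": "extractor",
            "type": str,
            "default": None,
            "help": "extractor applied to the selected node (hypothetical option)",
        },
    ]

    def _execute(self, options, args):
        # Positional arguments (the page URLs) arrive separately in `args`.
        for url, selector, extractor in plan_import(
            options["selector"], options["extractor"], args
        ):
            print(f"would import {url} (selector={selector!r}, extractor={extractor!r})")
        return 0  # 0 = success
```

With this layout, `nikola import_page -s article https://example.com/page` would reach `_execute` with `options["selector"] == "article"` and the URL in `args`, and omitting `-s` simply yields the default `None`.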
```python
args = sys.argv[1:]
selector = None  # 'body'
extractor = None  # 'lambda node: BeautifulSoup(node.decode_contents(), "html.parser").prettify()'
urls = []
```
I will come up with a better solution: an `extractor_module` option, so one can use an external file with its own code.
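One way such an `extractor_module` option could work, sketched with the standard library's `importlib.util`: load an arbitrary Python file and pull a callable out of it. The function name `load_extractor` and the convention that the file defines `extract()` are assumptions, not part of the PR. Loading the module executes its top-level code, so it should only point at trusted files.

```python
import importlib.util
from pathlib import Path


def load_extractor(path, func_name="extract"):
    """Load `func_name` from a Python file (hypothetical extractor_module support)."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load extractor module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    extractor = getattr(module, func_name, None)
    if not callable(extractor):
        raise AttributeError(f"{path} does not define a callable {func_name}()")
    return extractor
```

For example, if `my_extractor.py` defines `def extract(html): ...`, then `load_extractor("my_extractor.py")` returns that function, which the import command could apply to each selected node instead of evaluating a lambda passed on the command line.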
```python
doc_template = '''<!--
.. title: {title}
.. slug: {slug}
```
Consider also adding a `date` field (defaulting to now is fine).
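A sketch of the template with a `date` field that defaults to the current time. Only the `title` and `slug` lines appear in the hunk above; the `date` line, the closing `-->` and the `{content}` placeholder are assumptions about how the rest of the template looks, and `render_page` is a hypothetical helper. The `YYYY-MM-DD HH:MM:SS UTC` format is assumed to be one that Nikola's metadata parsing accepts.

```python
from datetime import datetime, timezone

# Hypothetical completion of doc_template: the date line, the closing "-->"
# and the {content} placeholder are assumptions, not taken from the PR.
DOC_TEMPLATE = '''<!--
.. title: {title}
.. slug: {slug}
.. date: {date}
-->

{content}
'''


def render_page(title, slug, content, date=None):
    """Fill the template; `date` defaults to the current time in UTC."""
    if date is None:
        date = datetime.now(timezone.utc)
    # Normalize to UTC so the literal "UTC" suffix is accurate
    # (a naive datetime is interpreted as local time by astimezone).
    date_str = date.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    return DOC_TEMPLATE.format(title=title, slug=slug, date=date_str, content=content)
```

Passing an explicit `date` keeps the output reproducible, while leaving it out gives the "defaulting to now" behavior.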
Support file scheme and use bs4 to workaround missing 'body' element