This article shows how to convert Word documents to HTML and PDF with Python.
The code below was found through a Baidu search:
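#!/usr/bin/env python
#coding=utf-8
from win32com import client as wc

word = wc.Dispatch('Word.Application')  # start Word through COM
doc = word.Documents.Open('e:/1.doc')
doc.SaveAs('e:/1.html', 8)   # 8  = wdFormatHTML
doc.SaveAs('e:/2.pdf', 17)   # 17 = wdFormatPDF
doc.SaveAs('e:/3.html', 10)  # 10 = wdFormatFilteredHTML
doc.Close()
word.Quit()  # shut Word down, otherwise it keeps running in the background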
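If Open or SaveAs raises, the script above never reaches doc.Close() or word.Quit(), so Word is left running. Below is a minimal sketch of a more defensive version. The helper name convert_document and its optional word parameter are hypothetical, not part of pywin32. Passing in an existing Word object lets several conversions share one instance and also lets the function be exercised without Word installed. Paths are made absolute because Word may resolve relative paths against its own working folder rather than the script's.

```python
import os


def convert_document(src, dst, file_format, word=None):
    """Open src in Word and save it as dst in file_format (a WdSaveFormat number).

    If no Word object is passed, one is started through win32com
    (Windows + pywin32 only) and quit afterwards; a Word object supplied
    by the caller is left running.
    """
    owns_word = word is None
    if owns_word:
        from win32com import client  # imported lazily: only exists on Windows
        word = client.Dispatch("Word.Application")
    try:
        # Resolve relative paths against the script's folder, not Word's.
        doc = word.Documents.Open(os.path.abspath(src))
        try:
            doc.SaveAs(os.path.abspath(dst), file_format)
        finally:
            doc.Close()  # close the document even if SaveAs failed
    finally:
        if owns_word:
            word.Quit()  # only quit a Word instance this function started


# Usage on Windows, mirroring the script above:
# convert_document("e:/1.doc", "e:/2.pdf", 17)   # 17 = wdFormatPDF
```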
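pywin32 (win32com) download:
http://sourceforge.net/projects/pywin32/files/pywin32/Build%20218

Test environment: Windows XP, Office 2007, Python 2.5.2, pywin32 build 213. The approach uses the win32com interface to call the Office API directly. Its advantages are simplicity and good compatibility: anything Office can process, Python can process too, and the output is identical to using "Save As" in Word. Original article: http://www.fuchaoqun.com/2009/03/use-python-convert-word-to-html-with-win32com/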
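Since each conversion is only an Open / SaveAs / Close round trip, a single Word instance can serve a whole folder instead of starting Word once per file. The sketch below assumes a hypothetical convert_folder helper that receives an already running Word object. It skips the temporary "~$" owner files that Word creates next to documents that are open.

```python
import os


def convert_folder(word, folder, file_format, new_ext):
    """Convert every .doc/.docx in folder using one running Word object.

    file_format is a WdSaveFormat number (e.g. 17 for PDF); new_ext is the
    output extension (e.g. ".pdf"). Returns the output paths in name order.
    """
    outputs = []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in (".doc", ".docx") or name.startswith("~$"):
            continue  # skip other files and Word's ~$ owner files
        src = os.path.abspath(os.path.join(folder, name))
        dst = os.path.abspath(os.path.join(folder, stem + new_ext))
        doc = word.Documents.Open(src)
        try:
            doc.SaveAs(dst, file_format)
        finally:
            doc.Close()
        outputs.append(dst)
    return outputs


# Usage on Windows (pywin32 required):
# from win32com import client
# word = client.Dispatch("Word.Application")
# try:
#     convert_folder(word, "d:/labs", 17, ".pdf")
# finally:
#     word.Quit()
```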
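#!/usr/bin/env python
#coding=utf-8
from win32com import client as wc
word = wc.Dispatch('Word.Application')
doc = word.Documents.Open('d:/labs/math.doc')
doc.SaveAs('d:/labs/math.html', 8)  # 8 = wdFormatHTML
doc.Close()
word.Quit()

The key line is doc.SaveAs('d:/labs/math.html', 8). Many articles online write doc.SaveAs('d:/labs/math.html', win32com.client.constants.wdFormatHTML) instead, which fails immediately with: AttributeError: class Constants has no attribute 'wdFormatHTML'. The reason is that win32com.client.constants is only populated once makepy wrappers for Word's type library have been generated (for example by creating the object with win32com.client.gencache.EnsureDispatch('Word.Application')); with a plain late-bound Dispatch it is typically empty.

The same code can convert a Word file into any format that Office 2007 supports; for example, to produce a PDF, change 8 to 17. Below is the full table of file formats supported by Office 2007:

wdFormatDocument = 0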
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
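wdFormatXPS = 18

The names map fairly directly onto the file formats. If you are on Office 2003, it may not support this many formats. For Word-to-HTML there are two choices, wdFormatHTML and wdFormatFilteredHTML (values 8 and 10). With wdFormatHTML, OLE objects in the Word file such as equations are stored as WMF images, whereas with wdFormatFilteredHTML the equation images are stored as GIF; the HTML produced by wdFormatFilteredHTML is also visibly much cleaner than that produced by wdFormatHTML. You can, of course, call the Office API through COM from any language, for example PHP.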
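Rather than relying on win32com.client.constants, the needed numbers can be kept in a small local table. The sketch below picks a WdSaveFormat value from the output file's extension. The numeric values are a subset of the table above; the extension mapping and the name save_format_for are hypothetical. It defaults .html to wdFormatFilteredHTML, following the observation above that filtered HTML is cleaner; pass filtered_html=False to get wdFormatHTML.

```python
import os

# Subset of the WdSaveFormat values listed in the table above.
WD_SAVE_FORMAT = {
    "wdFormatDocument": 0,
    "wdFormatText": 2,
    "wdFormatRTF": 6,
    "wdFormatHTML": 8,
    "wdFormatWebArchive": 9,
    "wdFormatFilteredHTML": 10,
    "wdFormatXMLDocument": 12,
    "wdFormatPDF": 17,
    "wdFormatXPS": 18,
}

# Hypothetical choice of format per output extension (HTML handled separately).
_EXTENSION_TO_FORMAT = {
    ".doc": "wdFormatDocument",
    ".docx": "wdFormatXMLDocument",
    ".txt": "wdFormatText",
    ".rtf": "wdFormatRTF",
    ".mht": "wdFormatWebArchive",
    ".pdf": "wdFormatPDF",
    ".xps": "wdFormatXPS",
}


def save_format_for(path, filtered_html=True):
    """Return the WdSaveFormat number to pass to doc.SaveAs(path, ...)."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".htm", ".html"):
        name = "wdFormatFilteredHTML" if filtered_html else "wdFormatHTML"
    elif ext in _EXTENSION_TO_FORMAT:
        name = _EXTENSION_TO_FORMAT[ext]
    else:
        raise ValueError("no WdSaveFormat chosen for extension %r" % ext)
    return WD_SAVE_FORMAT[name]


# Usage with the conversion code above, e.g.:
# doc.SaveAs('d:/labs/math.pdf', save_format_for('d:/labs/math.pdf'))
```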
Notes:
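The pywin32 build must match your Python build. For example, my 64-bit machine has 32-bit Python installed: with the 64-bit pywin32 the code failed at runtime with a DLL error, while the 32-bit pywin32 worked normally.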
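What matters here is the bitness of the Python interpreter, not of Windows. A quick standard-library check: struct.calcsize("P") is the size of a C pointer in bytes for the running interpreter, so multiplying by 8 gives 32 or 64. The function name python_bits is just for illustration.

```python
import struct
import sys


def python_bits():
    """Return 32 or 64: the pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


# Pick the pywin32 installer that matches both of these values.
print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1], python_bits()))
```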
If the pywin32 installer reports a registry error,
simply run the following Python script:
#
# script to register Python 2.0 or later for use with win32all
# and other extensions that require Python registry settings
#
# written by Joakim Loew for Secret Labs AB / PythonWare
#
# source:
# http://www.pythonware.com/products/works/articles/regpy20.htm
#
# modified by Valentine Gogichashvili as described in http://www.mail-archive.com/distutils-sig@python.org/msg10512.html

from __future__ import print_function

import sys

try:
    from winreg import *   # Python 3
except ImportError:
    from _winreg import *  # Python 2

# tweak as necessary
version = "%d.%d" % sys.version_info[:2]  # sys.version[:3] breaks on 3.10+
installpath = sys.prefix

regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % (version)
installkey = "InstallPath"
pythonkey = "PythonPath"
pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (
    installpath, installpath, installpath
)


def RegisterPy():
    try:
        reg = OpenKey(HKEY_CURRENT_USER, regpath)
    except EnvironmentError:
        # Key missing: create it and write InstallPath / PythonPath.
        try:
            reg = CreateKey(HKEY_CURRENT_USER, regpath)
            SetValue(reg, installkey, REG_SZ, installpath)
            SetValue(reg, pythonkey, REG_SZ, pythonpath)
            CloseKey(reg)
        except EnvironmentError:
            print("*** Unable to register!")
            return
        print("--- Python", version, "is now registered!")
        return
    # Key exists: check whether it already points at this installation.
    if (QueryValue(reg, installkey) == installpath and
            QueryValue(reg, pythonkey) == pythonpath):
        CloseKey(reg)
        print("=== Python", version, "is already registered!")
        return
    CloseKey(reg)
    print("*** Unable to register!")
    print("*** You probably have another Python installation!")


if __name__ == "__main__":
    RegisterPy()
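For reference, the values the registration script writes can be computed without touching the registry. The hypothetical function python_registry_values below reproduces the script's string formatting. winreg's SetValue stores each entry as the default value of a subkey (InstallPath, PythonPath) under HKEY_CURRENT_USER\SOFTWARE\Python\Pythoncore\<version>, so this is what to look for in regedit when the script reports that another Python installation is registered.

```python
def python_registry_values(prefix, major, minor):
    """Return (subkey path, {subkey name: default value}) as the script builds them."""
    version = "%d.%d" % (major, minor)
    regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % version
    pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (prefix, prefix, prefix)
    return regpath, {"InstallPath": prefix, "PythonPath": pythonpath}


# Example for a Python 2.7 installed in C:\Python27 (path chosen for illustration).
regpath, values = python_registry_values("C:\\Python27", 2, 7)
print("HKEY_CURRENT_USER\\" + regpath)
for name, data in sorted(values.items()):
    print("  %s = %s" % (name, data))
```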