Reading the Content of an Uploaded Word Document with POI in SSM

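For my graduation project I needed a feature that uploads a document, reads its content, and saves that content to the database. I had already used Apache POI in an earlier course project, so it was the natural choice. Below is how I implemented it, written down for the record.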

Downloading POI

Download the latest zip package directly from the official Apache POI website.

Usage

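After unzipping, the directory looks like this:
[Screenshots: contents of the extracted POI directory]
Since I will also need to import and parse Excel sheets later, I added all the required jars to the project; any that were already included when integrating SSM do not need to be added again. Now let's start writing the code.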

JSP page

<form class="form-horizontal" id="homework_submit">
    <input id="enclosure" name="enclosure" type="file"
           accept="application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document">
    <button type="button" onClick="homework_submit();">Submit</button>
</form>
function homework_submit() {
    var url = .....
    $.ajax({
        type: 'POST',
        url: url,
        cache: false,
        data: new FormData($('#homework_submit')[0]),
        // Send the FormData object as-is instead of letting jQuery serialize it
        processData: false,
        // Let the browser set multipart/form-data together with its boundary
        contentType: false,
        success: function(data) {
            //...
        },
        error: function(data) {
            //...
        }
    });
}

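A request body built from FormData is always sent as multipart/form-data, so the form's enctype does not need to be specified here; with contentType: false, jQuery leaves the Content-Type header to the browser, which adds the required boundary. There are of course many ways to upload a form; as a beginner, I simply went with the easiest one.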
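To make "multipart/form-data" concrete, here is a JDK-only sketch of roughly what the browser puts on the wire for a form with a single file field. The class and method names are hypothetical, and the boundary is an arbitrary string; real browsers generate their own boundary and may add further parts for other fields. The part's `name` is what the server binds by, which is why it has to be `enclosure` in this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration: builds a multipart/form-data body with one file part.
class MultipartSketch {

    /** The request header that must accompany the body; both must use the same boundary. */
    public static String contentTypeHeader(String boundary) {
        return "multipart/form-data; boundary=" + boundary;
    }

    /** Roughly what new FormData(form) sends for a form containing a single file input. */
    public static byte[] buildBody(String boundary, String fieldName, String fileName,
                                   String contentType, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Opening delimiter plus part headers; fieldName is what the server binds the file by
        write(out, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n");
        // Raw file bytes, not encoded
        out.write(fileBytes, 0, fileBytes.length);
        // Closing delimiter: CRLF, the boundary, then "--"
        write(out, "\r\n--" + boundary + "--\r\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }
}
```

The file bytes travel unencoded between the part headers and the closing delimiter, which is why `processData: false` matters on the jQuery side.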

Controller

// Submit homework
@RequestMapping(value="/saveHomework/{sId}", method=RequestMethod.POST)
@ResponseBody
public Integer saveHomework(HttpServletResponse response, @ModelAttribute MultipartFile enclosure, Submitted submitted, HttpSession session) {
    int result = submittedService.insertSubmitted(response, enclosure, submitted, session);
    return result;
}

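Note that the parameter name in @ModelAttribute MultipartFile enclosure must match the name attribute of the file input on the front end (enclosure). Also, since this is an SSM project, a multipart (file upload) resolver has to be configured in applicationContext.xml. Spring looks it up by the bean name "multipartResolver", and CommonsMultipartResolver needs the commons-fileupload jar on the classpath: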

<!-- File upload resolver -->
<bean id="multipartResolver"
      class="org.springframework.web.multipart.commons.CommonsMultipartResolver">
    <!-- Default encoding -->
    <property name="defaultEncoding" value="UTF-8"></property>
    <!-- Maximum upload size: 5 MB = 5*1024*1024 bytes -->
    <property name="maxUploadSize" value="5242880"></property>
</bean>

Service

This is where the logic is implemented:

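// Imports this method needs (the rest of the class is not shown)
import java.io.InputStream;
import java.util.Locale;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.web.multipart.MultipartFile;

@Override
public int insertSubmitted(HttpServletResponse response, MultipartFile enclosure, Submitted submitted, HttpSession session) {
    String content = null;
    if (enclosure != null && !enclosure.isEmpty()) {
        // Read from enclosure.getInputStream() instead of casting to CommonsMultipartFile/DiskFileItem:
        // uploads small enough to be kept in memory have no file on disk behind getStoreLocation()
        String originalFilename = enclosure.getOriginalFilename();
        String name = originalFilename == null ? "" : originalFilename.toLowerCase(Locale.ROOT);
        if (name.endsWith(".doc")) {
            // Word 97-2003 (.doc): HWPF, in the poi-scratchpad jar
            try (InputStream fis = enclosure.getInputStream()) {
                @SuppressWarnings("resource")
                HWPFDocument doc = new HWPFDocument(fis);
                content = doc.getDocumentText();
                System.out.println(content);
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else if (name.endsWith(".docx")) {
            // Word 2007+ (.docx): XWPF, in the poi-ooxml jar
            try (InputStream fis = enclosure.getInputStream()) {
                XWPFDocument xdoc = new XWPFDocument(fis);
                @SuppressWarnings("resource")
                XWPFWordExtractor extractor = new XWPFWordExtractor(xdoc);
                content = extractor.getText();
                System.out.println(content);
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            //...
        }
    }
    // ... save content and insert it into the database (elided in the original)
    return 0; // placeholder: return the result of the insert here
}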

For converting a MultipartFile to a File, see: http://www.cnblogs.com/hahaxiaoyu/p/5102900.html
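If an API really needs a File on disk, a route that does not depend on the Commons/DiskFileItem internals is to copy the upload stream into a temporary file (Spring's MultipartFile also has transferTo(File) for this). Below is a JDK-only sketch; the class name is hypothetical, and in the service the stream would be enclosure.getInputStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; `in` stands in for enclosure.getInputStream().
class StreamToTempFile {

    /** Copies the whole stream into a new temp file; the caller should delete it when done. */
    public static Path copyToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("upload-", suffix);
        try (InputStream src = in) {
            // REPLACE_EXISTING because createTempFile has already created an empty file
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave a partial file behind
            throw e;
        }
        return tmp;
    }
}
```

Reading the stream directly, as the POI constructors allow, avoids the temporary file altogether; the copy is only worth it when something insists on a path.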
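Later I found that converting the upload to an input stream and reading it directly, as in Workbook wb = Workbook.getWorkbook(xxx .getInputStream()); (that is the JExcelApi jxl.Workbook class for Excel, not POI), also works well. The POI classes HWPFDocument and XWPFDocument likewise accept enclosure.getInputStream() directly.
content holds the extracted text; insert it into the database and you're done.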
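Choosing HWPF or XWPF by file extension trusts whatever name the user's file has. A possible refinement, sketched here with the JDK only, is to look at the leading bytes: .doc files are OLE2 compound documents starting with D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP packages starting with "PK\3\4". The class and enum names are hypothetical, and a ZIP signature alone does not prove the file is a Word document (an .xlsx is a ZIP too).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration: guesses the container format from the first bytes.
class WordFormatSniffer {

    public enum Kind { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // OLE2 compound document signature (.doc, handled by HWPF)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature "PK\3\4" (.docx, handled by XWPF)
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at up to 8 bytes and resets the stream, so it can still be parsed afterwards. */
    public static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(8);
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // stream shorter than 8 bytes
            n += r;
        }
        in.reset();
        if (n == 8 && Arrays.equals(head, OLE2)) return Kind.DOC_OLE2;
        if (n >= 4 && Arrays.equals(Arrays.copyOf(head, 4), ZIP)) return Kind.DOCX_ZIP;
        return Kind.UNKNOWN;
    }
}
```

In the service this would be used as new BufferedInputStream(enclosure.getInputStream()), sniffed first and then handed to new HWPFDocument(...) or new XWPFDocument(...) depending on the result.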

What each POI jar is for

| Component | Application type | Maven artifactId | Notes |
|---|---|---|---|
| POIFS | OLE2 Filesystem | poi | Required to work with OLE2 / POIFS based files |
| HPSF | OLE2 Property Sets | poi | |
| HSSF | Excel XLS | poi | For HSSF only; if Common SS is needed, see below |
| HSLF | PowerPoint PPT | poi-scratchpad | |
| HWPF | Word DOC | poi-scratchpad | |
| HDGF | Visio VSD | poi-scratchpad | |
| HPBF | Publisher PUB | poi-scratchpad | |
| HSMF | Outlook MSG | poi-scratchpad | |
| OpenXML4J | OOXML | poi-ooxml plus one of poi-ooxml-schemas, ooxml-schemas | Only one schemas jar is needed; the POI documentation explains the differences |
| XSSF | Excel XLSX | poi-ooxml | |
| XSLF | PowerPoint PPTX | poi-ooxml | |
| XWPF | Word DOCX | poi-ooxml | |
| Common SS | Excel XLS and XLSX | poi-ooxml | WorkbookFactory and friends all require poi-ooxml, not just core poi |
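The XWPF row relies on the fact that a .docx is a ZIP package whose body text lives in word/document.xml, with the text runs in w:t elements inside w:p paragraphs; this is the format poi-ooxml parses. The JDK-only sketch below peeks at that part to make the idea concrete. It is not a replacement for XWPFWordExtractor: it ignores table layout, headers and footers, tabs, breaks and fields, and the class name is hypothetical. Because the input comes from user uploads, the parser is configured to reject DOCTYPE declarations. readAllBytes() on the ZIP stream needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration of the DOCX layout; not how POI itself is implemented.
class DocxPeek {
    private static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of each w:p paragraph in word/document.xml, one paragraph per line. */
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(e.getName())) continue;
                byte[] xml = zip.readAllBytes(); // reads to the end of the current entry only
                DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
                f.setNamespaceAware(true);
                // Uploaded content: refuse DOCTYPEs to rule out XML external entity tricks
                f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < paras.getLength(); i++) {
                    NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        sb.append(runs.item(j).getTextContent());
                    }
                    sb.append('\n');
                }
                return sb.toString();
            }
        }
        throw new IOException("word/document.xml not found: probably not a .docx file");
    }
}
```

A .doc file has no such ZIP structure (it is an OLE2 container), which is why it needs the separate HWPF classes from poi-scratchpad.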