HuggingFace Pipeline Tool (Study Notes)

![](https://img.tnblog.net/arcimg/hb/782c293bc3904ab0bb30af5ff454beae.png)

>#HuggingFace Pipeline Tool (Study Notes)

[TOC]

## Introduction to the Pipeline Tool

tn2>HuggingFace hosts a huge model hub, and some of its models are mature, classic models that give fairly good predictions without any further training — the so-called Zero Shot Learning.
When using the pipeline tool, the caller only needs to tell it which type of task to run; the pipeline automatically picks a suitable model and returns a prediction directly. If that prediction already satisfies the caller, no further training is needed.
The pipeline API is concise and hides a large amount of complex code.

## Using the Pipeline Tool

### Common Task Demos

#### Text Classification

tn2>Use the pipeline tool for a text classification task with the following code:

```python
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("I hate you")[0]
print(result)

result = classifier("I love you")[0]
print(result)
```

tn2>Passing the task type into the `pipeline()` function returns a `classifier` object that can run the actual prediction; calling that object with a concrete sentence returns the prediction result.
The example classifies the sentiment of `I hate you` and `I love you`, with the following results:

![](https://img.tnblog.net/arcimg/hb/f12a2eb208cd4e0789295f75d2615b9d.png)

tn2>From the output, the first sentence is classified as `NEGATIVE` and the second as `POSITIVE`, both with high confidence scores.

#### Reading Comprehension

tn2>Use the pipeline tool for a reading comprehension (question answering) task with the following code:

```python
# Chapter 5 / Reading comprehension
from transformers import pipeline

question_answerer = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

result = question_answerer(
    question="What is extractive question answering?",
    context=context,
)
print(result)

result = question_answerer(
    question="What is a good example of a question answering dataset?",
    context=context,
)
print(result)
```

tn2>In this code, `pipeline()` is first called with `question-answering` as the argument, which returns the `question_answerer` object.
When calling `question_answerer`, pass in the `question` and the `context`; the answer to the question must appear in the `context`.

![](https://img.tnblog.net/arcimg/hb/7e457966fefe4aadb560784cd22a0f59.png)

tn2>The first question is "What is extractive question answering?", and the model answers "the task of extracting an answer from a text given a question".
The second question is "What is a good example of a question answering dataset?", and the model answers "SQuAD dataset".

#### Fill-Mask

tn2>Use the pipeline tool for a fill-mask (cloze) task with the following code:

```python
# Chapter 5 / Fill-mask
from transformers import pipeline

unmasker = pipeline("fill-mask")

from pprint import pprint

sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'

unmasker(sentence)
```

tn2>In this code, `sentence` is a sentence in which one word has been replaced by the `<mask>` symbol, marking the blank the model should fill in. The result is as follows:

![](https://img.tnblog.net/arcimg/hb/ed8a3fcf51a04873880da05be7a3f9e1.png)

tn2>The model is asked to complete "HuggingFace is creating a ___ that the community uses to solve NLP tasks."
It gives five candidate answers: `tool`, `framework`, `library`, `database`, and `prototype`.

#### Text Generation

```python
# Chapter 5 / Text generation
from transformers import pipeline

text_generator = pipeline("text-generation")

text_generator("As far as I am concerned, I will", max_length=50, do_sample=False)
```

tn2>In this code, after obtaining the `text_generator` object, it is called directly with the beginning of a sentence as input, and `text_generator` continues writing from there. The parameter `max_length=50` specifies the length of the continuation. The result is as follows:

![](https://img.tnblog.net/arcimg/hb/f688ca4fd05c4db88d861723c06458e0.png)

tn2>The generated continuation reads roughly: as far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market"; I think the idea of a free market is a bit of a stretch; I think the idea...
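tn2>A side note (my own addition, not from the original notes): the text-generation pipeline also forwards generation arguments such as `do_sample` and `num_return_sequences` to the underlying model, so it can return several sampled continuations instead of a single greedy one. A minimal sketch, assuming the `text_generator` object created above:

```python
# Minimal sketch (assumption: reuses the text_generator pipeline built above).
# do_sample=True switches from greedy decoding to sampling, and
# num_return_sequences asks for several different continuations at once.
outputs = text_generator(
    "As far as I am concerned, I will",
    max_length=50,
    do_sample=True,
    num_return_sequences=3,
)

# Each element is a dict whose "generated_text" field holds one continuation.
for out in outputs:
    print(out["generated_text"])
```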
#### Named Entity Recognition

tn2>The named entity recognition task finds the person names, place names, organization names, and so on in a piece of text.
Use the pipeline tool for named entity recognition with the following code:

```python
from transformers import pipeline

ner_pipe = pipeline("ner")

sequence = """Hugging Face Inc. is a company based in New York City.
Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge which is visible from the window."""

for entity in ner_pipe(sequence):
    print(entity)
```

![](https://img.tnblog.net/arcimg/hb/120cfaeb976049e683e09c0762485936.png)

tn2>As the output shows, every recognized token is labeled with its entity type.

#### Text Summarization

tn2>Put simply, summarization condenses a long text into a shorter one. The code is as follows:

```python
from transformers import pipeline

summarizer = pipeline("summarization")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the 2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s Investigation Division.
Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
"""

summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)
```

tn2>Here `ARTICLE` is the text to summarize, and the summary length is constrained to between `30` and `130` tokens (`min_length=30`, `max_length=130`).

![](https://img.tnblog.net/arcimg/hb/d3c9b0ab52a34043a5d0b126fa7412a8.png)

#### Translation

tn2>Here we translate English into German.

```python
from transformers import pipeline

translator = pipeline("translation_en_to_de")

sentence = "Hugging Face is a technology company based in New York and Paris"

translator(sentence, max_length=40)
```

![](https://img.tnblog.net/arcimg/hb/cf2acd25f0dc4108a5394f7857c115b2.png)

tn>The default translation task calls the `t5-base` model under the hood, which only supports translating English into German, French, and Romanian. To support other languages, you need to replace the model.

## Replacing the Model for a Task

### Chinese-to-English Translation with a Replacement Model

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# To use this model, sentencepiece must be installed
# !pip install sentencepiece

tokenizer = AutoTokenizer.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-zh-en")
model = AutoModelForSeq2SeqLM.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-zh-en")

translator = pipeline(task="translation_zh_to_en", model=model, tokenizer=tokenizer)

sentence = "我叫萨拉,我住在伦敦。"

translator(sentence, max_length=20)
```

![](https://img.tnblog.net/arcimg/hb/32bc124e499248b0b85b03809f671144.png)

### English-to-Chinese Translation with a Replacement Model

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# To use this model, sentencepiece must be installed
# !pip install sentencepiece

tokenizer = AutoTokenizer.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-en-zh")
model = AutoModelForSeq2SeqLM.from_pretrained("KennStack01/Helsinki-NLP-opus-mt-en-zh")

translator = pipeline(task="translation_en_to_zh", model=model, tokenizer=tokenizer)

sentence = "My name is Sarah, and I live in London."

translator(sentence, max_length=20)
```

![](https://img.tnblog.net/arcimg/hb/7dae0f65252b41199dee5e839e44024c.png)