T 2005/20 (Text input/SHENZHEN SHI JI GUANG SU INFORMATION TECHNOLOGY) 29-11-2022
Character input method, device and system
I. The appellant (applicant) appealed against the decision of the examining division refusing European patent application No. 12811503.7.
II. The examining division decided that the subject-matter of the independent claims of the sole request lacked inventive step over the following document:
D1: WO 2010/112841 A1, 7 October 2010.
III. In its statement of grounds of appeal, the appellant maintained the sole request on which the contested decision was based.
IV. In a communication accompanying the summons to oral proceedings, the board raised a number of clarity objections against the claims of the sole request and expressed the view that the subject-matter of claim 1 lacked inventive step over document D1.
V. With a letter dated 27 September 2022, the appellant filed an amended set of claims.
VI. Oral proceedings took place on 29 November 2022. At the end of the oral proceedings, the Chair announced the board's decision.
VII. The appellant's final requests were that the decision under appeal be set aside and that a patent be granted on the basis of the set of claims filed with the letter dated 27 September 2022.
VIII. Claim 1 of the sole request reads as follows:
"A text processing method for a language modeling apparatus, the language modeling apparatus comprising a standard Ngram language model module, a cache module, a cache-based language modeling module and a mixed model module, wherein the standard Ngram language model module comprises a first word sequence frequency counting unit, a second word sequence frequency counting unit and a standard conditional probability calculating unit, the cache-based language modeling module comprises a third word sequence frequency counting unit, a fourth word sequence frequency counting unit, a time function value obtaining unit and a cache conditional probability calculating unit, and the mixed model module comprises an interpolation parameter storage unit, a first multiplying unit, a second multiplying unit, a mixed conditional probability calculating unit, a sentence probability calculating unit and an output sentence selecting unit, wherein
the standard Ngram language model module receives a user input, the user input being text, calculates a standard conditional probability of each word in the user input and outputs the standard conditional probabilities to the mixed model module, wherein
the conditional probability of each word is calculated by the standard Ngram language model module as follows:
\[
P(w_i \mid w_{i-n+1} \dots w_{i-1}) = \frac{C(w_{i-n+1} \dots w_{i-1} w_i)}{C(w_{i-n+1} \dots w_{i-1})}
\]
wherein $C(w_{i-n+1} \dots w_{i-1} w_i)$ denotes a number of times that a word sequence $w_{i-n+1} \dots w_{i-1} w_i$ emerges in training material of the standard Ngram language model;
$w_i$ denotes the ith word in a Chinese sentence; and
$n$ is a preconfigured constant,
the cache module caches a sentence outputted by the mixed model module,
the cache-based language modeling module calculates a conditional probability of each word in the user input according to the user input and sentences cached in the cache module based on a preconfigured cache-based language modeling policy, and outputs the conditional probabilities to the mixed model module, wherein
the cache-based language modeling module calculates the conditional probability of each word according to a following formula:
\[
P_{\mathrm{cache}}(w_i \mid w_{i-n+1} \dots w_{i-1}) = f(t_i) \cdot \frac{C(w_{i-n+1} \dots w_{i-1} w_i)}{C(w_{i-n+1} \dots w_{i-1})}
\]
wherein $C(w_{i-n+1} \dots w_{i-1} w_i)$ denotes a number of times that the word sequence $w_{i-n+1} \dots w_{i-1} w_i$ emerges in training material of cache-based language model;
$w_i$ denotes the ith word in a Chinese sentence;
$n$ is a preconfigured constant; and
$f(t_i)$ denotes a time function,
the mixed model module calculates a mixed conditional probability of each word according to the standard conditional probability and the conditional probability calculated by the cache-based language modeling module, obtains a sentence probability of each output sentence according to the mixed conditional probability, and selects and outputs an output sentence with a largest sentence probability, wherein
the mixed model module calculates the mixed conditional probability as follows:
\[
P_{\mathrm{mixed}}(w_i \mid w_{i-n+1} \dots w_{i-1}) = \alpha \cdot P(w_i \mid w_{i-n+1} \dots w_{i-1}) + (1 - \alpha) \cdot P_{\mathrm{cache}}(w_i \mid w_{i-n+1} \dots w_{i-1})
\]
wherein $\alpha$ is an interpolation parameter which is a constant between 0 and 1, wherein
the sentence probability of the output sentence is calculated as follows:
\[
P(S) = \prod_{i=1}^{m} P_{\mathrm{mixed}}(w_i \mid w_{i-n+1} \dots w_{i-1})
\]
wherein $m$ denotes the number of words included in the Chinese sentence,
the first word sequence frequency counting unit obtains a number of times $k'_i$ that a word sequence which includes the ith word and a predefined constant number of words before the ith word emerges in the training material of the standard Ngram language model, and outputs $k'_i$ to the standard conditional probability calculating unit,
the second word sequence frequency counting unit obtains a number of times $k'_{i-1}$ that a word sequence which includes the predefined constant number of words before the ith word emerges in the training material of the standard Ngram language model, and outputs $k'_{i-1}$ to the standard conditional probability calculating unit,
the standard probability calculating unit calculates a ratio of $k'_i$ to $k'_{i-1}$, and takes the ratio as the standard conditional probability of the ith word in the user input,
the third word sequence frequency counting unit obtains a number of times $k_i$ that a word sequence which includes the ith word and a predefined constant number of words before the ith word emerges in the cached training material, and outputs $k_i$ to the standard conditional probability calculating unit,
the fourth word sequence frequency counting unit obtains a number of times $k_{i-1}$ that a word sequence which includes the predefined constant number of words before the ith word emerges in the cached training material, and outputs $k_{i-1}$ to the standard conditional probability calculating unit,
the time function value calculating unit obtains a time function value of the ith word and outputs the value to the cache conditional probability calculating unit,
the cache conditional probability calculating unit calculates a ratio of $k_i$ to $k_{i-1}$, and multiplies the ratio with the time function value of the ith word to obtain the cache conditional probability of the ith word in the user input,
the interpolation parameter storage unit stores an interpolation parameter whose value is configured between 0 and 1 in advance,
the first multiplying unit calculates a product of the interpolation parameter stored in the interpolation parameter storage unit and the standard conditional probability of the ith word and outputs the product to the mixed conditional probability calculating unit,
the second multiplying unit calculates a difference between 1 and the interpolation and calculates a product of the different [sic] and the cache conditional probability of the ith word, and outputs the product to the mixed conditional probability calculating unit,
the mixed conditional probability calculating unit adds the products related to the ith word and takes the sum as the mixed conditional probability of the ith word,
the sentence probability calculating unit calculates a product of the mixed conditional probability of each word to obtain the sentence probability of an output sentence, and
the output sentence selecting unit selects a maximum sentence probability calculated by the sentence probability calculating unit and outputs the output sentence corresponding to the maximum sentence probability."
IX. The appellant's arguments, where relevant to this decision, are discussed in detail below.
1. The application relates to a technique for inputting text.
2. Admission into the appeal proceedings
The current set of claims was filed after the notification of the board's summons to oral proceedings and in response to clarity objections which had been raised for the first time in the board's communication accompanying the summons to oral proceedings. Its admission into the appeal proceedings is therefore justified by an exceptional circumstance, as required by Article 13(2) RPBA 2020.
3. Clarity - Article 84 EPC
3.1 Claim 1 is directed to a text processing method for a language modeling apparatus. The apparatus includes a standard Ngram language model module, a cache-based language modeling module and a mixed model module.
3.2 According to claim 1, the standard Ngram language model module receives user input in text form and calculates a "standard conditional probability"
\[
P(w_i \mid w_{i-n+1} \dots w_{i-1}) = \frac{C(w_{i-n+1} \dots w_{i-1} w_i)}{C(w_{i-n+1} \dots w_{i-1})}
\]
for each word in the user input. This probability is calculated by dividing the frequency of occurrence of the word sequence $w_{i-n+1} \dots w_{i-1} w_i$ in a corpus of training material by the frequency of occurrence of the word sequence $w_{i-n+1} \dots w_{i-2} w_{i-1}$ in the same corpus, where "$w_i$ denotes the ith word in a Chinese sentence". In other words, it is the probability, estimated on the basis of the corpus of training material, that the word $w_i$ occurs at the ith position of a (Chinese) input sentence given that $w_{i-n+1} \dots w_{i-1}$ are the $n-1$ preceding words.
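By way of illustration only (the Python sketch below is not part of the application or of document D1; the function names and the toy corpus are assumptions made for this example), such a count-based conditional probability can be estimated as follows:

```python
from collections import Counter

def ngram_counts(corpus, n):
    """Count every sequence of n consecutive words in a tokenised corpus."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts

def standard_conditional_probability(word, history, counts_n, counts_n_minus_1):
    """P(w_i | history) = C(history + w_i) / C(history), estimated from counts."""
    denominator = counts_n_minus_1[tuple(history)]
    if denominator == 0:
        return 0.0  # the history never occurs in the training material
    return counts_n[tuple(history) + (word,)] / denominator

# Toy usage for a bigram model (n = 2); the "corpus" is hypothetical.
corpus = [["我", "们", "好"], ["我", "们", "去"]]
bigrams, unigrams = ngram_counts(corpus, 2), ngram_counts(corpus, 1)
print(standard_conditional_probability("们", ["我"], bigrams, unigrams))  # 1.0
```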
3.3 The cache-based language modeling module similarly calculates a "conditional probability"
\[
P_{\mathrm{cache}}(w_i \mid w_{i-n+1} \dots w_{i-1}) = f(t_i) \cdot \frac{C(w_{i-n+1} \dots w_{i-1} w_i)}{C(w_{i-n+1} \dots w_{i-1})}
\]
for each word $w_i$ in the (Chinese) input sentence and outputs the conditional probabilities to the mixed model module.
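Again purely for illustration (the exponential decay used for the time function and all names are assumptions; the claim merely states that $f(t_i)$ "denotes a time function"), the cache-based probability could be sketched as:

```python
import math

def cache_conditional_probability(word, history, cache_counts_n,
                                  cache_counts_n_minus_1, t_i, decay=0.05):
    """Count ratio over the cached sentences, weighted by a time function.

    f(t_i) = exp(-decay * t_i) is an illustrative choice: recently cached
    words (small t_i) keep a weight close to 1, older ones decay towards 0.
    """
    denominator = cache_counts_n_minus_1[tuple(history)]
    if denominator == 0:
        return 0.0
    ratio = cache_counts_n[tuple(history) + (word,)] / denominator
    return math.exp(-decay * t_i) * ratio
```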
3.4 The mixed model module calculates "mixed conditional probabilities" from the "standard conditional probabilities" received from the standard Ngram language model module and the "conditional probabilities" received from the cache-based language modeling module:
\[
P_{\mathrm{mixed}}(w_i \mid w_{i-n+1} \dots w_{i-1}) = \alpha \cdot P(w_i \mid w_{i-n+1} \dots w_{i-1}) + (1 - \alpha) \cdot P_{\mathrm{cache}}(w_i \mid w_{i-n+1} \dots w_{i-1})
\]
Since the received probabilities were calculated for the words $w_i$ of the (Chinese) input sentence, the skilled person understands that the mixed conditional probabilities are also calculated for the words $w_i$ of the input sentence.
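In code, the interpolation amounts to a single weighted sum (a sketch; the value alpha = 0.7 is arbitrary and chosen only for illustration):

```python
def mixed_conditional_probability(p_standard, p_cache, alpha=0.7):
    """alpha * P_standard + (1 - alpha) * P_cache, with alpha constant in (0, 1)."""
    return alpha * p_standard + (1 - alpha) * p_cache
```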
3.5 According to claim 1, the mixed model module then obtains a "sentence probability of each output sentence" and selects and outputs an output sentence "with a largest sentence probability".
The skilled reader of the claim would understand that the sentence probability of an output sentence should somehow depend on the output sentence, for example on the words that make up the output sentence.
3.6 However, claim 1 specifies that the sentence probability of "the" output sentence is calculated as follows:
\[
P(S) = \prod_{i=1}^{m} P_{\mathrm{mixed}}(w_i \mid w_{i-n+1} \dots w_{i-1})
\]
and the feature "the sentence probability calculating unit calculates a product of the mixed conditional probability of each word to obtain the sentence probability of an output sentence" near the end of the claim confirms that this is how the sentence probability of each (and not just "the") output sentence is calculated.
Hence, according to claim 1, the sentence probability of an output sentence is obtained by multiplying the mixed conditional probabilities of the words $w_i$ of the input sentence. This means that the sentence probability of an output sentence depends only on the input sentence, not on the output sentence, and that the sentence probabilities of the output sentences are all identical.
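This defect can be made concrete with a short sketch (the probability values are hypothetical): since the product runs over the mixed conditional probabilities of the input words only, every candidate output sentence receives exactly the same score.

```python
def sentence_probability(mixed_probs_of_input_words):
    """Product over the INPUT words, as claim 1 is worded."""
    p = 1.0
    for p_i in mixed_probs_of_input_words:
        p *= p_i
    return p

input_word_probs = [0.2, 0.5, 0.1]  # hypothetical mixed conditional probabilities
for candidate in ["output sentence A", "output sentence B"]:
    # the score is independent of the candidate: the same value every time
    print(candidate, sentence_probability(input_word_probs))
```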
3.7 Since the skilled reader of the claim would understand that the claimed definition of the "sentence probability" of an output sentence cannot be correct, claim 1 is not clear.
3.8 In addition, claim 1 fails to specify how the output sentences relate to the input sentence. The step describing the functionality of the mixed model module refers to "each output sentence", but claim 1 does not specify how the mixed model module obtains the output sentences. This is a further reason why claim 1 is not clear.
3.9 At the oral proceedings, the appellant argued that the invention was specifically intended to improve the prediction accuracy for Chinese text input. In Chinese the same word could have different meanings depending on the context. The user input was entered in alphabetic script, which was then mapped to Chinese characters.
However, the clarity problems identified above are independent of the language and the character set in which the user input is expressed.
3.10 Hence, claim 1 does not meet the requirements of Article 84 EPC.
4. Since the sole request on file cannot be allowed, the appeal is to be dismissed.
For these reasons it is decided that:
The appeal is dismissed.