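# Count position-by-position mismatches between the previous and current order
def simple_distance(pre_order, cur_order):
    dis = 0
    for i, semantic in enumerate(cur_order):
        # A position differs if pre_order is too short or holds another element
        if i >= len(pre_order) or pre_order[i] != semantic:
            dis += 1
    return dis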
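# Compute the edit distance between two arrays
def edit_distance(arr1, arr2):
    # dp[i][j] is the edit distance between arr1[:i] and arr2[:j];
    # the first row and column (i + j with i == 0 or j == 0) are the base cases
    dp = [[i + j for j in range(len(arr2) + 1)] for i in range(len(arr1) + 1)]
    for i in range(1, len(arr1) + 1):
        for j in range(1, len(arr2) + 1):
            # Substitution costs 0 when the elements match, 1 otherwise
            d = 0 if arr1[i - 1] == arr2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + d)
    return dp[len(arr1)][len(arr2)]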
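# Compute the cosine distance between two arrays (scaled by a constant factor).
# `element_dict` (element -> vector index) and `logger` are expected to be
# defined at module level; they are not shown in this snippet.
def cos_distance(pre_order, cur_order, element_num):
    # Encode each order as a vector: 1-based position of each element, 0 if absent
    pre_vector = [0 for _ in range(element_num)]
    for i, item in enumerate(pre_order):
        index = element_dict[item]
        pre_vector[index] = i + 1
    cur_vector = [0 for _ in range(element_num)]
    for i, item in enumerate(cur_order):
        index = element_dict[item]
        cur_vector[index] = i + 1
    logger.info(f"pre_vector={pre_vector}")
    logger.info(f"cur_vector={cur_vector}")
    v1, v2 = pre_vector, cur_vector
    cross = sum(v1[i] * v2[i] for i in range(len(v1)))
    v1_std = sum(item * item for item in v1) ** 0.5  # Euclidean norm of v1
    v2_std = sum(item * item for item in v2) ** 0.5  # Euclidean norm of v2
    result = cross / (v1_std * v2_std)  # cosine similarity
    C = 0.8  # scaling constant
    return abs(result - 1) * C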
Business Bug Fix
Supported
Week_Trans
demand: Convert words such as “(本/上)周五” (this/last Friday), “今天” (today), and “明天” (tomorrow) into a formatted date string, such as “2021-09-10”.
interface: https://github.com/ryanInf/Time-NLPY
supporter: QianJin Xiang
feedback: Useful
current state: bug closed
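The conversion described above can be sketched with the standard library alone. This is a minimal illustration, not the Time-NLPY API: the helper `resolve_relative_date`, the `WEEKDAYS` table, and the assumption that a week starts on Monday are all hypothetical choices made for this example.

```python
from datetime import date, timedelta

# Hypothetical lookup table: weekday character -> offset from Monday.
WEEKDAYS = {"一": 0, "二": 1, "三": 2, "四": 3, "五": 4, "六": 5, "日": 6, "天": 6}


def resolve_relative_date(word: str, base: date) -> str:
    """Return a YYYY-MM-DD string for a relative date word (illustrative only)."""
    if word == "今天":  # today
        return base.isoformat()
    if word == "明天":  # tomorrow
        return (base + timedelta(days=1)).isoformat()
    # "(本/上)周X": optional prefix for this/last week, then "周" and a weekday.
    week_offset = 0
    if word.startswith("上"):
        week_offset = -1
        word = word[1:]
    elif word.startswith("本"):
        word = word[1:]
    if len(word) == 2 and word[0] == "周" and word[1] in WEEKDAYS:
        # Assumption: weeks start on Monday.
        monday = base - timedelta(days=base.weekday())
        target = monday + timedelta(weeks=week_offset, days=WEEKDAYS[word[1]])
        return target.isoformat()
    raise ValueError(f"unsupported date word: {word!r}")


base_day = date(2021, 9, 8)  # a Wednesday
print(resolve_relative_date("本周五", base_day))  # 2021-09-10
print(resolve_relative_date("上周五", base_day))  # 2021-09-03
print(resolve_relative_date("明天", base_day))    # 2021-09-09
```

Passing the base date explicitly keeps the function deterministic, which makes it easy to test against fixed days.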
Abbreviation_Match
demand: Users sometimes refer to a noun by an abbreviation, which may not exactly match any entry in the database. One idea is to generate an abbreviation dictionary from the existing database.
interface: An Excel file generated by a Python script (based on a dictionary tree, i.e. a trie).
supporter: NengZheng Jin
feedback: Correct but unavailable
current state: The bug was fixed by other methods; bug closed.
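The notes do not say how the script derives abbreviations from the dictionary tree. As one possible reading, the sketch below builds a trie over the database names and treats each name's shortest unique prefix as its abbreviation. The function names and the sample names are hypothetical, and the original script may use a different rule.

```python
def build_trie(names):
    """Build a character trie; each node counts how many names pass through it."""
    root = {"count": 0, "children": {}}
    for name in names:
        node = root
        for ch in name:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def shortest_unique_prefixes(names):
    """Map an assumed abbreviation (shortest unique prefix) to its full name."""
    root = build_trie(names)
    abbreviations = {}
    for name in names:
        node = root
        prefix = ""
        for ch in name:
            node = node["children"][ch]
            prefix += ch
            if node["count"] == 1:  # no other name shares this prefix
                break
        # If no unique prefix exists, the full name is used as its own key.
        abbreviations[prefix] = name
    return abbreviations


# Made-up sample names, for illustration only.
sample_names = ["agency_fee", "agency_bond", "ticket_fee"]
print(shortest_unique_prefixes(sample_names))
# {'agency_f': 'agency_fee', 'agency_b': 'agency_bond', 't': 'ticket_fee'}
```

The resulting dictionary could then be written to a spreadsheet for lookup, in line with the Excel interface mentioned above.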
Other Common Business Bugs
Composed Element
A composed element contains two or more elements in a single token; the procedure should split it at an appropriate point in processing.
Situation-01: Some elements, such as Residual_Maturity, may take values like “6L” or “2.8+3.2”. The first value needs to be divided evenly between “Agency_Fee” and “Ticket_Fee”, while the second assigns “2.8” to “Agency_Fee” and “3.2” to “Ticket_Fee”.
Situation-02: In some special cases, one instruction needs to be copied into multiple instructions that are not all identical. For example, we might encounter three product names consecutively.
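The two splitting rules for Situation-01 can be sketched as follows. The helper `split_fee` is hypothetical, and the notes do not explain the meaning of the trailing letter in “6L”, so this sketch assumes it can be ignored and only the leading number is halved.

```python
import re


def split_fee(value: str) -> dict:
    """Split a composed value into Agency_Fee and Ticket_Fee (illustrative only)."""
    if "+" in value:
        # "2.8+3.2": the left part goes to Agency_Fee, the right to Ticket_Fee.
        agency, ticket = value.split("+", 1)
        return {"Agency_Fee": float(agency), "Ticket_Fee": float(ticket)}
    # "6L": take the leading number and divide it evenly.
    # Assumption: the trailing suffix carries no numeric meaning here.
    match = re.match(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"cannot parse composed value: {value!r}")
    half = float(match.group()) / 2
    return {"Agency_Fee": half, "Ticket_Fee": half}


print(split_fee("2.8+3.2"))  # {'Agency_Fee': 2.8, 'Ticket_Fee': 3.2}
print(split_fee("6L"))       # {'Agency_Fee': 3.0, 'Ticket_Fee': 3.0}
```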
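A minimal sketch of the copying step in Situation-02, assuming an instruction is a plain dictionary and that the consecutive product names were collected into a list under one field. The function `expand_instruction` and the field names in the example are hypothetical.

```python
from copy import deepcopy


def expand_instruction(instruction: dict, field: str) -> list:
    """Copy an instruction once per value of a list-valued field (illustrative only)."""
    values = instruction.get(field)
    if not isinstance(values, list):
        return [instruction]  # nothing to expand
    expanded = []
    for value in values:
        clone = deepcopy(instruction)  # copies share no mutable state
        clone[field] = value           # only this field differs between copies
        expanded.append(clone)
    return expanded


# Made-up instruction with three consecutive product names.
parsed = {"Direction": "buy", "Product_Name": ["A", "B", "C"]}
for item in expand_instruction(parsed, "Product_Name"):
    print(item)
# {'Direction': 'buy', 'Product_Name': 'A'}
# {'Direction': 'buy', 'Product_Name': 'B'}
# {'Direction': 'buy', 'Product_Name': 'C'}
```

Using `deepcopy` avoids the copies sharing nested fields, so later per-instruction adjustments do not leak into the other copies.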