2021-09-10 Group Meeting

^ _ ^

From 2021-09-04 to 2021-09-10
Fixed bugs in the cj_text_analyze project

Core Logic Bug Fix

# Earlier attempts, kept for reference:
# dis = simple_distance(pre_order, cur_order)
# dis = edit_distance(pre_order[0:len(cur_order)], cur_order)
dis = cos_distance(pre_order, cur_order, state.element_num)
if dis > 0:
    # The penalty grows exponentially with the distance between the two orders
    sub_score = 2**dis
    logger.info(f"dis={dis}; board state bonus={score}-{sub_score}")
    score -= sub_score
else:
    logger.info(f"dis={dis}; board state bonus={score}")
# logger and element_dict are module-level objects defined elsewhere in the project.
# element_dict is expected to map each element to its index in the vector.

# Compare position by position and count the mismatches
def simple_distance(pre_order, cur_order):
    dis = 0
    for i, semantic in enumerate(cur_order):
        if i >= len(pre_order) or pre_order[i] != semantic:
            dis += 1
    return dis

# Difference in the number of distinct elements between two arrays
def count_distence(arr1, arr2):
    return max(len(set(arr1) - set(arr2)), len(set(arr2) - set(arr1)))

# Edit distance between two arrays
def edit_distance(arr1, arr2):
    # dp[i][j] is the edit distance between arr1[:i] and arr2[:j];
    # only row 0 and column 0 of the initial values are used as-is
    dp = [[i + j for j in range(len(arr2)+1)] for i in range(len(arr1)+1)]
    for i in range(1, len(arr1)+1):
        for j in range(1, len(arr2)+1):
            d = 1
            if arr1[i-1] == arr2[j-1]:
                d = 0
            dp[i][j] = min(dp[i-1][j]+1, dp[i][j-1]+1, dp[i-1][j-1]+d)
    return dp[len(arr1)][len(arr2)]

# Cosine distance between two arrays (scaled by a constant factor)
def cos_distance(pre_order, cur_order, element_num):
    # Encode each order as a vector: slot = element index, value = 1-based position
    pre_vector = [0 for _ in range(element_num)]
    for i, item in enumerate(pre_order):
        index = element_dict[item]
        pre_vector[index] = i + 1
    cur_vector = [0 for _ in range(element_num)]
    for i, item in enumerate(cur_order):
        index = element_dict[item]
        cur_vector[index] = i + 1
    logger.info(f"pre_vector={pre_vector}")
    logger.info(f"cur_vector={cur_vector}")
    v1, v2 = pre_vector, cur_vector
    cross = sum([v1[i]*v2[i] for i in range(len(v1))])
    # Euclidean norms of the two vectors
    v1_std = sum([item*item for item in v1])**0.5
    v2_std = sum([item*item for item in v2])**0.5
    # Note: raises ZeroDivisionError if either order is empty
    result = cross/(v1_std*v2_std)
    C = 0.8
    return abs(result-1) * C
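
To make the cosine-based distance and the 2**dis penalty concrete, here is a small worked sketch. It re-implements the same idea as cos_distance in a self-contained form; passing element_dict as a parameter, the scale argument, and the sample mapping are assumptions made for illustration, not the project's actual interface.

```python
import math


def cos_distance(pre_order, cur_order, element_dict, scale=0.8):
    """Scaled cosine distance between two orderings of elements."""
    def to_vector(order):
        # Slot = element index, value = 1-based position in the order
        vec = [0] * len(element_dict)
        for pos, item in enumerate(order):
            vec[element_dict[item]] = pos + 1
        return vec

    v1, v2 = to_vector(pre_order), to_vector(cur_order)
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return abs(dot / norm - 1) * scale


# Hypothetical element-to-index mapping, for illustration only
element_dict = {"A": 0, "B": 1, "C": 2}

same = cos_distance(["A", "B", "C"], ["A", "B", "C"], element_dict)
# Vectors [1, 2, 3] and [1, 3, 2]: cosine = 13/14, distance = 0.8 * (1 - 13/14)
swapped = cos_distance(["A", "B", "C"], ["A", "C", "B"], element_dict)
penalty = 2 ** swapped  # amount subtracted from the score

print(same, swapped, penalty)
```

Two things are worth noting. Any positive distance produces a penalty of at least 1, because 2**dis > 1 whenever dis > 0. Floating-point rounding can also leave the distance for identical orders slightly above zero, so comparing against a small tolerance instead of `dis > 0` may be safer.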

Business Bug Fix

Supported

Week_Trans

demand: convert expressions such as “(本/上)周五” (this/last Friday), “今天” (today), and “明天” (tomorrow) into a formatted date string, such as “2021-09-10”
interface: https://github.com/ryanInf/Time-NLPY
supporter: QianJin Xiang
feedback: Useful
current state: bug closed
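
The kind of conversion described above can be sketched with the standard library alone. This is not the Time-NLPY API; the function name resolve_relative_date and the small set of supported expressions are assumptions for illustration, and weeks are assumed to start on Monday.

```python
from datetime import date, timedelta

# Chinese weekday suffix -> Python weekday number (Monday = 0)
WEEKDAYS = {"一": 0, "二": 1, "三": 2, "四": 3, "五": 4, "六": 5, "日": 6, "天": 6}


def resolve_relative_date(text, today):
    """Return an ISO date string for a few relative expressions, or None."""
    if text == "今天":  # today
        return today.isoformat()
    if text == "明天":  # tomorrow
        return (today + timedelta(days=1)).isoformat()

    # "本周五" or "周五" = Friday of this week, "上周五" = Friday of last week
    week_offset = 0
    body = text
    if body.startswith("上"):
        week_offset, body = -1, body[1:]
    elif body.startswith("本"):
        body = body[1:]

    if len(body) == 2 and body[0] == "周" and body[1] in WEEKDAYS:
        monday = today - timedelta(days=today.weekday())
        target = monday + timedelta(weeks=week_offset, days=WEEKDAYS[body[1]])
        return target.isoformat()
    return None  # expression not covered by this sketch


today = date(2021, 9, 8)  # a Wednesday
print(resolve_relative_date("本周五", today))
print(resolve_relative_date("上周五", today))
print(resolve_relative_date("明天", today))
```

Passing today explicitly keeps the function deterministic, which makes it easy to test.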

Abbreviation_Match

demand: Users sometimes type an abbreviation of a noun, which may not exactly match any entry in the database. One idea is to generate an abbreviation dictionary from the existing database.
interface: An Excel file generated by a Python script (based on a trie).
supporter: NengZheng Jin
feedback: Correct but not usable
current state: fixed by other methods; bug closed.
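
For illustration, one simple way to match an abbreviation against full names is an in-order subsequence check. This is neither the trie-based script mentioned above nor the method finally used to close the bug; the function names and sample names are assumptions.

```python
def is_abbreviation(abbr, full_name):
    """True if every character of abbr appears in full_name in the same order."""
    remaining = iter(full_name)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters must appear after earlier ones
    return all(ch in remaining for ch in abbr)


def match_abbreviation(abbr, names):
    """Return every name that abbr could be an abbreviation of."""
    return [name for name in names if is_abbreviation(abbr, name)]


# Illustrative sample names, not taken from the project's database
names = ["国家开发银行", "中国农业发展银行"]
print(match_abbreviation("国开", names))
print(match_abbreviation("农发", names))
```

A subsequence check can return several candidates for a short abbreviation, so a real system would still need a rule for ranking or rejecting ambiguous matches.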

Other Normal Business Bugs

Composed Element

Some composed elements contain two or more elements in one token; the procedure should split them at an appropriate point.

Situation-01
Some elements, such as Residual_Maturity, may hold values like “6L” or “2.8+3.2”. The first value needs to be divided evenly between “Agency_Fee” and “Ticket_Fee”, while the second assigns “2.8” to “Agency_Fee” and “3.2” to “Ticket_Fee”.
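
A minimal sketch of this splitting rule follows. The function name split_composed_value is hypothetical, and the sketch assumes that a trailing letter such as “L” is a non-numeric suffix that can be ignored when dividing the value.

```python
import re


def split_composed_value(value):
    """Split a composed token into (agency_fee, ticket_fee) amounts."""
    if "+" in value:
        # "2.8+3.2": the first part is the agency fee, the second the ticket fee
        first, second = value.split("+", 1)
        return float(first), float(second)

    # "6L": take the leading number and divide it evenly
    # (assumption: the trailing suffix carries no amount)
    match = re.match(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"cannot split value: {value!r}")
    half = float(match.group()) / 2
    return half, half


print(split_composed_value("6L"))
print(split_composed_value("2.8+3.2"))
```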

Situation-02
In some special cases, we need to copy one instruction into multiple instructions, but the resulting instructions are not all identical.
For example, we might encounter 3 product names consecutively.
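
As a rough sketch of that case, one instruction can be copied once per product name, with only the differing field overwritten. The dictionary layout and the field names product_name and action are assumptions for illustration, not the project's data model.

```python
import copy


def expand_instruction(instruction, product_names):
    """Return one independent copy of the instruction per product name."""
    expanded = []
    for name in product_names:
        item = copy.deepcopy(instruction)  # copies must not share nested state
        item["product_name"] = name        # the only field that differs
        expanded.append(item)
    return expanded


# Hypothetical instruction and product names
template = {"action": "buy", "product_name": None, "fees": {"Agency_Fee": 2.8}}
result = expand_instruction(template, ["P1", "P2", "P3"])
print([item["product_name"] for item in result])
```

Using deepcopy rather than a shallow copy means that later edits to one instruction's nested fields do not leak into the others.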