wiki_crawler/backend/__pycache__/service.cpython-313.pyc

27 lines
6.4 KiB
Plaintext

2025-12-20 17:08:54 +08:00
[Binary content: compiled CPython 3.13 bytecode, not readable as text. Only the strings recoverable from the dump are summarized below.]

Source file: E:\Proj\wiki_crawler\backend\service.py

Names imported:
  sqlalchemy: select, update, and_
  sqlalchemy.dialects.postgresql: insert (as pg_insert)
  database: db_instance
  utils: normalize_url

Class CrawlerService:
  __init__(self)
    Stores db_instance as self.db.
  register_task(self, url)
    Docstring: "Register a new task and initialize the queue."
    Normalizes the URL and looks up an existing row in tasks by root_url. If one exists, returns its task_id with is_new_task set to False. Otherwise inserts the task, adds the URL to queue with status "pending", and returns the new task_id with is_new_task set to True.
  add_urls(self, task_id, urls)
    Docstring: "Batch-store newly discovered pending URLs, with automatic deduplication."
    Inserts each normalized URL into queue with status "pending" using on_conflict_do_nothing with index_elements, counts the inserted rows via rowcount, and returns added_count.
  get_pending_urls(self, task_id, limit)
    Docstring: "Atomically fetch pending URLs and lock them."
    Selects up to limit URLs from queue for the task with status "pending", updates those rows to status "processing", and returns them under the key urls.
  save_results(self, task_id, results)
    Docstring: "Save the body text and vectors, and close out the queue status."
    For each result, inserts a row into chunks (task_id, source_url, title, content, embedding), sets the matching queue row to status "completed", and returns the number of results under the key inserted.

Module level:
  crawler_service = CrawlerService()