Gnosis Core Devs Call July 24, 2024
Summary
TLDR: During a technical call, the team first chatted about the CrowdStrike outage and its effect on airport operations, then discussed latency research for Shutterized transactions and the development of a public dashboard. They also covered updates on infrastructure, research, testing, and client software, including a Nethermind hotfix for post-AuRa networks. The call concluded with a discussion of the timeline, root cause, and fix for a recent incident involving block production.
Takeaways
- 😀 Before the agenda, attendees discussed the impact of the CrowdStrike outage, which left one person sleeping in the airport in Split.
- 🛂 The airport in Split experienced a technological meltdown with no computers working, leading to manual handling of flight announcements, tickets, and check-ins.
- 👀 It seemed suspicious that CrowdStrike was still being blamed, since that incident should have been fixed by then, but it was suggested that recovery can take days if every machine has to be rebooted one by one.
- 🔄 The meeting covered updates on infrastructure, with the new cluster migration working as expected and no issues reported from Gateway.
- 📊 A dashboard was introduced for latency improvements and system status observation, with a backend observing incoming transactions and their inclusion into the blockchain.
- 🔍 The team is researching latency issues and working with Nethermind and validators to identify the root causes of latency problems.
- 🚀 Erigon 3's first alpha version was released, offering significant improvements such as syncing an archive node in hours rather than days and state history granularity at the transaction level instead of the block level.
- 🛠️ Nethermind hotfix version 1.27.1 was released to address an issue with an aggressive caching mechanism on post-AuRa networks, which could cause cache corruption after a specific combination of bad blocks.
- 🔄 Work continues on getting Geth running on Chiado; the blocks it produces are currently invalid, and contract creation failures are still being debugged.
- 🔒 A discussion about system contract tracing was had, with the need for a way to trace system contract calls to help with debugging.
- 📝 The meeting concluded with a recap of a recent incident involving block production issues on a network, the root cause, and the quick fix implemented.
Q & A
What was the initial issue discussed in the meeting?
-The initial topic was the impact of the CrowdStrike outage on travel, with one participant having to sleep in the airport due to flight disruptions.
What was the problem at the Split airport?
-The problem at the Split airport was a complete technological meltdown, with no computers working, manual flight announcements, hand-drawn tickets, and manual check-ins.
What was the impact of the technological meltdown on the airport operations?
-The impact included manual check-in from a paper list, no seat-assignment software, so passengers picked their own seats on boarding, and the bar taking cash only and writing every transaction down by hand.
What is the purpose of the dashboard being developed in the meeting?
-The dashboard is being developed to observe the status of the system, including incoming transactions, decryption key messages, and their inclusion into blocks.
What is the current status of the network according to the dashboard?
-The current status of the network shows not a lot of activity, and the number of transactions and keys included is being monitored.
What is the issue with the block being built without transactions?
-The issue is that the decryption keys don't arrive in time. The cutoff is currently hardcoded to 3 seconds into the slot, so a block is built without Shutterized transactions when the keys arrive later than that.
What is the plan for improving latency?
-The plan includes building a dashboard, working with Nethermind and the validators to bring down latency, and potentially making the latency value configurable.
What is the current status of the Hive test?
-The Hive test is being developed with a technical plan divided into five parts, with part one already merged and work ongoing on part two.
What was the issue with the Nethermind client?
-A hotfix, version 1.27.1, was released to address an issue with the aggressive caching mechanism introduced in 1.27.0, which could lead to cache corruption in an edge case.
What is the significance of the release of Erigon 3?
-Erigon 3 brings significant improvements such as the ability to sync an archive node in a matter of hours rather than days, and finer state history granularity, from block level to transaction level.
What is the current issue with Geth on Chiado?
-There are two issues: one is that it produces invalid blocks, and the other is a more difficult-to-debug issue that appears to be related to contract creation failures.
What was the timeline and resolution of the issue with the Nethermind client?
-The issue started with the release of version 1.27.0, which had a pre-warming feature that caused problems. The workaround was to disable pre-warming, and hotfix version 1.27.1 was released to properly address the issue.
What is the current recommendation for validators running the Nethermind client?
-The current recommendation for Shutter validators is to run the 1.27.1 experimental Shutter build, which has been updated to address the issues found in version 1.27.0.
Outlines
😀 Personal Experiences with the CrowdStrike Outage and Technical Issues
The speaker recounts being affected by the CrowdStrike outage, which led to a night in the airport, and describes the technological meltdown at the Split airport, where all computers were down, forcing manual operations for flights, check-ins, and transactions. Participants note the suspicion around CrowdStrike still being blamed for the incident, and that slow recovery rather than a new attack is the likely explanation. The conversation then shifts to the start of the meeting, led in Philip's absence, with an agenda covering infrastructure updates and research topics.
🔍 Investigating System Latency and Dashboard Development
The discussion revolves around latency issues in the Shutter system and the development of a custom dashboard in collaboration with Nethermind and the validators. A second, public dashboard monitors the system's status and performance, including incoming transactions and decryption key messages. The speaker shares the dashboard and stresses that all registered validators need to run the Shutterized Nethermind version so the investigation can rule out other causes. The conversation also touches on the cutoff for waiting on decryption keys, currently hardcoded to 3 seconds into the slot, and the possibility of making this value configurable within bounds.
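A configurable cutoff with boundaries, as discussed on the call, could look roughly like the sketch below. This is an illustration only, not Nethermind's implementation: the names `KeyWaitConfig` and `should_wait_for_keys` and both bound values are hypothetical, and the 5-second slot length is an assumption. Only the 3-second default comes from the call.

```python
from dataclasses import dataclass

# Only the 3-second default comes from the call; the bounds are placeholders.
DEFAULT_CUTOFF_MS = 3000
MIN_CUTOFF_MS = 0      # hypothetical lower bound
MAX_CUTOFF_MS = 4000   # hypothetical upper bound, assuming a 5-second slot


@dataclass(frozen=True)
class KeyWaitConfig:
    """Hypothetical config: how long into the slot to wait for decryption keys."""

    cutoff_ms: int = DEFAULT_CUTOFF_MS

    def __post_init__(self) -> None:
        # Reject out-of-range values instead of silently clamping them, so a
        # misconfigured validator fails at startup rather than on chain.
        if not MIN_CUTOFF_MS <= self.cutoff_ms <= MAX_CUTOFF_MS:
            raise ValueError(
                f"cutoff_ms must be in [{MIN_CUTOFF_MS}, {MAX_CUTOFF_MS}], "
                f"got {self.cutoff_ms}"
            )


def should_wait_for_keys(ms_into_slot: int, config: KeyWaitConfig) -> bool:
    """Return True while the block builder may still wait for decryption keys."""
    return ms_into_slot < config.cutoff_ms
```

Rejecting values outside the range reflects the point made on the call that a validator cannot raise the cutoff arbitrarily without other validators rejecting its late block.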
🛠️ Client Updates and Nethermind Improvements
The speaker provides updates on client developments, including a hotfix for post-AuRa networks related to an aggressive caching mechanism that could cause cache corruption in an edge case; hotfix version 1.27.1 was released to address this. Additionally, the first alpha version of Erigon 3 is announced, highlighting finer state history granularity and much faster archive node sync. The conversation also covers troubleshooting Geth on Chiado, which uncovered a bug in Nethermind related to invalid blocks, and the need for further debugging of contract creation failures.
🔄 Addressing Block Production and System Contract Tracing
The focus is on resolving issues with block production by Geth on Chiado, where invalid blocks are being generated. A bug in Nethermind is identified and discussed, along with the potential for system contract tracing to aid in debugging. The conversation explores the challenges of implementing tracing for system contracts, mainly that traces are normally attached to transactions in a block while system contract calls are not explicit transactions, and the option of a custom, opt-in RPC to request specific trace information.
🔧 Refactoring and Testing Updates
Updates on the technical plan for the Hive tests are shared, with the work divided into five parts, one of which has already been merged. Genesis refactoring and new scripts to speed up test development are mentioned. Additionally, work on migrating the Ethereum tests to Gnosis is underway, with the aim of first running them against Nethermind and then fixing any issues encountered on Gnosis.
🚫 Hotfix Release and Incident Analysis
A hotfix release for a feature in Nethermind version 1.27.0 is discussed; the feature caused edge-case issues because it was not hooked up correctly in some parts of the code for AuRa chains. Rather than disabling the feature, the hotfix wires it up properly for AuRa chains. The conversation also includes an analysis of an incident with block proposals on Chiado, whose root cause was the block pre-warming feature. A workaround (disabling pre-warming) and the 1.27.1 hotfix release address the issue, and advice is given to validators on which versions to run.
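The version guidance from the incident can be expressed as a small check. This is a sketch under stated assumptions: the helper names `parse_version`, `is_affected`, and `advice` are hypothetical, and the set of affected versions contains only what was reported on the call (1.27.0 affected, 1.26.0 and 1.27.1 not).

```python
# Versions named on the call. Only 1.27.0 was reported as affected.
AFFECTED_VERSIONS = {(1, 27, 0)}


def parse_version(text: str) -> tuple:
    """Parse 'v1.27.1' or '1.27.1-suffix' into (major, minor, patch)."""
    core = text.strip().lstrip("vV").split("-")[0].split("+")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """True if the version matches one reported as affected by the incident."""
    return parse_version(version) in AFFECTED_VERSIONS


def advice(version: str) -> str:
    """Summarize the guidance given on the call for a given client version."""
    if is_affected(version):
        return "upgrade to 1.27.1, or disable pre-warming as a workaround"
    return "no action needed for this incident"
```

For example, `advice("1.27.0")` returns the upgrade-or-workaround guidance, while `advice("1.26.0")` reports that no action is needed.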
📢 Conclusion and Call for Feedback
The meeting concludes with a call for feedback once all Shutter-registered validators are running the 1.27.1 experimental version, and a discussion on the importance of testing new features in various scenarios to avoid unexpected issues. The incident timeline and root cause are summarized, and the need for caution in feature implementation is highlighted. The conversation ends with thanks to the participants and a reminder to enjoy the summer.
Keywords
💡CrowdStrike
💡airport
💡technological meltdown
💡migration
💡latency
💡dashboard
💡Shutterized
💡block
💡validator
💡Erigon 3
💡hotfix
Highlights
The meeting started with a discussion about the CrowdStrike outage affecting travel plans, with one participant having to sleep in the airport.
There was a technological meltdown at the airport, with no computers working and flights being manually announced.
A manual check-in process was implemented with handwritten tickets and a paper list.
The bar processed transactions by cash and wrote everything down manually.
There was no seat-assignment software, so passengers boarded and picked their own seats.
The meeting agenda included infrastructure updates, research topics, testing updates, client updates, and a discussion on recent incidents.
The new cluster migration was reported to be working as expected.
Research is being conducted on latency improvements, with a custom dashboard being built in collaboration with Nethermind and validators.
A public dashboard was created to observe the system's status and behavior.
The number of blocks built with at least one Shutterized transaction was highlighted as a key metric.
A script is being developed that sends Shutterized transactions for the next slot whenever there is a Shutterized block.
The discussion on latency customization highlighted the need for a balance between performance and network health.
A hotfix version 1.27.1 was released for Nethermind, addressing issues found in post-AuRa networks.
Erigon 3's first alpha version was released, featuring significant improvements in archive node sync time and state history granularity.
Invalid blocks produced by Geth on Chiado were discussed; they uncovered a bug in Nethermind.
The possibility of implementing tracing for system contracts was discussed to aid in debugging.
A technical plan for the Hive tests was developed, with the first part already merged and work ongoing on the second part.
Work has started on migrating the Ethereum tests to Gnosis, aiming to run them on Gnosis and fix any issues encountered.
The incident involving invalid blocks and the pre-warming feature was discussed, with a hotfix released to address the issue.
A rollback to version 1.26.0 was advised for Shutter validators during the incident to avoid issues on the chain.
The pre-warming feature was rolled out in stages and tested mostly on mainnet; in hindsight it should have been explicitly disabled in the configs for AuRa chains.
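The two adoption metrics mentioned in the highlights can be sketched as follows. This is a rough illustration, not the dashboard's actual code: the function names are hypothetical, the 5-second slot time is stated here as an assumption, and the key-based figure is only a proxy, since the call notes that an empty key is also produced for registered validators' slots.

```python
SLOT_SECONDS = 5  # assumed slot time; adjust if the chain parameters differ


def slots_in_window(window_seconds: int, slot_seconds: int = SLOT_SECONDS) -> int:
    """Number of whole slots in an observation window."""
    return window_seconds // slot_seconds


def key_based_adoption(keys_received: int, window_seconds: int) -> float:
    """Proxy for adoption: decryption keys received per slot in the window."""
    slots = slots_in_window(window_seconds)
    if slots == 0:
        raise ValueError("window is shorter than one slot")
    return keys_received / slots


def shutterized_block_share(shutterized_tx_counts: list) -> float:
    """Share of observed blocks containing at least one Shutterized transaction."""
    if not shutterized_tx_counts:
        return 0.0
    with_tx = sum(1 for count in shutterized_tx_counts if count >= 1)
    return with_tx / len(shutterized_tx_counts)
```

With these assumptions, one hour contains 720 slots, so 360 received keys in an hour would give a key-based adoption figure of 0.5.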
Transcripts
yeah for sure of course yeah just to
have something we can look at if you can
share the screen yourself and me and
Philip we can we can review the note
later so no
worries yeah absolutely thank you
have any of you been personally affected
by CrowdStrike
I had to sleep in the airport
yesterday oh no I haven't
luckily yeah
which
airport uh
Split
oh well I've never been to okay I
haven't been to Split very often but
every time I've been there was always a
problem so I think they're I think
they're blaming CrowdStrike for this
one
yeah it was a bit suspicious because
the CrowdStrike incident should be fixed by
now but it was a complete technological
meltdown no computers were working at
all I mean it depends how quickly they
can fix things uh yeah it should not be
a new attack but if they you know if if
they're out of their depth and then they
have to go and reboot every single
machine one by one it could take days so
yeah yeah they had so the screens were
not working they had to announce every
flight manually tickets were
hand-drawn and the check-in was also
manual with
like yeah a paper list and just crossing
names well it's a good thing the printer
was working
yeah no no I don't think they had a
printer like everything was written by
hand also the bar was also processing
transactions by cash and then writing
down everything manually with a pen ouch
like trust me bro
policy it was funny like there was no
seat-assignment software so
basically you were just onboarded in the
plane and they were just like pick whatever seat
you want like we'll figure out
later how many people try to get
business
plus I mean it was a Vueling flight so
everyone had the same economy economy so
but yeah if it was a that would be
funny uh we
missing
someone so I see Marcos from
Nethermind should we get
started yeah to four yeah let's get
started it's
fine okay so thank you for attending uh
today is the Gnosis core devs call July 24th Philip
cannot make it so I will lead the call
so let's start over let's follow agenda
let's start over with infrastructure any
updates from Gateway no no issues no so
the migration right the new cluster
works as
expected okay thank
you from research topics any updates
from
Shutter yeah so we are currently
researching on latency
and latency improvements um therefore uh
we are um building our own dashboard and
uh yeah working together with Nethermind
and the validators um to bring down
like find out the reason for for the
latency we have currently um uh beside that
uh we also built a um srisk as requested
a dashboard which is going to be public
um to observe like the status um of the
the status of the system and how it's
behaving um if you want I can briefly
share my screen and show you what I
have yeah sure that will
be um does it work at the same
time I see I see uh yeah give it a try
other otherwise we can switch Yeah Yeah
I can see I can see your
thatw yeah um yeah still like if if
someone is interested I I can share the
link it's currently public but we
wouldn't want to um share it publicly um
so mainly what's a more General one is
the Observer one so we have built a back
end which is kind of observing the whole
system like incoming transactions
decryption key messages and um if they
get included into the block so um yeah
briefly overview here like uh how system
is performing and how many keys we get
and how many um transactions get
included included So currently there's
not a lot of activity on the network uh
for one I think I believe um there is I
don't know if Gateway is uh running um
are running the Shutter Shutterized version
of the Nethermind client um I
guess yes so it yesterday your colleague
said that you're not running it right
now um but I don't know yeah it's would
be great to to just find this out so
that we can exclude this possibility
because I think it's um five out of
eight validators are run by you guys and
then we can continue with the
investigation because uh sometimes so
the underlying problem is sometimes um
the block is being built without Shutterized
transactions uh the reason for it is
that the keys don't arrive in time I
believe the cut off is currently is
hardcoded to 3 seconds into the block
into the slot so um yeah we currently
investigating on the performance so that
we get bring that down but also we
discussed with the Nethermind team that we
can make this value uh
configurable um but yeah for
investigation it would be great that we
know all of the validators which are
registered are running with the
software um yeah so so this will be
extended also we will share it in the
Shutter group and um also if you have uh
metrics you want to see um feel free to
to suggest yeah and give
feedback I guess can you go back to the
dashboard so the received key shares and
decryption Keys should it
match
um yes where where it should match I
have to check with the guys who who
built the metric because uh could also
like the key shares could also be um
more sh there are more shares than
decryption keys right I see so on once
we have one key per slot that means we can
decrypt the transactions so in a way
this is a proxy for
adoption uh yes yes exactly so you could
instead of computing I guess this is a
very small interval if you do uh like
receive decryption
keys in an average time let's say of I
don't know like one hour divided by five
then you would get the adoption which
would be an interesting yes yeah that's
right so so basically here you can see
like whenever we um there is a Shutterized
validator in in the decryption Keys per
slot so you can see it's not every slot
and but it's every validator like it's
every slot where the validator is
registered and um so that's a Mis
misleading metric currently so there's
always one empty key which is going to
be produced so that we see see the Keers
are still alive it's it's kind of like
the slot key um but if this number
increases as well then we it's a metric
also for um uh adoption
basically well thank you okay and then
we if you want can you share this
in nosis geps channel is that private
now for you yeah I think that's fine
yeah can do that and also we have
addition additional insights on
connectivity and pings so I think it's
open for for everyone who got the
link Fredick two two questions uh maybe
I missed the the obvious one but is has
any block being built uh using shuttered
uh mol transactions yes yes many blocks
yeah okay but that would be a metric
that I would be to me the number one
metric I'd be looking at like how many
blocks have been built in this
way yeah but for that we need uh so so
yeah then um what do you mean by build
in this way so if there's no transaction
then uh it's obviously just build
normally right okay yeah with at least
one transaction in it I mean yeah yeah
exactly and and for that we are
currently also um uh writing a script
which sends like whenever there is a
Shutterized block it sends uh Shutterized
transactions for the next slot
so that we can exactly build this metric
cool all right and the second one more
comment you mentioned like customizing
this latency I believe we cannot just uh
in Nethermind kind of arbitrarily uh
customize the latency up just because uh
yeah other other validators would not
accept my block but I'm not sure if
that's what you meant yeah yeah that's
that's completely right so I mean at T
like time equal zero for the slots the
the block needs to be released so I guess
at one point there needs to be a cut off
um the the rather the question more
would be like what is the good value
which we can have in the short term so
that until we have like significantly
improved on the latency and instead of
having it hardcoded which would always
require a new release to change the
value we make it configurable but
obviously it should have
boundaries yeah all right so there is a
maximum value until the like where the
network becomes unhealthy at one point
right that's I think from our side
everything thank you so much for showing
that uh I guess for the recording can we
maximize the notes
somehow I'm not sure that done automatic
yeah for sure let
check okay
great thank
you I don't think there is any other
topic regarding research so let's move
to testing any updates regarding
Hive yeah hello uh during the past week
we developed a technical plan for the Hive
tests uh the final improvements will be
divided into five parts and we have
already merged part one which includes
all the latest Upstream updates uh
currently I am working on the second
part which will include Genesis
refactoring to use the same approach as
upstream and also adding new scripts
and tools to uh speed up test
development and the
thank you and uh regarding yeah
Marcos yeah also we started working on
the Ethereum uh tests to do the migration
to to
Gnosis and so that's also something
that we wanted to share uh we the
process will be like uh trying to run
those no those tests for Nethermind
which currently in the setup that the
repository has is not possible it's hard
coded for Besu and and and Geth but uh we are
working on that part now then we'll be
like try to run them on Gnosis and fix
any any problem that we may see in the
in the
progress Co yeah thank you I was just G
to ask for that so that's pretty
exciting yeah give him some time for AR
yeah don't worry we'll take cover it's
fine just it's a good good T LIF there
okay uh client updates Nethermind good to
see you Lukasz thanks for
coming uh no
problem uh
so uh one important thing we have
released a hot fix version
1.27.1 uh this includes a hotfix for
uh post-AuRa networks mostly in in the
context and it was found on a Chiado issue
so because of in 1.27 we have some very
aggressive caching mechanism that uh
increase uh increase the
performance uh we
actually explicitly didn't hook them up
in in in the AuRa chains at the
beginning uh which was a mistake because
um uh well the feature was kind if if if
the feature wasn't disabled explicitly
it it creeped out to some parts of the
code and while it wasn't hooked up
correctly in other parts and it caused a
potential Edge case issue that uh after
uh some specific combination of a bad
block that's being produced uh then
might be some corruption in in in some
kind of caching mechanism so the
workaround would be to turn off this
feature uh but in the default conf
configs it is turned on so we released a
hot fix and the hot fix correctly wires
it up for AuRa so we uh we were
debating should we just turn the feature
off in the hot fix or just or just port
a code that's was already in master that
uh hooks it up um for other chains also
and we decided to to do the latter so uh yeah
so it's there I can also send in the
chat uh if someone wants to use 1.27.0 and
don't update I can also send the uh how
to turn this feature off uh so he could
also use
1.27.0 so that's basically
it okay thank you yeah I was going to
touch on the incident but we can do that
at the
end
Erigon so yesterday we released the first
alpha version of Erigon 3 so we've been
working on it for years and it's still
it's still on
alpha but what Erigon 3 brings uh it it
brings two two significant
improvements um and to like now you can
sync an archive node in a matter of hours not
days and also the granularity of State
history is reduced from the Block Level
to the transaction level um and um yeah
and under the hood uh it's been a lot of
work but it's it's not polished yet but
everyone everyone curious is of
course uh like we we highly recommend
that people start playing with it and
experimenting with with the
alpha okay
thanks uh next
Geth yeah so I'm working on getting Chiado
to work um I have two issues currently
one of them is that yeah block produced
are invalid um so but like block
production is working simply it produces
invalid blocks uh that triggered an
issue and uncovered a bug in in Nethermind
so you're welcome I guess um and uh
the other problem this one is a bit more
difficult to
debug um yeah I don't exactly know
what's wrong because it's uh I mean I'm
still I'm still debugging that's why I
was hoping uh to get more info as to
what uh went wrong with Nethermind in
case the blocks I produced were the issue
but it's not uh
so I don't know looks like okay from
what I can tell right now is every time
there's a contract creation it uh it
fails but yeah it's just it's just an
intuition I've been debugging I I don't
know so one thing I would appreciate um
if a couple weeks ago we talked about
the ability to trace
uh was like system contracts as well
yeah
Lukasz uh yeah so the invalid block had only
withdrawals in it so it would be either
withdrawals contract or uh or block
reward contract that you you have issues
with
interesting cool yes so that that might
help for the first bug I was describing
unfortunately uh it's uh I need to fix
the second one before I can really
tackle the first um but yeah so at some
point we with M we were talking about
the possibility to implement tracing
also for system contracts um I would
love to have this feature so that I can
see if what I'm like the system
contracts I'm calling on Chiado have an
issue uh but yeah otherwise I'm a bit in
the dark but I keep I keep looking and
hopefully I'll find something um yeah
that's pretty much yep so for system
contract tracing the only confusing
thing is where the traces would be
attached to because normally the traces
are uh in order and for transactions in
block and those transactions are not
explicitly in Block so yeah it's it's a
bit confusing and we would have to
probably make it as a option to to to
trace them and not
default so things like that so I mean
that's fine by me we also don't have to
make it an official release if that's
too complicated I'm happy to have a
custom RPC system where I can just uh
ask ask for it if it's a prototype I
don't really care it's just about
figuring out what the what the output
sorry what the updates
are uh that's
all oops uh oh yes I'm yes it is yeah
sorry I wasn't sure if you m
it okay thank you um I guess next one is
Reth can you CL it
down it's Frozen for me the the video
but I can
see anyway I assume the next is R we
don't have more
clients so on my end
uh I did the PR to Reth so hopefully to
make the
easier uh the if you are aware the
current strategy is not to Fork the
client but build on its
extensibility importing Reth as a
library uh they want to support that and
it's working okay so far but diff is
pretty big so we will be working on
having examples on the repo for the type
of things that we have to modify for
Gnosis and they they have been receptive
to minimize this diff so that's good
meanwhile uh the fix that Lukasz gave me
for the Genesis block on definites works
so thank you and I'll see how far I can
get uh with that at least to work on um
post-AuRa while I'll figure out the
block properties
issue cool so that's it for
clients and I don't think we have any
other topic of research so maybe we can
just talk a little bit at least just
just explain what happened even though
we just repeat what we said in the group
so if does anyone want to
volunteer to explain a bit of timeline
of the issue root cause and
fix not sure if Lukasz wants to chime in
again about the very technical stuff I
technical but I don't remember the
timeline
details okay the uh the timeline
actually I'm not having that in front of
my eyes right now but as I recall it
started at uh last Friday last Friday
yeah uh uh isn't that last Wednesday
because that's when I ran the first
that's when I managed to join the the
chain uh yeah but uh the issue was that
you joined the chain with a Geth node and Geth
started to propose some blocks right but
uh those blocks were seemed to be in
line with the chain right so at least
every invalid block which was proposed
by G was following the
proper head number of the head right and
there was one block uh which was
proposed by Geth which was at least from
the block number looked like it was 24
blocks behind the current head and at
this very moment it triggered so this
issue which was explained by Lukasz
before so one of the most recent
features uh from the 1.27.0 version so
pre-warming of the blocks uh caused that uh
we treated a invalid valid block as a
invalid right uh the simplest workaround
of that as we managed to find out later
on was to use the flag which Lukasz
posted on the chat so disable disabling
the
pre-warming but uh afterwards we
analyzed it and suddenly we fixed that
on Master Branch but yet we did not uh
release the 1.28.0 version because it's
still under
testing and yeah so we released the 1.27.1
version which properly hooks up
everything in in that
feature uh and yeah the uh 1.27.1 is
released it's released on Dappnode as well
and we also released a 1.27.1 experimental
version for Shutter so the Shutter version
is also updated so Shutter validators
can go back to the 1.27.1 if if if needed
but in meantime during the problematic
situation we
also rebased our Shutter version back to
1.26.0 as soon as possible uh and we
advised Gateway validators and our twin
St validators uh to to move to this
version on gnosis to to avoid such
situation on chain
and yeah from from
the the most important information was
that 1.27 version 1.27.0 version was the
only one affected we had some nodes uh
on 1.26.0 which were not affected so we
confirmed before before finding out the
root cause that only version affected was
the latest one
1.27.0 not sure if something more needs to
be added yeah
frck uh yeah just a quick question so um
the
1.27.1 Shutter experimental version is the
one which you would advise to run for
the validators right now yes yes yes we
updated the image uh as it is unofficial
uh we are just updating it uh very
quickly so once we had 1.27.1 released I
just cherry-picked the version to this
release branch and we rebuilt the
image so 1.27.1 Shutter the the name
name Remains the Same of the okay great
uh so since right now I think the only
validators running it is Gateway and
nether mind um it would be great to uh
get um a feedback like once um all
validators or Shutterized registered
validators are running this version yeah
I've asked yesterday uh twin St to
confirm that they are running 1.26.0 and
I advised to upgrade that we'll check
today if they had a
chance the guy who is managing that is
had a chance to
update would be awesome if you yeah if
you can let me know so that we can uh
continue with the testing yeah y yeah
sure definitely we'll we'll let you know
today and also from important point of
view important point is that it was
pretty much Edge scenario something
which we didn't thought that it could
happen
because it was a invalid block which
should be on some very specific State
not not the recent State uh yeah so we
didn't expect that situation thankfully
it happened on Chiado so thanks Guillaume
for for actually noticing that bring the
yeah so so my reasoning was to make a
stage roll out and we tested mostly on
Main net so go to main net first that's
why this wasn't yet in the uh release
for enabled for POA chains but what we
should have done potentially is
explicitly disabled the feature in in
the configs uh for AuRa chains
because it kind of messed up uh the cod
in a repl right so that that's uh that's
the uh like kind of root cause
so being uh being a bit too cautious
there caused the issue in in in a way uh
which is
ironic okay thank you so much for the
details very appreciated and thanks for
fixing it
quickly and I think that's all any other
topic anyone wants to bring up or
discuss
so let's call it here thank you so much
everyone for attending enjoy the
summer take some
[Music]
time and
yeah thanks thank you for much thank
you
guys bye